Currently, the reward is scaled based on hits and misses.
The agent isn't making use of the cache capacity available.
We should pass the cache capacity as an argument and scale the reward so that it encourages caching early on, i.e. reward *= (1 + cached_entries / cache_capacity).
Conduct experiments comparing this reward against the other reward formulations.
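A minimal sketch of the proposed scaling, assuming the environment already computes a hit/miss base reward; the function name `scale_reward` and its signature are hypothetical, not existing code:

```python
def scale_reward(base_reward: float, cached_entries: int, cache_capacity: int) -> float:
    """Scale the hit/miss reward by current cache occupancy.

    The multiplier ranges from 1.0 (empty cache) to 2.0 (full cache),
    nudging the agent to fill the cache early in an episode.
    """
    if cache_capacity <= 0:
        raise ValueError("cache_capacity must be positive")
    return base_reward * (1 + cached_entries / cache_capacity)


# Example: a hit reward of 1.0 with a half-full cache of capacity 100
# becomes 1.0 * (1 + 50/100) = 1.5
print(scale_reward(1.0, 50, 100))
```

An experiment could toggle this multiplier on and off (or vary its weight) while holding the base hit/miss reward fixed, to isolate its effect on cache utilisation.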