Open AlexLewandowski opened 3 years ago
I can see why the base reward might not be very useful at increments of 0.5 in some cases.
I am not entirely sure why it was set this way except that there was a concern that we were adding rewards too frequently.
We cannot change the default reward without "breaking the interface", but I think this is a candidate for adding a lever in the AgentInterface to adjust this increment, with a default of 0.5.
In smarts/core/sensors.py:941, `threshold_for_counting_wp = 0.5` is set. My understanding of this variable is that it sets a minimum on the reward, by only reporting distances over the threshold. For example, if a vehicle only travels `0.01` for 5 time steps, they will receive 0 reward. When the vehicle begins to accelerate and travels `0.1` for 5 time steps, the agent will receive a reward of `0.55` on the very last time step. This is problematic because the reward can be sparse and is no longer a function of state to state transitions but of entire trajectories. Is there a reason for it to be set to `0.5`? I would suggest we set this to `0.0` or allow for customization.
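To make the example above concrete, here is a minimal sketch of the behaviour as I understand it. It is not the actual code in smarts/core/sensors.py: the function `thresholded_rewards` and its accumulate-then-reset logic are assumptions made purely for illustration.

```python
def thresholded_rewards(step_distances, threshold=0.5):
    """Hypothetical model of the thresholded distance reward.

    Distance is accumulated silently and only reported as reward
    once the accumulated amount exceeds `threshold`.
    """
    rewards = []
    pending = 0.0  # distance travelled but not yet reported
    for d in step_distances:
        pending += d
        if pending > threshold:
            rewards.append(pending)  # report everything accumulated so far
            pending = 0.0
        else:
            rewards.append(0.0)  # below threshold: no reward this step
    return rewards


# 5 slow steps of 0.01 followed by 5 faster steps of 0.1
steps = [0.01] * 5 + [0.1] * 5

sparse = thresholded_rewards(steps, threshold=0.5)
dense = thresholded_rewards(steps, threshold=0.0)

print(sparse)  # zeros for the first nine steps, about 0.55 on the last
print(dense)   # one reward per step, equal to the distance travelled
```

With the threshold at `0.5`, the whole trajectory is rewarded in a single step; with `0.0`, each reward depends only on the transition that produced it.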