nakamotoo / Cal-QL

Official implementation of our paper "Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning"
https://nakamotoo.github.io/Cal-QL

Adroit environments are not actually sparse #6

Closed: trevormcinroe closed this issue 2 months ago

trevormcinroe commented 9 months ago

In the paper, the Adroit environments are described as sparse: "The agent obtains a sparse binary +1/0 reward if it succeeds in solving the task".

But the code actually makes them not sparse, due to the reward_scale and reward_bias flags in https://github.com/nakamotoo/Cal-QL/blob/main/scripts/run_adroit.sh#L30-L31

Is the code incorrect?

nakamotoo commented 3 months ago

Hi, we rescaled the reward from 0/+1 to -5/+5, but it is still sparse in the sense that each trajectory contains only two reward values, one negative and one positive. We found that rescaling the reward makes CQL/Cal-QL train more stably.
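
For reference, here is a minimal sketch of what such an affine rescaling looks like, assuming the flags are applied as reward * reward_scale + reward_bias (a common convention in CQL-style codebases). The function name and the scale/bias values below are illustrative assumptions; see the linked scripts/run_adroit.sh for the flags the repository actually passes.

```python
import numpy as np

def rescale_reward(reward, reward_scale=10.0, reward_bias=-5.0):
    """Affine reward rescaling: reward * scale + bias.

    Illustrative values: scale=10, bias=-5 maps the binary 0/+1
    success reward to -5/+5. A two-valued (sparse) reward stays
    two-valued under any affine map.
    """
    return reward * reward_scale + reward_bias

# The binary success signal remains sparse after rescaling:
sparse = np.array([0.0, 0.0, 1.0, 1.0])  # +1 on success, 0 otherwise
print(rescale_reward(sparse))            # [-5. -5.  5.  5.]
```

In other words, the transformation only shifts and stretches the two reward levels; it does not introduce intermediate (dense) reward values, which is the sense in which the task remains sparse.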