liuzuxin / OSRL

🤖 Elegant implementations of offline safe RL algorithms in PyTorch
https://offline-saferl.org
Apache License 2.0

The parameters in cdt_configs.py #20

Closed · ZhengyeHan closed this 1 year ago

ZhengyeHan commented 1 year ago

I have successfully written a custom environment in Gymnasium and used it with CDT. Here's the environment I created: [screenshot of the custom environment]

but I ran into two problems:

1. When the data created in my environment, such as 'actions', 'rewards', etc., is between 0 and 1, it causes an error in CDT; it only works if the data range is between 0 and 100. I think I should modify the parameters in cdt_configs.py, but I don't know which parameter to modify or how.

2. There are some parameters in cdt_configs.py whose effects I don't understand. What will happen if I change them?

[screenshots of cdt_configs.py]

For example, should I change num_heads, target_returns, cost_limit, deg, max_rew_decrease, max_reward, or reward_scale? How should I change them, and what do they do? Is there any documentation for these parameters? Could you help me? I'm really confused about them!

liuzuxin commented 1 year ago

> When the data created in my environment, such as 'actions', 'rewards', etc., is between 0 and 1, it causes an error in CDT; it only works if the data range is between 0 and 100. I think I should modify the parameters in cdt_configs.py, but I don't know which parameter to modify or how.

I am not sure which errors you are referring to: does the code fail to run, or are the results not as expected? The datasets should adhere to the DSRL dataset format and then be processed into a trajectory dataset, so that the inputs to the CDT model are the target reward return and target cost return for one episode rather than a single step's immediate reward/cost. You can also consider rescaling your data range to 0-100 if that works well for you.
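For concreteness, here is a minimal sketch (not the repo's actual loader) of turning a flat, step-level dataset into the per-episode reward and cost returns that CDT conditions on; the key names ('rewards', 'costs', 'terminals', 'timeouts') are assumptions based on the D4RL/DSRL convention:

```python
import numpy as np

def episode_returns(dataset):
    """Split a flat transition dataset into episodes and sum rewards/costs.

    `dataset` is assumed to be a dict of equal-length arrays following the
    D4RL/DSRL convention; adjust the key names to match your data.
    """
    done = np.logical_or(dataset["terminals"], dataset["timeouts"])
    ep_rew, ep_cost = [], []
    r = c = 0.0
    for i in range(len(dataset["rewards"])):
        r += float(dataset["rewards"][i])
        c += float(dataset["costs"][i])
        if done[i]:  # episode boundary: record the accumulated returns
            ep_rew.append(r)
            ep_cost.append(c)
            r = c = 0.0
    return np.array(ep_rew), np.array(ep_cost)
```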

> For example, should I change num_heads, target_returns, cost_limit, deg, max_rew_decrease, max_reward, or reward_scale? How should I change them, and what do they do? Is there any documentation for these parameters?

The target returns depend on your datasets. I would suggest plotting the cost-return vs. reward-return figure for your dataset, just like Fig. 1 and Fig. 7 in the CDT paper, and then selecting the proper target returns based on the plot. The cost limit corresponds to the cost threshold for your problem; for CDT it should be the same as your target cost return. deg, max_rew_decrease, and max_reward are data-augmentation parameters; you can find more details about them in Fig. 3 of the paper and in the code. Again, some key parameters such as max_reward, min_reward, and max_rew_decrease depend on the cost-reward return plot of your dataset, while most of the other data-augmentation parameters are quite general.
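If it helps, here is a hedged sketch of that cost-return vs. reward-return plot, reusing the episode_returns() helper from the sketch above; matplotlib and the cost_limit value are assumptions, not repo code:

```python
import matplotlib.pyplot as plt

cost_limit = 10.0  # your problem's cost threshold (example value)
ep_rew, ep_cost = episode_returns(dataset)  # dataset: your DSRL-style dict

plt.scatter(ep_cost, ep_rew, s=4, alpha=0.5)
plt.axvline(cost_limit, ls="--", c="r", label=f"cost_limit={cost_limit}")
plt.xlabel("episode cost return")
plt.ylabel("episode reward return")
plt.legend()
plt.show()
# Pick target_returns as (reward, cost) pairs near the upper boundary
# (the Pareto frontier) of this scatter, with cost at or below cost_limit.
```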

Other model architecture parameters are standard for Decision Transformer (DT). If you're new to this domain, understanding the simpler Decision Transformer first will give you a foundation that makes grasping CDT's nuances much easier. Therefore, I highly recommend you first read CORL's implementation of DT, as it is a minimalist treatment of offline RL via sequence modeling. If your problem has only rewards and no additional constraint-violation costs, DT should be enough.

ZhengyeHan commented 1 year ago


Thank you very much, you are really nice and your answers are very detailed! Here is the error message when the data is in the range 0-1:

[screenshot of the error message]

liuzuxin commented 1 year ago

You are welcome. I think you may be using a different cost definition from our datasets, where we use a binary indicator signal to represent constraint violation (1) or safe (0), so we add an additional cost prediction head as an auxiliary loss during training. If you have a different cost definition, you might need to change the cost prediction head here and the loss here.
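To illustrate the change being suggested, here is a sketch under the assumption that the binary head is a 2-way classifier trained with cross-entropy; the names follow this thread, not a verified snippet of the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embedding_dim = 128                           # example value
hidden = torch.randn(4, 10, embedding_dim)    # dummy transformer features
costs = torch.rand(4, 10)                     # real-valued per-step costs

# binary-cost setup (as discussed above): 2-way classification head
# cost_pred_head = nn.Linear(embedding_dim, 2)
# cost_loss = F.cross_entropy(cost_pred_head(hidden).reshape(-1, 2),
#                             costs.long().reshape(-1))

# real-valued-cost variant: 1-dim regression head with an MSE loss
cost_pred_head = nn.Linear(embedding_dim, 1)
cost_pred = cost_pred_head(hidden).squeeze(-1)
cost_loss = F.mse_loss(cost_pred, costs)
```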

In fact, the cost prediction loss doesn't affect the final performance much, so the simplest fix is to comment out all the code related to the cost prediction cost_loss. Then you should be able to get rid of this issue.

ZhengyeHan commented 1 year ago


That's great! You found the root of the problem. I created the data in the following way; my cost is not 1 or 0 but a floating-point number: [screenshot of the data-creation code]

So I think I should change the "2" in `self.cost_pred_head = nn.Linear(embedding_dim, 2)` to 1: [screenshot of the model code]

But I need the algorithm to work as well as possible, so I'd rather not comment out the code. One more question: I also get this error when I run it; which parameter do you think is related to it?

[screenshot of the error message]

liuzuxin commented 1 year ago

This is potentially because the filtering parameters are not correct. Again, I would suggest plotting your dataset's reward and cost return figure to adjust the cmin, cmax, rmin, rmax, and bins parameters. If these parameters fall outside your dataset's range, the data augmentation step cannot extract the Pareto frontier points and thus cannot perform augmentation.

If you don't want to use data augmentation, you can also comment out the related code to proceed.
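As a quick sanity check (again a sketch reusing episode_returns() from above, not the repo's actual augmentation code), you can verify that your filter bounds actually intersect the dataset before running augmentation:

```python
ep_rew, ep_cost = episode_returns(dataset)

print(f"reward returns: min={ep_rew.min():.2f}, max={ep_rew.max():.2f}")
print(f"cost returns:   min={ep_cost.min():.2f}, max={ep_cost.max():.2f}")

# example bounds; set these so they enclose the ranges printed above
rmin, rmax = ep_rew.min(), ep_rew.max()
cmin, cmax = ep_cost.min(), ep_cost.max()
mask = (ep_rew >= rmin) & (ep_rew <= rmax) & (ep_cost >= cmin) & (ep_cost <= cmax)
print(f"{mask.sum()} / {len(mask)} episodes survive the filter")
```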

ZhengyeHan commented 1 year ago


You've helped me a lot. Please close this.