Closed thomashirtz closed 1 week ago
I think that would be a nice addition, happy to review it :smile:
To be honest I think the discounts should probably be booleans while they are stored in the timestep because for me they just indicated end of episode, but I think this would add nice flexibility
To be honest I think the discounts should probably be booleans while they are stored in the timestep because for me they just indicated end of episode, but I think this would add nice flexibility
I'm fine with both, as long as it doesn't take too much space. I go with argument set by default to boolean ? or just boolean ?
My only issue is that this strays from the original dm_env api where it is a float so it can represent both RL discount (gamma) and done.
Let's definitely add it as an argument, but for the default I'm not sure if boolean or float32 is best @clement-bonnet any thoughts on this?
To my knowledge, having the discount as a float is more common than as a boolean for the reasons you mentioned @sash-a. I would keep it a float unless there are strong reasons to do otherwise :)
Great then if you could add the argument with a default of float32, I'm happy to accept the PR
Great then if you could add the argument with a default of float32, I'm happy to accept the PR
The PR is available for review :)
Is your feature request related to a problem? Please describe
In a personal project, I had to be very efficient in the memory management, my reward were taking a lot of space. In my case I needed to change the reward and the discount to float16 instead of float32. I had to copy over the types file to do the modification locally. However I feel like some use case may need this flexibility.
Describe the solution you'd like
Give an extra parameter to the step functions to give dtype. (Example with one of them)
I would be happy to do the PR.
Misc