pjwozny closed 9 months ago
One extra addition to make the extraction of the env_obj a bit more flexible. If I have num_workers > 0, then my local_worker is an empty object. This change gets the env_obj from all workers and takes the env from the 1st. The 0th is the local_worker, which is still empty. I'm unsure why that is the case, but this works and trains.
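A minimal sketch of the selection logic described above, not the actual diff. It assumes the per-worker envs arrive as a plain list with the local worker at index 0, and that the empty local worker reports its env as `None`. `first_available_env` and the string stand-ins are hypothetical names for illustration only; in RLlib the list would presumably come from something like `foreach_worker(lambda w: w.env)`, but that is an assumption here.

```python
from typing import Any, Optional, Sequence


def first_available_env(worker_envs: Sequence[Optional[Any]]) -> Any:
    """Return the first env in the list that is not None.

    Hypothetical helper: index 0 is assumed to be the local worker,
    which may hold no env when num_workers > 0.
    """
    for env in worker_envs:
        if env is not None:
            return env
    raise ValueError("no worker holds an environment")


# Stand-in data: None mimics the empty local worker,
# the strings mimic envs held by remote workers.
worker_envs = [None, "env_from_worker_1", "env_from_worker_2"]
env_obj = first_available_env(worker_envs)
```

Scanning for the first non-empty entry, rather than hard-coding index 1, also covers the num_workers == 0 case, where the local worker is the only one that holds an env.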
Hmmm, interesting. I had initially added the cast to string to get it to work with the new version of Ray. I'll test locally tomorrow or Tuesday, and merge once I confirm it works.
Strange. For reference, here's my ray version from my conda env:
ray 2.8.1 pypi_0 pypi
Works on Python 3.7 and Python 3.9 with ray==2.7.2 (the latest version compatible with Python 3.7).
Removed the cast of the reward dict keys to string; this fixes the 0 reward issue.
Is there any case where we need the reward dict keys to be strings? If not, this should do the trick.
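For context, here is one plausible way a string cast could produce zero rewards. This is a sketch under assumptions, not a confirmed diagnosis: it assumes agent IDs are ints and that rewards are looked up by the original ID with a default of 0. `collect_reward` is a hypothetical helper, not part of Ray.

```python
def collect_reward(reward_dict, agent_id, default=0.0):
    """Look up an agent's reward, falling back to a default (hypothetical helper)."""
    return reward_dict.get(agent_id, default)


# Assumed shape: rewards keyed by integer agent IDs.
rewards = {0: 1.5, 1: -0.5}

# The cast under discussion: every key becomes a string.
stringified = {str(k): v for k, v in rewards.items()}

# Looking up by the original int ID finds the reward only in the uncast dict.
original_lookup = collect_reward(rewards, 0)
stringified_lookup = collect_reward(stringified, 0)
```

With int keys the lookup returns the real reward; after the cast, the int ID no longer matches the key "0", so the lookup silently falls back to the default.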