ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Local Mode - TypeError: unhashable type: 'list' #4917

Closed: sheetalsh456 closed this issue 5 years ago

sheetalsh456 commented 5 years ago

I'm training an A3C trainer with PyTorch and tune.run():

     # imports assumed for this snippet (Ray 0.7.x module layout)
     from ray import tune
     from ray.rllib.agents import a3c

     tune.run(
         a3c.A3CTrainer,
         local_dir=".",
         stop={"episode_reward_mean": 0.5},
         resources_per_trial={"cpu": 1, "gpu": 1},
         config={
             "env": "my_env",
             "use_pytorch": True,
             "model": {"custom_model": "my_model"},
             "num_workers": 1,
             "train_batch_size": 32,
             "sample_batch_size": 16,
             "num_envs_per_worker": 1,
         },
     )

I'm getting this error:

     File "/mnt/miniconda/envs/proteus_rl_env/lib/python3.6/site-packages/ray/rllib/optimizers/async_gradients_optimizer.py", line 43, in step
       pending_gradients[future] = e
     TypeError: unhashable type: 'list'
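
For reference, the failing line uses the value returned by a remote gradient call as a dictionary key; any list in that position reproduces the same TypeError (a toy illustration, not RLlib code):

     # Toy illustration: lists are not hashable, so they cannot be dict keys.
     pending_gradients = {}
     future = [1, 2, 3]
     pending_gradients[future] = "e"  # TypeError: unhashable type: 'list'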

The same code works with another trainer, i.e., PGTrainer:

     from ray.rllib.agents import pg  # assumed import, Ray 0.7.x module layout

     tune.run(
         pg.PGTrainer,
         local_dir=".",
         stop={"episode_reward_mean": 0.5},
         resources_per_trial={"cpu": 1, "gpu": 1},
         config={
             "env": "my_env",
             "use_pytorch": True,
             "model": {"custom_model": "my_model"},
             "num_workers": 1,
             "train_batch_size": 32,
             "sample_batch_size": 16,
             "num_envs_per_worker": 1,
         },
     )

Are there any specific changes needed when running the A3C trainer?

Any leads will be appreciated!

richardliaw commented 5 years ago

Do you have local_mode=True set in ray.init?

roireshef commented 5 years ago

@richardliaw - I get the same result when setting

     local_mode=True

What's wrong with that? Isn't local mode supposed to run things locally for debugging purposes? How else would you debug the remote processes (using breakpoints, etc.)?
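
For context, this is the kind of setup I mean (Ray 0.7.x API); local mode is supposed to execute remote tasks serially in the driver process, which is what makes breakpoint-style debugging practical:

     import ray

     # Run remote tasks serially in the driver process instead of in separate
     # worker processes, so a debugger and breakpoints work as usual.
     ray.init(local_mode=True)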

richardliaw commented 5 years ago

Oh, I think that comment was to make sure that local_mode was not on.

Could you try upgrading ray to 0.7.2 to see if you still have the same error?

roireshef commented 5 years ago

I'm actually running Ray 0.7.2 with this parameter set to True. When running training with A3C, it fails with the same error as in the title. From a brief investigation, it looks like Ray doesn't return the same object type from its .remote() calls when running with local_mode=True vs. local_mode=False.

With local_mode=True, one of the .remote() calls returns a list, and the calling code tries to use the returned value as a dictionary key, but a list can't act as a dict key since it is not a hashable type.
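
Here is a small diagnostic sketch (Ray 0.7.x API; the return types reflect the behavior described above, not documented guarantees) that makes the difference visible:

     import collections.abc
     import ray

     @ray.remote
     def f():
         return 1

     ray.init(local_mode=True)   # compare against a run with local_mode=False
     handle = f.remote()
     print(type(handle))         # a list in local mode on 0.7.2, an ObjectID otherwise
     print(isinstance(handle, collections.abc.Hashable))  # False for a list,
                                                          # so it cannot be a dict key
     ray.shutdown()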

richardliaw commented 5 years ago

I think this is fixed in the upcoming release (0.7.3). If you need the feature now, you can install the latest snapshot of master (https://ray.readthedocs.io/en/latest/installation.html#trying-snapshots-from-master)

richardliaw commented 5 years ago

OK, this should be fixed in the release; closing for now, but feel free to reopen if needed.