duncanldavis opened this issue 2 years ago
OK, it is related to how the Ray cluster is set up: when not connecting to the cluster via `ray.init()`, the trainer works. I am working through why everything else works but PPOTrainer breaks.
When using `num_workers: 0`, PPOTrainer works, but with `num_workers` of 1 or more I get the stack trace below.
Latest Ray libraries via `pip install` on Python 3.8.
Code breaking:
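As context, a hypothetical minimal setup matching the description above. This is a sketch only: it assumes RLlib's pre-2.0 dict-style config API, and the environment and framework values are placeholder assumptions, not taken from the report.

```python
# Sketch only: assumes pre-2.0 RLlib (plain-dict config passed to PPOTrainer)
# and a toy environment; the reporter's actual code differs.
config = {
    "env": "CartPole-v1",   # placeholder environment, not from the report
    "framework": "torch",   # placeholder framework choice
    "num_workers": 1,       # 0 -> rollouts run in the driver and training works;
                            # 1+ -> remote RolloutWorker actors are created and
                            # the deserialization error below appears
}
# ray.init(address="auto")             # connect to the existing cluster
# trainer = PPOTrainer(config=config)  # fails during remote worker creation
```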
Error:

```
RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=2452, ip=10.139.64.8, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fc4f840ed60>)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: _restore() takes 3 positional arguments but 4 were given
traceback: Traceback (most recent call last):
  File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 332, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 235, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 190, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/databricks/python/lib/python3.8/site-packages/ray/serialization.py", line 180, in _deserialize_pickle5_data
    obj = pickle.loads(in_band)
TypeError: _restore() takes 3 positional arguments but 4 were given
```
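For readers unfamiliar with this failure mode: the `TypeError` is raised inside `pickle.loads` when the reconstructor callable baked into the pickled payload has a different signature in the unpickling process than in the process that pickled it, which typically points at a Ray/RLlib version mismatch between the client and the cluster workers. A minimal self-contained illustration (the `_restore` and `Policy` names here are hypothetical and unrelated to Ray's internals):

```python
import pickle

# "New" reconstructor in the receiving process: accepts 2 arguments.
def _restore(cls, state):
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj

class Policy:
    def __init__(self, weights):
        self.weights = weights

    def __reduce__(self):
        # "Old" sender pickles THREE arguments for the reconstructor,
        # e.g. because an older library version added an extra field.
        return (_restore, (Policy, self.__dict__, "extra-versioned-field"))

payload = pickle.dumps(Policy([1, 2, 3]))

# Unpickling calls _restore(Policy, state, "extra-versioned-field"),
# which fails exactly like the stack trace above.
try:
    pickle.loads(payload)
except TypeError as e:
    print(e)  # _restore() takes 2 positional arguments but 3 were given
```

The fix implied by this mechanism is to pin the same Ray version on the driver and on every cluster node, so both sides agree on the reconstructor signatures.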