ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Cannot save training episodes: "TypeError: Object of type ndarray is not JSON serializable" #12951

Open FKhadivpour opened 3 years ago

FKhadivpour commented 3 years ago

I am trying to train a PPO agent using this repository, which contains DRL implementations compatible with a multi-agent environment (Overcooked). Specifically, I am running this code, which trains a single agent with PPO, after making one small change to this script, which holds the RLlib agent and training utilities.

At line 579, where the PPO trainer from Ray RLlib is defined, I added two parameters to record trajectory traces ("output" and "output_compress_columns"):

trainer = PPOTrainer(env="overcooked_multi_agent", config={
    "multiagent": multi_agent_config,
    "callbacks" : TrainingCallbacks,
    "custom_eval_function" : get_rllib_eval_function(evaluation_params, environment_params['eval_mdp_params'], environment_params['env_params'],
                                    environment_params["outer_shape"], 'ppo', 'ppo' if self_play else 'bc'),
    "env_config" : environment_params,
    "output": "logdir",
    "output_compress_columns": ["obs", "new_obs"],
    "eager" : False,
    **training_params
}, logger_creator=custom_logger_creator)
return trainer
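
For context: "output": "logdir" asks RLlib's JsonWriter to dump every collected SampleBatch as JSON files into the trial's log directory, and "output_compress_columns" names the columns to pack (compress and base64-encode) before writing, since raw observation arrays tend to dominate file size.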

I get an error when I try to execute the code:

```
ERROR - PPO RLLib From Params - Failed after 0:00:13!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/Users/faraz/human_aware_rl/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 427, in main
    result = run(params)
  File "/Users/faraz/human_aware_rl/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 389, in run
    result = trainer.train()
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 495, in train
    raise e
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 484, in train
    result = Trainable.train(self)
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/tune/trainable.py", line 261, in train
    result = self._train()
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 151, in _train
    fetches = self.optimizer.step()
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 145, in step
    samples = collect_samples(self.workers.remote_workers(),
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/optimizers/rollout.py", line 25, in collect_samples
    next_sample = ray_get_and_free(fut_sample)
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/utils/memory.py", line 32, in ray_get_and_free
    return ray.get(object_ids)
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.sample() (pid=61728, ip=192.168.1.69)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 417, in ray._raylet.execute_task.function_executor
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 531, in sample
    self.output_writer.write(batch)
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/offline/json_writer.py", line 64, in write
    data = _to_json(sample_batch, self.compress_columns)
  File "/Users/faraz/human_aware_rl/venv/lib/python3.8/site-packages/ray/rllib/offline/json_writer.py", line 121, in _to_json
    return json.dumps(out)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable

Process finished with exit code 1
```
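
For context: the final frame of the traceback is the standard-library JSON encoder, which has no handling for NumPy types at all, so any ndarray that reaches json.dumps() unconverted fails this way. A minimal illustration, with a hypothetical NumpyEncoder workaround (illustrative only, not RLlib API):

```python
import json

import numpy as np

# The stdlib encoder has no default handling for NumPy types, so any
# ndarray passed to json.dumps() fails exactly as in the traceback.
try:
    json.dumps({"obs": np.zeros(3)})
except TypeError as e:
    print(e)  # Object of type ndarray is not JSON serializable

# Hypothetical workaround: convert arrays to plain lists in a
# custom encoder before serializing.
class NumpyEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()
        return super().default(o)

print(json.dumps({"obs": np.zeros(3)}, cls=NumpyEncoder))  # {"obs": [0.0, 0.0, 0.0]}
```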

I was wondering: is there a way to save all training episodes?
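
Once the writer gets past this error, the saved episodes can be loaded back with RLlib's JsonReader. A minimal sketch, with a placeholder path:

```python
from ray.rllib.offline import JsonReader

# Placeholder path: point this at the output-*.json files that
# JsonWriter wrote into the trial's log directory.
reader = JsonReader("/path/to/logdir/output-*.json")

batch = reader.next()  # one batch of recorded experience
print(batch)
```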

richardliaw commented 3 years ago

@sven1977 can you take a look at this? It seems the worker fails when writing out certain datatypes.

sven1977 commented 3 years ago

Could you provide a small, self-contained reproduction script? Happy to install any missing modules.

sven1977 commented 3 years ago

Also, please provide your ray/rllib and Python versions, and your OS.

FKhadivpour commented 3 years ago

Hi, it is really easy to clone this repo and run this code (human_aware_rl/ppo/ppo_rllib_from_params_client.py) to train a PPO agent. You just need to add these two lines:

"output": "logdir",
"output_compress_columns": ["obs", "new_obs"],

at line 579 of this script (human_aware_rl/rllib/rllib.py) after cloning the repo. These are the versions:

- macOS Catalina 10.15.7
- Python 3.8
- ray[rllib] 0.8.5
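
For a reproduction that does not require cloning the Overcooked repo, a self-contained sketch along the lines of sven1977's request might look like the following. CartPole-v0 stands in for the multi-agent environment here, so (being single-agent) it may not trigger the exact failing code path, but it exercises the same two output options:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Same two output options as in the original snippet; "logdir" makes
# the JsonWriter dump sampled batches as JSON next to the results.
trainer = PPOTrainer(env="CartPole-v0", config={
    "num_workers": 1,
    "output": "logdir",
    "output_compress_columns": ["obs", "new_obs"],
})
print(trainer.train())
```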