Cannot train model on ML-AGENTS Release 13

pawnjiang commented 3 years ago

Hi, I am new to Unity ML-Agents, and this dogfight model is really helpful. However, I cannot train a new model for dogfight. When I press the Play button, the scene freezes and then training quits. I have tried several versions of ML-Agents and Pillow, but none of them work. If you need any other information, please tell me and I will reply soon.

Version information:
Python 3.7.1
Unity 2020.2.6f1c1 Personal
ml-agents: 0.23.0
ml-agents-envs: 0.23.0
Communicator API: 1.3.0
PyTorch: 1.7.0+cpu
Pillow: 8.1.2

2021-03-24 09:42:50 INFO [environment.py:205] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2021-03-24 09:42:57 INFO [environment.py:111] Connected to Unity environment with package version 1.8.0-preview and communication version 1.4.0
2021-03-24 09:42:57 INFO [environment.py:271] Connected new brain: Pilot?team=0
2021-03-24 09:42:57 ERROR [subprocess_env_manager.py:193] UnityEnvironment worker 0: environment raised an unexpected exception.
2021-03-24 09:42:57 INFO [trainer_controller.py:85] Saved Model
Traceback (most recent call last):
  File "c:\anaconda3\envs\ml-agents\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\anaconda3\envs\ml-agents\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Anaconda3\envs\ml-agents\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\learn.py", line 280, in main
    run_cli(parse_command_line())
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\learn.py", line 276, in run_cli
    run_training(run_seed, options)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\learn.py", line 153, in run_training
    tc.start_learning(env_manager)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 174, in start_learning
    self._reset_env(env_manager)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 109, in _reset_env
    env_manager.reset(config=new_config)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\env_manager.py", line 67, in reset
    self.first_step_infos = self._reset_env(config)
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 299, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "c:\anaconda3\envs\ml-agents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 95, in recv
    raise env_exception
PIL.UnidentifiedImageError: cannot identify image file <mlagents_envs.rpc_utils.OffsetBytesIO object at 0x0000020A256A49B0>

mbaske commented 3 years ago

Thanks for pointing this out @pawnjiang. I don't think this is a version issue, although I've now upgraded the package file to the latest ml-agents release 1.9.0. The problem appears to be a prefab serialization issue that causes detectable object tags to disappear from the sensor settings. This in turn leaves the sensor observations empty, hence the unidentifiable image error. Strangely, this only happens after opening the project and, at least for me, only when I retrieve the project from GitHub. Re-opening my local project seems fine. Anyway, I've added a notice to the example projects section: please clone the updated repo and, after opening the project, reimport the two prefabs containing the sensors. Go to Assets/Examples/ReImport/ and reimport CarWithSensors and SpaceshipWithSensors.
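To illustrate why an empty observation ends up as an "unidentifiable image" error: the traceback shows Pillow failing to recognize the bytes it was handed, and an empty buffer has no file signature to recognize. Below is a minimal, standard-library-only sketch of that kind of check. It is not ml-agents code; the helper name `describe_payload` is hypothetical, and it assumes the sensor observation was meant to be PNG-compressed.

```python
# The 8-byte signature that starts every PNG file (from the PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_payload(data: bytes) -> str:
    """Hypothetical helper: classify a compressed observation payload.

    Returns "empty" for a zero-length buffer, "png" if the buffer starts
    with the PNG signature, and "unknown" otherwise.
    """
    if len(data) == 0:
        # Nothing to decode: an image library has no header to identify.
        return "empty"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return "unknown"


if __name__ == "__main__":
    print(describe_payload(b""))                        # empty observation
    print(describe_payload(PNG_SIGNATURE + b"\x00"))    # starts like a PNG
    print(describe_payload(b"not an image"))            # some other bytes
```

If a payload captured on the Python side were classified as "empty" here, that would be consistent with the missing detectable tags described above rather than with a Pillow version mismatch.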

I can't think of another way to fix this right now. Creating a fresh project apparently doesn't help. This might be a related issue: https://forum.unity.com/threads/data-corruption-with-prefab-import-with-new-prefab-workflow-still.660037/

pawnjiang commented 3 years ago

@mbaske, cooooooool, thanks for helping! Now I can train new models for dogfight. I'll report back if there are any problems.

mbaske / grid-sensor
