vita-epfl / CrowdNav

[ICRA19] Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
MIT License

The training stopped halfway #50

Open cherrymilk opened 2 years ago

cherrymilk commented 2 years ago

When I train the agent, the process sometimes crashes with an error partway through. For example:

2022-04-01 22:02:19, INFO: Current git head hash code: %s
2022-04-01 22:02:19, INFO: Using device: cpu
2022-04-01 22:02:19, INFO: Policy: CADRL without occupancy map
2022-04-01 22:02:19, INFO: human number: 5
2022-04-01 22:02:19, INFO: Not randomize human's radius and preferred speed
2022-04-01 22:02:19, INFO: Training simulation: circle_crossing, test simulation: circle_crossing
2022-04-01 22:02:19, INFO: Square width: 10.0, circle width: 4.0
2022-04-01 22:02:19, INFO: Current learning rate: 0.010000
malloc_consolidate(): invalid chunk size

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

or

..........................
...........................
2022-04-01 21:41:03, INFO: TRAIN in episode 321 has success rate: 0.00, collision rate: 0.00, nav time: 25.00, total reward: 0.0000
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Can I ask what caused this?

ChanganVR commented 2 years ago

Hi @cherrymilk, I haven't seen this error before, and it looks like some kind of memory error. Are you sure no other processes are using too much memory? It might be worth installing the repo in a clean environment to check whether something is wrong with your Linux setup.
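
One way to narrow this down, as a minimal sketch and not something taken from the repo: Python's standard `faulthandler` module can print the Python traceback when the process dies from SIGSEGV or SIGABRT, and the standard `resource` module can report peak memory use. The helper names below (`enable_crash_diagnostics`, `peak_memory_mb`, `log_memory`) are hypothetical, and the unit conversion assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import faulthandler
import logging
import resource
import sys


def enable_crash_diagnostics(stream=sys.stderr):
    """Dump the Python traceback of every thread if a fatal signal arrives.

    `stream` must be a real file (it needs a file descriptor) and must stay
    open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)


def peak_memory_mb():
    """Return this process's peak resident set size in MB.

    Assumption: Linux, where ru_maxrss is in kilobytes (macOS uses bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def log_memory(episode):
    """Hypothetical helper: call once per episode to spot memory growth."""
    logging.info('episode %d peak memory: %.1f MB', episode, peak_memory_mb())
```

Calling `enable_crash_diagnostics()` once at the top of the training script, or starting the interpreter with `python -X faulthandler`, should show which Python line was running when the native crash happened. If the peak memory logged per episode keeps climbing, that points to memory pressure; if it stays flat, a broken native library in the environment is the more likely suspect.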