real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
https://diffusion-policy.cs.columbia.edu/
MIT License

Segmentation fault probably associated with OpenGL or multithreading? #76

Open sys-shutdown opened 4 months ago

sys-shutdown commented 4 months ago

Thanks for your excellent work!

I am trying to apply this method to my own task by following the PushT example. I obtain observations from the simulated environment through OpenGL. I have also written a script similar to demoPushT.py to collect demonstrations and build my own dataset. In the demonstration script, my Env renders frames successfully through OpenGL or Pygame. However, after I wrote my own EnvRunner based on pusht_image_runner.py and trained the policy, it always fails with a segfault during the eval part of the first epoch.

Specifically, the segfault happens at the beginning of the "env.reset()" function, when I try to initialize the display window using "pygame.display.init()", "pygame.display.set_mode()" or "glfw.create_window()". I have tried creating a dummy_env_fun as suggested in the README, using SyncVectorEnv, and even running only one environment at a time, but none of these solved it. I also tried adding the calls above, such as "pygame.display.init()", before "obs=env.reset()" in pusht_image_runner.py, which reproduced the segfault as well.

The simulator I use only provides OpenGL-based APIs for visualization, so these initializations are unavoidable. In addition, the backtrace suggests the error is associated with "pthread_mutex_lock", but I have not been able to pinpoint the exact problem. Do you have any ideas?
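For readers trying to narrow down a segfault like this, one option is Python's standard faulthandler module, which can dump the Python-level traceback of every thread when the process crashes. The following is only a minimal sketch: the helper name enable_segfault_traceback and the log path are hypothetical, and it does not replace a native backtrace from a debugger.

```python
import faulthandler
import os
import tempfile


def enable_segfault_traceback(log_path):
    """Write Python tracebacks of all threads to log_path on a fatal signal.

    Hypothetical helper for illustration. The returned file object must stay
    open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Example usage; the log location is an arbitrary choice for this sketch.
log_path = os.path.join(tempfile.gettempdir(), "segfault_trace.log")
log_file = enable_segfault_traceback(log_path)
```

Calling this near the top of the training entry point, before the environments are created, should show which Python call was active when the crash occurred.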

sys-shutdown commented 4 months ago

I fixed this problem by adding "import pygame" to the train.py file and calling "pygame.display.init()" at the beginning of my env_fn. My guess is that the bug is related to the Pygame or OpenGL module being initialized in sub-threads.
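A minimal sketch of the pattern described in this fix might look like the following. It is not the actual train.py or EnvRunner code: make_env_fn and the returned dictionary are hypothetical stand-ins for a real environment factory, the SDL "dummy" video driver is an assumption so the sketch can run without a display, and the import guard is only there so the sketch loads when Pygame is not installed.

```python
import os

# Assumption: SDL's "dummy" video driver lets this sketch run headless.
# Remove this line when a real display is available.
os.environ.setdefault("SDL_VIDEODRIVER", "dummy")

try:
    # Imported at module level in the main process, mirroring the
    # "import pygame" added to train.py in the fix.
    import pygame
except ImportError:
    pygame = None  # keeps the sketch importable without Pygame


def make_env_fn():
    """Return a hypothetical env_fn that initializes the display first."""
    def env_fn():
        if pygame is not None:
            # Initialize the display at the very start of env_fn,
            # before any environment or OpenGL setup.
            pygame.display.init()
        # A real env_fn would construct and return the environment here.
        return {
            "pygame_available": pygame is not None,
            "display_initialized": (
                pygame is not None and pygame.display.get_init()
            ),
        }
    return env_fn


env_fn = make_env_fn()
result = env_fn()
```

The two points carried over from the fix are the module-level import in the main process and the display initialization as the first statement of env_fn; everything else is illustrative.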