lgsvl / simulator

A ROS/ROS2 Multi-robot Simulator for Autonomous Vehicles

Clock Sensor Stops Publishing After A While #1932

Open alperenbayraktar opened 2 years ago

alperenbayraktar commented 2 years ago

While trying scenarios with different vehicle configs, I found that with more than one lidar the clock sensor stops publishing after 20 to 30 messages. Although I get around 200 fps on the build on a Windows computer, 5 lidars run at 5 Hz, and on the other Ubuntu computer we see sync problems with the lidar data in rviz: the points hitting a moving NPC stutter and teleport back.

- use_sim_time on Autoware is set to true.
- Using a custom ROS bridge.
- Using a custom build (2021.3).
- Tried on different computers with a 2080 Ti and a 3090.
- Should I overwrite something on the clock sensor? Is it because the Sending status on the Publisher somehow stays true while writing? I could see this when debugging the API run in the engine.
- Task Manager shows that I still have free GPU memory while running.

Linking @issue1885
Linking @issue653
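For anyone trying to confirm the same symptom, a minimal sketch of a stall check follows. It assumes you have already recorded the arrival times (in seconds) of the clock messages by some means; the function name `find_clock_stall` and the one-second timeout are hypothetical choices for illustration, not part of the simulator or of ROS.

```python
from typing import Optional, Sequence


def find_clock_stall(
    arrival_times: Sequence[float],
    now: float,
    timeout: float = 1.0,
) -> Optional[float]:
    """Return the arrival time of the last message before a stall, or None.

    A stall is any gap longer than `timeout` seconds, either between two
    consecutive messages or between the last message and `now`.
    The timeout value is an assumption; tune it to the expected clock rate.
    """
    if not arrival_times:
        return None  # nothing received yet, so there is nothing to compare

    # Look for an over-long gap between consecutive messages.
    for previous, current in zip(arrival_times, arrival_times[1:]):
        if current - previous > timeout:
            return previous

    # Look for an over-long gap between the last message and the present.
    if now - arrival_times[-1] > timeout:
        return arrival_times[-1]

    return None


if __name__ == "__main__":
    arrivals = [0.0, 0.1, 0.2, 0.3]  # made-up sample data
    print(find_clock_stall(arrivals, now=5.0))
```

Logging the returned timestamp next to the lidar publish rate may help show whether the clock stops at the same moment the lidars slow down.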

EricBoiseLGSVL commented 2 years ago

Are you using one machine for both autoware and simulator? We recommend using multiple computers with distributed simulations to handle multiple heavy sensor configs including a separate machine for autoware.

alperenbayraktar commented 2 years ago

@EricBoiseLGSVL I've made some progress in the meantime. We are working on different machines with a 1 Gbit connection and Cat6 Ethernet cables. What we have tested:

- Sim on 2080 Ti Windows and Autoware on 2070 Max-Q Ubuntu 18.04: works fine, I get around 120 fps on the build. Autoware on 2080 Max-Q Ubuntu 18.04: same, and they do not have the problem.
- Sim and Autoware both on 2080 Max-Q Ubuntu with headless mode on: lidars are slower than normal, and the clock stops.
- Sim on 3090 Windows and Autoware on 2080 Max-Q Ubuntu: close to 200 fps with no problem.
- Sim on 3090 Ubuntu 18.04 and Autoware on 2080 Max-Q Ubuntu: we get around 30 fps and the clock stops, not only in API mode but in normal mode too. The scenario freezes after a while, but the sim is still interactable.

I've made some changes to PointCloudData.cs after finding that my 32-channel lidar was showing my 128-channel lidar's points in rviz and causing a serious error in BridgeMessageDispatcher. I was about to open a pull request, but someone already did: #1907. This drastically reduces the slowdown. To make sure I have an extra core left over, I use maxWorkerCount - 1 in the QueueTask function. I've also changed the ClockSensor a bit to send the last message in the queue and to force-set Sending to true after a while (I commented it out), but I guess these two changes become irrelevant once the lidar problem is solved.
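The "leave one core free" idea can be sketched in isolation. This is only an illustration of the maxWorkerCount - 1 pattern in Python, not the simulator's C# code; `reserved_worker_count` and `run_tasks` are hypothetical names, and the thread pool stands in for whatever QueueTask does internally.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def reserved_worker_count(cpu_count: Optional[int], reserve: int = 1) -> int:
    """Return a worker count that leaves `reserve` cores free, never below 1."""
    available = cpu_count if cpu_count else 1  # os.cpu_count() may return None
    return max(1, available - reserve)


def run_tasks(
    tasks: Sequence[Callable[[], object]],
    cpu_count: Optional[int] = None,
) -> List[object]:
    """Run tasks on a pool sized to keep one core free for other work."""
    detected = cpu_count if cpu_count is not None else os.cpu_count()
    workers = reserved_worker_count(detected)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the submitted tasks.
        return list(pool.map(lambda task: task(), tasks))


if __name__ == "__main__":
    print(reserved_worker_count(8))
    print(run_tasks([lambda: "lidar A", lambda: "lidar B"], cpu_count=4))
```

Clamping to at least one worker matters on single-core or unknown-core machines, where subtracting one would otherwise leave no workers at all.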

Tomorrow I will try forcing the build to run on the Nvidia GPU rather than the Intel one, although watch nvidia-smi on the 3090 Ubuntu machine shows that the build is already using resources on the 3090. Is this also related: #1441?

EricBoiseLGSVL commented 2 years ago

Yes, that PR fix is good. And yes, you need to make sure the simulator is using the Nvidia GPU, not the Intel one. Hopefully this fixes the issue for you.