motional / nuplan-devkit

The devkit of the nuPlan dataset.
https://www.nuplan.org

Can we get the max iteration.index of a scene to close our thread in Planner? #300

Closed: piqiuni closed this issue 1 year ago

piqiuni commented 1 year ago

We run our rule-based planner through ROS, so the planner has to open a rosnode, which leads to an error at the end of a scene. The error info is shown below: our Planner, which holds a thread, cannot be pickled.

pickle_object = pickle.dumps(self, protocol=pickle.HIGHEST_PROTOCOL)
TypeError: cannot pickle '_thread.RLock' object

In some scenes, we close the rosnode at 'iteration.index == 148' to avoid the error, and the simulation runs successfully. But in others, the final iteration.index is not 148 but 198, so we cannot close the rosnode in time and cannot avoid the error.

So we would like to get the maximum iteration of the scene to solve this problem. Or could you suggest a better solution?

Thanks!

patk-motional commented 1 year ago

Hi @piqiuni,

The scenario length changes because it defaults to 20 seconds if the scenario type is unknown. You could manually change the default value or specify scenario types when you simulate.

For the competition, the scenario length should always be 15 s at 10 Hz. However, we cannot give you the max iteration, as this would require us to change a core interface.
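
Since the maximum iteration is not exposed, one alternative that does not depend on the scenario length is to keep the ROS handle out of the pickled state altogether. The following is only a minimal sketch and is not confirmed by the thread or the devkit: it assumes the failing `pickle.dumps(self, ...)` call reaches the planner object, and `PlannerWithRosHandle` with its `name` and `sim` attributes is a hypothetical stand-in. It relies only on Python's standard `__getstate__` hook.

```python
import pickle
import threading


class PlannerWithRosHandle:
    """Hypothetical stand-in for a planner that owns an unpicklable handle."""

    def __init__(self) -> None:
        self.name = "demo_planner"
        # Stand-in for a ROS node or client. An RLock cannot be pickled,
        # just like the '_thread.RLock' object named in the traceback.
        self.sim = threading.RLock()

    def __getstate__(self) -> dict:
        # Copy the instance dict and blank out the live handle so that
        # pickle only sees plain data.
        state = self.__dict__.copy()
        state["sim"] = None
        return state


planner = PlannerWithRosHandle()
# Without __getstate__ this call would raise TypeError.
payload = pickle.dumps(planner, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
```

The live planner keeps its handle (`planner.sim` is untouched), while the pickled copy carries `sim = None`. Whether this is sufficient in nuPlan depends on what else the simulation log references; it has not been tested against the devkit.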

CristianGariboldi commented 8 months ago

Hello, I have a problem similar to @piqiuni's and would like to know if you have found a solution. I built a bridge between nuPlan and a ROS2 environment to be able to use my planner in the simulator. The problem is that when I run the ROS bridge, the simulation outputs this error:

File "/home/dept/nuplan-devkit/nuplan/planning/simulation/simulation_log.py", line 38, in _dump_to_msgpack
(wrapped_fn pid=2576829)     pickle_object = pickle.dumps(self, protocol=pickle.HIGHEST_PROTOCOL)
(wrapped_fn pid=2576829) TypeError: cannot pickle 'select.epoll' object
(wrapped_fn pid=2576829) WARNING:nuplan.planning.simulation.runner.executor:Simulation failed with error:
(wrapped_fn pid=2576829)  cannot pickle 'select.epoll' object

I get the histograms with the metrics, but I cannot visualize the scenarios on nuboard. I was wondering if you have found a solution to the problem. Thank you!

piqiuni commented 8 months ago

@CristianGariboldi
Since it's been a long time, I don't clearly remember the process. In my code, I init the rosnode and handle the ROS messages in a class SIM, and store an instance of it as self.sim in "class HAStarIDMPlanner(AbstractIDMPlanner):". When the iterations are over, I set self.sim = None to delete the ROS-related functions.

if iteration.index == 148:
    print(f"over, iteration.index={iteration.index}")
    self.close_node()

def close_node(self):
    self.sim = None

With the above, we can run the whole test and visualize the scenarios on nuboard. Hope it helps you~

CristianGariboldi commented 7 months ago

@piqiuni thank you very much for the information and sorry for my late reply, busy times.

So, to the best of my understanding, what you do is read the index of the iteration and, before the scenario ends, delete the ROS-related functions. But may I ask how you do that?

Also, how do you start your node again for the next scenario? Do you read the index of the iteration again and, if it is 0, start the ROS-related functions again? If yes, how?

Also, does this process work for both training and simulation?

thanks a lot in advance!

piqiuni commented 7 months ago

@CristianGariboldi Unfortunately, a few days ago my SSD broke, and the nuPlan code was among the few files I had not backed up. Yes, we read the index of the iteration and, before the scenario ends, delete the ROS-related functions, using the deletion code shown above. The ROS init-related code starts with our Planner.init(). I remember that in a scenario, the Planner will init() and del() automatically. It works for simulation and should work for training; I don't remember whether we trained with it.

CristianGariboldi commented 7 months ago

@piqiuni thanks again for your feedback, much appreciated!

It's weird, I reproduced the same process you developed but I still get the error:

File "/home/dept/nuplan-devkit/nuplan/planning/simulation/simulation_log.py", line 38, in _dump_to_msgpack
(wrapped_fn pid=2576829)     pickle_object = pickle.dumps(self, protocol=pickle.HIGHEST_PROTOCOL)
(wrapped_fn pid=2576829) TypeError: cannot pickle 'select.epoll' object
(wrapped_fn pid=2576829) WARNING:nuplan.planning.simulation.runner.executor:Simulation failed with error:
(wrapped_fn pid=2576829)  cannot pickle 'select.epoll' object

I built the bridge using roslibpy on ros2, and when the index of the iteration is 145, I close the communication, but still cannot avoid the error. I also tried to terminate the communication instead of just closing it, but still no improvements.

May I ask how you built the bridge? Did you also use roslibpy?

piqiuni commented 7 months ago

@CristianGariboldi We are using ROS1 on Ubuntu 20, so the bridge should be rospy; we just use it to init the ROS node and to publish and subscribe to messages. Judging from your error info, I don't know whether select.epoll is part of roslibpy. Maybe you didn't clean up all the ROS-related objects? You can try my method above to delete all of them.

let it be self.sim in "class HAStarIDMPlanner(AbstractIDMPlanner):", when iteration over, I set self.sim = None to delete the ros related functions

And print the iteration index and a deletion message to make sure it works.
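
One way to check whether every ROS-related object has really been cleared is to test each attribute of the planner against pickle directly. The helper below is a minimal sketch using only the standard library; `find_unpicklable_attributes` and `fake_planner` are hypothetical names introduced for illustration, and the `RLock` merely stands in for a leftover ROS handle.

```python
import pickle
import threading
from types import SimpleNamespace
from typing import Any, List


def find_unpicklable_attributes(obj: Any) -> List[str]:
    """Return the names of instance attributes that pickle rejects."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        except Exception:
            # Any failure here would also break pickling of the owner.
            bad.append(name)
    return bad


# Hypothetical stand-in for a planner: one plain attribute, one live handle.
fake_planner = SimpleNamespace(horizon=8.0, sim=threading.RLock())
leftovers = find_unpicklable_attributes(fake_planner)
print(leftovers)  # ['sim']
```

Calling such a helper on the planner right after the cleanup step would list any attribute that still holds a thread, socket, or similar handle. It only inspects top-level attributes, so a handle nested inside another object shows up under the name of the attribute that contains it.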

CristianGariboldi commented 7 months ago

@piqiuni I think we are doing the same thing here:

if (self.client_traj is None or not self.client_traj.is_connected) and 2 <= index_iter3 < 141:
    self.client_traj = Ros(host='localhost', port=9090)
    self.client_traj.run()

elif self.client_traj is not None and self.client_traj.is_connected and index_iter3 >= 141:
    self.client_traj.terminate()
    self.client_traj = None

In this code, when the iteration index is within the threshold range, I run the client, start the roslibpy websocket connection, and begin subscribing to and publishing topics. When the iteration index reaches 141, I terminate the connection and set the client to None.

But I still get the same error. Am I missing something here? I also tried calling the roslibpy functions in another class and passing the arguments to AbstractIDMPlanner to avoid interference with the planner class, but nothing changed.

@patk-motional do you maybe have an idea as well? Thanks!

piqiuni commented 7 months ago

@CristianGariboldi Maybe not caused by ros? Check if there are errors running without ros.

CristianGariboldi commented 7 months ago

@piqiuni no worries, I have solved the problem. I also had to set the RosBridge class to None when the simulation ends; this way, there is no conflict.

Thanks a lot for your huge help, I really appreciate it!

CristianGariboldi commented 7 months ago

@piqiuni I'm sorry, I'd like just to ask you one more thing:

When running the simulation, do you set the workers to sequential or ray_distributed? I set it to sequential in order not to have parallelized scenarios.

Also, when a scenario ends, you set the ROS-related functions to None, but how do you start the connection again when the next scenario starts? I'm having trouble automatically reconnecting my rosbridge when a scenario ends and the next one starts.

thanks again for your availability

piqiuni commented 7 months ago

@CristianGariboldi We use sequential, and it can also run in parallel with different node names and topic namespaces. I remember the platform will rebuild an object (ego = Planner()) and run Planner.init(), so you can init ROS in it. Or you can add an if in step() to init ROS in the first frame.
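
The "add an if in step()" idea can be sketched as follows. This is an illustration under stated assumptions, not the nuPlan planner interface: `LazyRosPlanner`, its `step(index)` signature, the `connect` factory, and `release_index` are all hypothetical, and a plain `object()` stands in for the ROS connection.

```python
from typing import Callable, Optional


class LazyRosPlanner:
    """Hypothetical planner skeleton; not the nuPlan AbstractPlanner API."""

    def __init__(self, connect: Callable[[], object], release_index: int) -> None:
        self._connect = connect              # factory that opens the connection
        self._release_index = release_index  # index at which to drop the handle
        self.sim: Optional[object] = None
        self.connect_count = 0

    def step(self, index: int) -> None:
        # Open the connection lazily on the first frame of a scenario.
        if self.sim is None and index < self._release_index:
            self.sim = self._connect()
            self.connect_count += 1
        # ... publish and subscribe through self.sim here ...
        # Drop the handle before the scenario ends and the log is pickled.
        if index >= self._release_index:
            self.sim = None


# Reuse one planner object across two scenarios of 150 iterations each.
planner = LazyRosPlanner(connect=object, release_index=148)
for _scenario in range(2):
    for index in range(150):
        planner.step(index)
```

With this pattern the connection is reopened automatically at the start of the second scenario even if the same planner object is reused, because the handle was set to None at the end of the first one. A real implementation would also need to close the connection cleanly before dropping the reference.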