Closed akeaveny closed 3 years ago
Hi!
Are you running robo-gym and the robot-server on the same pc or on 2 separate machines?
Could it be related to the fact that after 4 hours the PC goes into sleep mode or something like that?
Same pc!
I don't believe my PC goes to sleep, as I've run other processes overnight. I also disabled sleep mode earlier this week using
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
Aidan
Do you have access to the output log from the Server Manager to see if some more info is given in there?
Hmm, I checked /home/akeaveny/robogym_ws/logs/uwrt_robot_server
but couldn't find anything useful.
Was this what you had in mind?
Mmm no. We introduced a logger some time ago, and you should have the logs under '/home/akeaveny/robogym_ws/src/robo-gym-robot-servers/logs/', at least for the UR robot server. It is initialised here: https://github.com/jr-robotics/robo-gym-robot-servers/blob/35802004460600f2ea2d3f7d1b5205969c7f65a9/ur_robot_server/scripts/robot_server.py#L62 . I don't know if you have the same for your robot server, but adding it will definitely help you see more information.
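For anyone adding logging to a custom robot server, here is a minimal sketch of a file logger built only on Python's standard `logging` module. It is not the code from the linked `robot_server.py`; the function name `make_file_logger`, the log directory, and the size limits are assumptions chosen for illustration.

```python
import logging
import logging.handlers
import os


def make_file_logger(name, log_dir, max_bytes=1_000_000, backups=3):
    """Return a logger that writes timestamped records to <log_dir>/<name>.log.

    Hypothetical helper: the name, directory layout and limits are assumptions.
    """
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Attach the handler only once, so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.handlers.RotatingFileHandler(
            os.path.join(log_dir, name + ".log"),
            maxBytes=max_bytes,      # rotate before the file grows too large
            backupCount=backups,     # keep a few old files for overnight runs
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `make_file_logger("uwrt_robot_server", "logs").info("reset requested")` at the key points of the server (connection, reset, step) gives timestamps that can be lined up with the moment the training script fails.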
Now, going back to our issue: we regularly train for long periods (up to 48 h) and have never had that specific issue.
Once, we had an issue with the PC's network card going to 'sleep', and we solved it by leaving a terminal open with a ping to google.com running.
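The ping workaround can also be scripted so that failures are printed with a timestamp. This is only a sketch under assumptions: it relies on a Linux-style `ping -c 1` command being available, and the function names `ping_once` and `keep_alive` are made up for this example.

```python
import subprocess
import time


def ping_once(host, timeout_s=5, run=subprocess.run):
    """Send one echo request; return True if the ping command exits with 0."""
    try:
        result = run(
            ["ping", "-c", "1", host],   # assumes a Linux/macOS style ping
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def keep_alive(host="google.com", interval_s=30, max_pings=None,
               ping=ping_once, sleep=time.sleep):
    """Ping host every interval_s seconds and return the number of failures.

    With max_pings=None the loop runs until interrupted.
    """
    failures = 0
    sent = 0
    while max_pings is None or sent < max_pings:
        if not ping(host):
            failures += 1
            print(time.strftime("%H:%M:%S"), "ping to", host, "failed")
        sent += 1
        sleep(interval_s)
    return failures
```

Running `keep_alive()` in a separate terminal during training keeps traffic flowing and leaves a record of when the network dropped, if it did.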
The weird thing here
File "/home/akeaveny/git/robo-gym/robo_gym/envs/UWRTArm/UWRTArm.py", line 362, in reset
    rs_state = copy.deepcopy(np.nan_to_num(np.array(self.client.get_state_msg().state)))
AttributeError: 'UWRTArmSim' object has no attribute 'client'
is that it cannot find the client attribute, which is just an attribute of the object that was there all along. Most of the time, if something goes wrong with the connection, we see gRPC errors instead.
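One way to get more information than a bare AttributeError is to check for the attribute at the top of reset and report what the instance actually holds. The sketch below uses hypothetical stand-in classes (`FakeClient`, `SimWrapper`), not the real robo-gym or UWRTArm code, and it does not claim to explain why the attribute went missing.

```python
class FakeClient:
    """Hypothetical stand-in for the robot server client."""

    def get_state_msg(self):
        return [0.0, 1.0]


class SimWrapper:
    """Hypothetical stand-in for a simulation wrapper such as UWRTArmSim."""

    def __init__(self, connect=True):
        # If this branch is skipped (or the attribute is deleted later),
        # the instance simply has no 'client' attribute.
        if connect:
            self.client = FakeClient()

    def reset(self):
        client = getattr(self, "client", None)
        if client is None:
            # List the attributes that do exist to help locate the cause.
            raise RuntimeError(
                "no 'client' on %s; attributes present: %s"
                % (type(self).__name__, sorted(vars(self)))
            )
        return client.get_state_msg()
```

With a check like this, the overnight failure would report the state of the object at the moment of the error, which can then be compared with the robot server log.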
Have you had the same error in multiple overnight training runs?
Thanks for this, I added this block to our robot_server.py :)
Yeah, I've had the same errors for two consecutive nights... Similar to your envs, I init UWRTArmSim, which wraps our UWRTArmEnv here. Then I connect to the robot server here.
What's strange is that it gives this error ~4 hrs into training each time, so my first guess was that my PC was sleeping at this point.
Yes, it is very strange indeed. Have you ever trained overnight with the same algorithm on other environments, for instance from the OpenAI Gym? This could help to understand if the error is related to robo-gym or if it is something related to the pc settings.
I'm going to close this as I don't think it's related to robo-gym. I verified that it isn't a connection issue as I ran it during the day, yesterday. It's strange because we ran the same script with OpenAI env & PyBullet here.
Cheers!
Ok, I am sorry to hear that, I hope you can manage to fix the issue soon!
Hi @matteolucchi,
I need your help again!
My desktop has limited resources so I train overnight. My latest issue is that the robot server cannot communicate with the client after ~4 hours.
Here's the error message:
Cheers, Aidan