Closed osrf-migration closed 4 years ago
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
I re-tried with Urban Qualification world with the same result :disappointed:
(robotika, ver26urbanQb, 049839f8-8e47-44c6-8511-689cbae1ffad, Terminated, Error: InitializationFailed)
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
p.s. what changed on our side is that now we are subscribing to /clock
Original comment by Hector Escobar (Bitbucket: hector_escobar).
Make sure you have any visualization off. Like no plots from Rviz. We saw that happening in the past when an image had plots enabled.
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
thanks for hint - but I am not using ROS except subscribing to topics and newly /clock
can this be related to cloudsim update? I will check old ver25 later on, thanks
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
it looks like it is somehow related to new solution docker image (ver26) because experiment with old (ver25) just passed on “Simple Urban 02” … hmmm
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Can we have access to AWS log files? This feedback is very slow :disappointed: … as I mentioned what changed is usage of /clock
and in other topic (#261) there are discussions that maybe clock was the problem during finals of Tunnel Circuit. The modified code is here:
https://github.com/robotika/osgar/commit/99ad3e16204075f58d1355d0bf0dc036e9ca8f22
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Note, that I can download the ROS stdout logs (both CloudSim as well as both robots) and I see that for example clock was subscribed:
[rosmaster.master][INFO] 2019-11-09 08:21:52,976: publisherUpdate[/clock] -> http://10.40.160.2:40507/ ['http://10.40.160.2:46267/']
[rosmaster.master][INFO] 2019-11-09 08:21:52,977: publisherUpdate[/clock] -> http://10.40.160.2:45579/ ['http://10.40.160.2:46267/']
[rosmaster.master][INFO] 2019-11-09 08:21:52,977: publisherUpdate[/clock] -> http://10.40.160.2:40507/ ['http://10.40.160.2:46267/']: sec=0.00, result=[1, '', 0]
[rosmaster.master][INFO] 2019-11-09 08:21:52,978: publisherUpdate[/clock] -> http://10.40.160.2:46267/ ['http://10.40.160.2:46267/']
[rosmaster.master][INFO] 2019-11-09 08:21:53,000: publisherUpdate[/clock] -> http://10.40.160.2:46267/ ['http://10.40.160.2:46267/']: sec=0.02, result=[1, '', 0]
[rosmaster.master][INFO] 2019-11-09 08:21:53,006: publisherUpdate[/clock] -> http://10.40.160.2:45579/ ['http://10.40.160.2:46267/']: sec=0.03, result=[1, '', 0]
but the ROS stdout is almost empty (example robot A):
0.a527ae8c41762a101acbf2474382f82acd977df2 Node Startup
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /rosbag_record_console [/tmp/binarydeb/ros-melodic-rosbag-1.14.3/src/recorder.cpp:210(shared_ptr<ros::Subscriber> rosbag::Recorder::subscribe)] [topics: /rosout] Subscribing to record
54.960000000 INFO /rosbag_record_console [/tmp/binarydeb/ros-melodic-rosbag-1.14.3/src/recorder.cpp:394(Recorder::startWriting)] [topics: /rosout] Recording to /home/developer/.ros/rosout_agg.bag.
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to urban_qual
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Does your solution work using docker-compose?
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Locally I use 3 docker solution - suggested by Alfredo when I moved to Docker Ignition, i.e. CloudSim, Bridge and Solution. What is the difference when docker-compose is used?
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Martin,
Just for clarification. Running CloudSim + Bridge + Solution, it is just one of the suggested scenarios for development.
In preparation for a Cloudsim submission, we encourage teams to test/run their solution using Docker-Compose as the final step before they submit their solution(s) into the Cloudsim portal. It provides you with the closest `local` scenario to the real Cloudsim environment and it allow you to visually spot any last minute issues since it generate scores and logs locally.
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
OK, I will give it a try. For development I work inside the Solution docker (typically running git pull
, upload OSGAR logs, etc.) and in other two consoles I have CloudSim and Bridge which I periodically restart. Only before the final docker image completion and submission I run it also “externally”.
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
OK, I test it and it worked fine, as with 3 dockers …
...
solution1_1 | [ INFO] [1573748402.389528967, 44.892000000]: Python3: stdout 976000 b'\x19\xcd\x18\x18\xcd\x18/\xcd\x1f7\xcd\x1f7\xcd\x19\xd0\x
cd\x19\xc2\xcd\x19\x1f\xcd\x18\xb8\xcd\x18\xab\xcd\x18\xd3\xcd\x18\xd1\xcd\x18\xe7\xcd\x19\x14\xcd\x19\x16\xcd\x19\x1f\xcd\x199\xcd\x19S\xcd\x1
9l\xcd\x19\x88\xcd\x19\x8f\xcd\x19\xbb\xcd\x19\xb9\xcd\x19\xe9\xcd\x19\xfe\xcd\x1a\t\xcd\x1a&\xcd\x1aE\xcd\x1a\\\xcd\x1a{\xcd\x1a\xa9\xcd\x1a\x
ad\xcd\x1a\xda\xcd\x1b\x04'
solution1_1 | [ INFO] [1573748402.389555794, 44.892000000]: Python3: stdout 976100 b'\xcd\x1b\x06\xcd\x1b(\xcd\x1bG\xcd\x1bt\xcd\x1b\x85\xcd\x
1b\xa8\xcd\x1b\xd1\xcd\x1b\xfb\xcd\x1c\x08\xcd\x1c5\xcd\x1cc\xcd\x1ct\xcd\x1c\xa2\xcd\x1c\xd8\xcd\x1c\xed\xcd\x1d\x1f\xcd\x1dD\xcd\x1di\xcd\x1d
\xa0\xcd\x1d\xb2\xcd\x1d\xeb\xcd\x1e\x1e\xcd\x1e.\xcd\x1e\x7f\xcd\x1e\xb8\xcd\x1e\xd9\xcd\x1f\x0c\xcd\x1f;\xcd\x1fn\xcd\x1f\x9c\xcd\x1f\xe3\xca
X\xb7\x03\r\x00\x01'
^CGracefully stopping... (press Ctrl+C again to force)
Stopping docker_solution1_1 ... done
Stopping docker_bridge1_1 ... done
Stopping docker_sim_1 ... done
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
(there is still dump of OSGAR log file into stdout, so that was reason why I killed it by Ctrl+C)
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
I added simple “unittest” for subscription of topis including /clock
:
https://github.com/robotika/osgar/tree/feature/virtual-ignition/subt/docker/unittest
It is based on subt-seed example/docker image. It only counts incoming messages and every 100 it prints the count (except clock which is every 1000). Now it is running with single robot (the command is (0, 0), i.e. not moving.
We will see.
p.s. “robotika, unittest1, 9067dcb9-ec62-42c5-b271-cb035c58d6b7”
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Terminated with no errors … hmm, looks good … hmm
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
0.a527ae8c41762a101acbf2474382f82acd977df2 Node Startup
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to urban_qual
0.a527ae8c41762a101acbf2474382f82acd977df2 WARN /rosbag_robot_data [/tmp/binarydeb/ros-melodic-rosbag-1.14.3/src/recorder.cpp:165(Recorder::run)] [topics: /rosout] /use_sim_time set to true and no clock published. Still waiting for valid time...
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:167(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
so it was “running” but did not receive any data??
was_started: 0
sim_time_duration_sec: 0
real_time_duration_sec: 0
model_count: 1
hmm
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
I run docker compose with unittest above and
bridge1_1 | [ WARN] [1573848592.055461422]: /use_sim_time set to true and no clock published. Still waiting for valid time...
is this message OK? Is it expected?
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
but then it works locally (again I forgot about that extra 30s waiting in the original run:
solution1_1 |
solution1_1 | [ INFO] [1573848755.458464420]: Waiting for /clock, /subt/start,
and /subt/pose_from_artifact
solution1_1 | [ INFO] [1573848755.751396921, 5.708000000]: Using robot name[X2L]
solution1_1 |
solution1_1 | [ INFO] [1573848755.855716860, 5.708000000]: Sent start signal.
solution1_1 | [ INFO] [1573848756.116897510, 5.776000000]: received Clock 0
solution1_1 | [ INFO] [1573848756.305169740, 5.856000000]: received Odom 0
solution1_1 | [ INFO] [1573848756.316850391, 5.860000000]: received Imu 0
solution1_1 | [ INFO] [1573848756.333251669, 5.864000000]: received Image 0
solution1_1 | [ INFO] [1573848756.348732267, 5.872000000]: received Scan 0
solution1_1 | [ INFO] [1573848761.730888492, 7.856000000]: received Odom 100
solution1_1 | [ INFO] [1573848761.742405871, 7.860000000]: received Imu 100
solution1_1 | [ INFO] [1573848766.898960025, 9.768000000]: received Clock 1000
solution1_1 | [ INFO] [1573848767.147004835, 9.856000000]: received Odom 200
solution1_1 | [ INFO] [1573848767.153366674, 9.860000000]: received Imu 200
solution1_1 | [ INFO] [1573848769.453284897, 10.680000000]: received Image 100
solution1_1 | [ INFO] [1573848769.497763395, 10.696000000]: received Scan 100
solution1_1 | [ INFO] [1573848772.675202569, 11.856000000]: received Odom 300
solution1_1 | [ INFO] [1573848772.683042708, 11.860000000]: received Imu 300
solution1_1 | [ INFO] [1573848777.947787878, 13.768000000]: received Clock 2000
(locally)
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Alfredo Bencomo (bencomo) is it OK, that subt_solution_latest
on https://hub.docker.com/r/osrf/subt-virtual-testbed/tags is 2 months old? Our image is based on this example solution, so I am just checking. Also can you please confirm that for you the unittest works locally fine?
p.s. I am going to comment out subscription of /clock
topic and check it on AWS as unittest2
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Martin,
I just tested the subt_solution_latest
locally and still works fine with the latest Docker images for the Urban Circuit, as shown below.
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Alfredo,
can you please test also the “unittest solution docker” mentioned above? Ideally locally and on AWS. I commented out subscription of /clock
(https://github.com/robotika/osgar/commit/8e37ea0ea4dbd30488c5f192ae45de475b2a99c2) but on AWS is the result still the same - no stdout:
0.a527ae8c41762a101acbf2474382f82acd977df2 Node Startup
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to urban_qual
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:167(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
and no simulation running:
was_started: 0
sim_time_duration_sec: 0
real_time_duration_sec: 0
model_count: 1
The docker images are: 200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:unittest1
and 200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:unittest2
thanks
Martin
p.s. I tested it on Urban Qualification world with robot X2 config #4, 9067dcb9-ec62-42c5-b271-cb035c58d6b7
and 596715e8-e8bb-4881-8884-5dbe7c6c3294
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Martin,
We added a sidecar container to the cloudsim bridge computer on Nov 13th. This sidecar container served to capture log files in the event of an unexpected termination of Gazebo or a team solution. There is an unexplained problem with this sidecar that caused some, but not all, topics to fail. You encountered this problem with your unit test. We have since taken an alternate approach, which was deployed on November 19th. Can you please try your unit test again.
This was brief error, due to a new feature that was deployed after the Tunnel Circuit. Thanks for making your unit test. We plan to keep using it along with our other tests.
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
OK, thanks for info. unittest1b
started as cbb0594b-2a7d-4904-b945-0d467a6f4820
now …
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
p.s. much better :wink:
208.632000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:210(Update)] [topics: ] Sent start signal.
208.660000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:50(clockCallback)] [topics: ] received Clock 0
208.880000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:58(imuCallback)] [topics: ] received Imu 0
208.876000000 INFO /X0F120R/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type
208.884000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:82(odomCallback)] [topics: ] received Odom 0
208.888000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:66(scanCallback)] [topics: ] received Scan 0
208.908000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:74(imageCallback)] [topics: ] received Image 0
...
3858.880000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:58(imuCallback)] [topics: ] received Imu 182500
3858.884000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:82(odomCallback)] [topics: ] received Odom 182500
3858.892000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:66(scanCallback)] [topics: ] received Scan 73000
3858.904000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:74(imageCallback)] [topics: ] received Image 73000
3860.652000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:50(clockCallback)] [topics: ] received Clock 913000
3860.880000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:58(imuCallback)] [topics: ] received Imu 182600
3860.888000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:82(odomCallback)] [topics: ] received Odom 182600
thanks m.
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Nate Koenig (Nathan Koenig) - I also re-run ver26
(ver26urbanQe, aefa9509-28d4-4b15-bfb0-7f454daff6f3
), but there is still the old problem Terminated Error: InitializationFailed
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
If I remember correctly, your version 25 worked. What had changed in your code between version 25 and version 26?
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Nate Koenig (Nathan Koenig) there was only a small change related to subscription of /clock
… but I release newer one ver27
which worked fine … until I found that there was error during build and it was actually the same as ver26
:
200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika ver26 6add9f8c520f 2 weeks ago 3.59GB
200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika ver27 6add9f8c520f 2 weeks ago 3.59GB
so my current guess is that it is related to wrong sleep
in subt seed example:
https://github.com/robotika/osgar/pull/260#discussion_r349928330
or directly:
#!/osrf/subt_seed/src/default/docker/subt_solution/run_solution.bash
All new releases (now we are testing ver28
) already use --wait
instead of wrong sleep
. We will see if it helps.
As a side note - the dockers are still downloading external models, so I see errors like:
0.a527ae8c41762a101acbf2474382f82acd977df2 ERROR /X0F300R/ros_ign_bridge_gas [/tmp/ros-ign-bridge-release/src/parameter_bridge.cpp:180(main)] [topics: /rosout] Failed to create a bridge for topic [/model/X0F300R/gas_detected] with ROS type [std_msgs/Bool] and Ignition Transport type [ignition.msgs.Boolean]
and the orientation is not updated on AWS … but I expect that you plan a new release soon, maybe even today?
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Sorry for the delay. I've seen more of your submissions. Are you still experiencing these problems?
Original comment by Martin Dlouhy (Bitbucket: robotikacz).
Nate Koenig (Nathan Koenig) - it is fine now (with --wait
) … but what I wanted to point out is that the example with sleep
is wrong/misleading.
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Problem has been addressed.
Original report (archived issue) by Martin Dlouhy (Bitbucket: robotikacz).
Hello,
I tried to run new “ver26urbanS1” on “Simple Urban 1” world (5x X2 robots, robotika, 5d85f584-2f9e-444e-afda-30361336a26f) and it ended with “Terminated Error: InitializationFailed”.
What is recommended procedure in such case?
I tried locally single X2 robot and it works fine.
Would it be possible to add “Re-try” button? Something like in Jenkins “Rebuild” where you can edit parameters would be even better.:slight_smile: