osrf / cloudsim-legacy

Apache License 2.0

ROS Network is not working properly from the field computer to the local computer #49

Closed osrf-migration closed 11 years ago

osrf-migration commented 11 years ago

Original report (archived issue) by michele hallak (Bitbucket: mhallak).


Hi, we have the following configuration: a node named "C11_Agent" running on the field computer and a node named "c11" running on a local computer. The two nodes communicate over ROS. Communication from the local computer to the field computer works smoothly, but communication from the field computer to the local computer fails. For example, on the field computer:

ubuntu@ip-10-0-0-52:~/integrator/robil/C0_Scripts$ rosnode info /c11
--------------------------------------------------------------------------------
Node [/c11]
Publications:
 * /rosout [rosgraph_msgs/Log]

Subscriptions:
 * /clock [rosgraph_msgs/Clock]

Services:
 * /C11/push_img
 * /C11
 * /c11/get_loggers
 * /c11/set_logger_level
 * /C11/push_occupancy_grid

contacting node http://localhost:49821/ ...
ERROR: Communication with node[http://localhost:49821/] failed!

It certainly should not be contacting a local node; it should contact the node at IP 11.8.0.2.

I have no problem accessing the ROS nodes running on the field computer from the local computer. We are running the trio constellation with all our nodes on the field computer apart from this specific node, which needs to be on the local computer.
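One quick way to spot this symptom is to scan `rosnode info` output for a loopback URI. The following is a minimal sketch, not part of ROS: the helper name `find_loopback_uri` is hypothetical, and it only assumes the `contacting node http://host:port/ ...` line that `rosnode info` prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `rosnode info` output on stdin and prints the
# node URI if it points at a loopback host (localhost or 127.x.x.x).
# Returns 0 when a loopback URI was found, 1 otherwise.
find_loopback_uri() {
    # Pull the URI out of the first "contacting node <uri> ..." line.
    uri=$(sed -n 's|^contacting node \(http://[^ ]*\) .*|\1|p' | head -n 1)
    case "$uri" in
        http://localhost:*|http://127.*)
            printf '%s\n' "$uri"
            return 0
            ;;
    esac
    return 1
}
```

It could be used as `rosnode info /c11 | find_loopback_uri`, run from the machine that cannot reach the node.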

osrf-migration commented 11 years ago

Original comment by Hugo Boyer (Bitbucket: hugomatic, GitHub: hugomatic).


The ROS_IP of the local node is 11.8.0.2. I can do everything from the local side, and rosnode list and topics display the whole list. However, when a node on the field computer has to send a message to a node on the local computer, it doesn't work.

#!python

****************************************************************************************************************
Local Node ROS environment:
mhallak@darpaws4:~/trio_tutorial/router_cxe378f80a$ env | grep ROS
ROS_ROOT=/opt/ros/fuerte/share/ros
ROSLISP_PACKAGE_DIRECTORY=/opt/ros/fuerte/share/common-lisp/ros
ROS_PACKAGE_PATH=/usr/share/osrf-common-1.0/ros:/usr/share/drcsim-2.0/ros:/usr/share/sandia-hand-5.1/ros:/usr/share/drcsim-2.0/ros:/usr/share/drcsim-2.0/gazebo_models/irobot_hand_description/irobot_hand:/usr/share/drcsim-2.0/gazebo_models/irobot_hand_description/irobot_hand_on_box:/usr/share/drcsim-2.0/gazebo_models/multisense_sl_description/multisense_sl_description:/usr/share/drcsim-2.0/gazebo_models/multisense_sl_description/multisense_sl_on_box:/usr/share/drcsim-2.0/gazebo_models/atlas_description/atlas:/usr/share/drcsim-2.0/gazebo_models/atlas_description/atlas_sandia_hands:/usr/share/drcsim-2.0/gazebo_models/atlas_description/atlas_irobot_hands:/usr/share/drcsim-2.0/gazebo_models/environments/drc_vehicle:/usr/share/drcsim-2.0/gazebo_models/environments/standpipe:/usr/share/drcsim-2.0/gazebo_models/environments/powerplant:/usr/share/drcsim-2.0/gazebo_models/environments/drc_terrain:/usr/share/drcsim-2.0/gazebo_models/environments/golf_cart:/usr/share/drcsim-2.0/gazebo_models/environments/fire_hose:/opt/ros/fuerte/share:/opt/ros/fuerte/stacks:/userhome/mhallak/:/opt/ros/fuerte/share:/opt/ros/fuerte/stacks
ROS_MASTER_URI=http://10.0.0.51:11311
ROS_HOSTNAME=localhost
ROS_DISTRO=fuerte
ROS_IP=11.8.0.2
ROS_ETC_DIR=/opt/ros/fuerte/etc/ros

Local Node GAZEBO environment:
mhallak@darpaws4:~/trio_tutorial/router_cxe378f80a$ env | grep GAZ
GAZEBO_MODEL_PATH=/usr/share/sandia-hand-5.1/ros/sandia_hand_description/gazebo:/usr/share/drcsim-2.0/gazebo_models/irobot_hand_description:/usr/share/drcsim-2.0/gazebo_models/multisense_sl_description:/usr/share/drcsim-2.0/gazebo_models/atlas_description:/usr/share/drcsim-2.0/gazebo_models/environments:
GAZEBO_RESOURCE_PATH=/userhome/mhallak/michele/worlds:/usr/share/drcsim-2.0/worlds:/usr/share/gazebo-1.4:/usr/share/gazebo_models
GAZEBO_MASTER_URI=http://10.0.0.51:11345
GAZEBO_PLUGIN_PATH=/usr/lib/drcsim-2.0/plugins:/usr/lib/gazebo-1.4/plugins
GAZEBO_MODEL_DATABASE_URI=http://gazebosim.org/models
GAZEBO_IP=11.8.0.2

*****************************************************************************************************************************************
osrf-migration commented 11 years ago

Original comment by Hugo Boyer (Bitbucket: hugomatic, GitHub: hugomatic).


The environment did not yield any clue to your problem, and I am having trouble reproducing your problem here.

Can you try the following from the field computer:

rostopic pub /from_field std_msgs/String '{ data: bar }' -r 50

And on the local machine (OCU):

rostopic echo /from_field

Normally you should get data coming from the field computer to the OCU, like I do here.

Could you also give me the content of the following commands for the OCU and the field computer? It would not hurt to get them for the router as well, while we're at it.

ifconfig

route -n

This is the result I get from route -n on my OCU:


Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.254   0.0.0.0         UG    0      0        0 eth2
10.0.0.0        11.8.0.2        255.255.255.0   UG    0      0        0 tun0
11.8.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth2
192.168.1.0     0.0.0.0         255.255.255.0   U     1      0        0 eth2

And on my field computer:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    100    0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
11.8.0.2        10.0.0.50       255.255.255.255 UGH   0      0        0 eth0
osrf-migration commented 11 years ago

Original comment by michele hallak (Bitbucket: mhallak).


Hi Hugo: I ran the rostopic pub command on the field computer as you said. On the local machine, I got:

#!script

michele@darpaws5:~/trio/router_cxe378f80a$ rostopic echo /from_field
data: bar
---
data: bar
---
data: bar
---
data: bar
---
data: bar
---
data: bar
---

The field computer configuration:

#!script

ubuntu@ip-10-0-0-52:~$ ifconfig 
eth0      Link encap:Ethernet  HWaddr 12:9f:0d:ed:00:1c  
          inet addr:10.0.0.52  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::109f:dff:feed:1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:1037 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1088 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:91719 (91.7 KB)  TX bytes:108389 (108.3 KB)
          Interrupt:67 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ubuntu@ip-10-0-0-52:~$ 
ubuntu@ip-10-0-0-52:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    100    0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
11.8.0.2        10.0.0.50       255.255.255.255 UGH   0      0        0 eth0

On the OCU:

#!script

michele@darpaws5:~/trio/router_cxe378f80a$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0a:f7:0f:27:58  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth1      Link encap:Ethernet  HWaddr 00:0a:f7:0f:27:5a  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth2      Link encap:Ethernet  HWaddr 10:60:4b:6f:90:1f  
          inet addr:172.23.1.138  Bcast:172.23.1.159  Mask:255.255.255.224
          inet6 addr: fe80::1260:4bff:fe6f:901f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:564576 errors:0 dropped:0 overruns:0 frame:0
          TX packets:532433 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:385560540 (385.5 MB)  TX bytes:167961732 (167.9 MB)
          Interrupt:20 Memory:f6100000-f6120000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:69622506 errors:0 dropped:0 overruns:0 frame:0
          TX packets:69622506 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:380821744376 (380.8 GB)  TX bytes:380821744376 (380.8 GB)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:11.8.0.2  P-t-P:11.8.0.1  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:819 errors:0 dropped:0 overruns:0 frame:0
          TX packets:762 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:523718 (523.7 KB)  TX bytes:66827 (66.8 KB)

michele@darpaws5:~/trio/router_cxe378f80a$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.23.1.158    0.0.0.0         UG    0      0        0 eth2
10.0.0.0        11.8.0.2        255.255.255.0   UG    0      0        0 tun0
11.8.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth2
172.23.1.128    0.0.0.0         255.255.255.224 U     1      0        0 eth2
michele@darpaws5:~/trio/router_cxe378f80a$ 

On the router:

#!script

ubuntu@ip-10-0-0-50:~$ ifconfig
eth0      Link encap:Ethernet  HWaddr 12:9f:0d:cb:1e:d7  
          inet addr:10.0.0.50  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::109f:dff:fecb:1ed7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11223889 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14182503 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3939819115 (3.9 GB)  TX bytes:4179392315 (4.1 GB)
          Interrupt:25 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:941760 errors:0 dropped:0 overruns:0 frame:0
          TX packets:941760 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:105477120 (105.4 MB)  TX bytes:105477120 (105.4 MB)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:11.8.0.1  P-t-P:11.8.0.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:1487612 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2183565 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:80544562 (80.5 MB)  TX bytes:2367468022 (2.3 GB)

ubuntu@ip-10-0-0-50:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    100    0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
11.8.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 tun0

IMHO, the problem is not a networking problem but a ROS problem.

Again: On the field computer, regarding c11 that runs on the local computer:

#!script

ubuntu@ip-10-0-0-52:~/integrator/robil/C0_Scripts$ rosnode info c11
--------------------------------------------------------------------------------
Node [/c11]
Publications: 
 * /rosout [rosgraph_msgs/Log]

Subscriptions: 
 * /clock [rosgraph_msgs/Clock]

Services: 
 * /C11/push_img
 * /C11
 * /c11/get_loggers
 * /c11/set_logger_level
 * /C11/push_occupancy_grid

contacting node http://localhost:53563/ ...
ERROR: Communication with node[http://localhost:53563/] failed!

On the local computer:

#!script

michele@darpaws5:~/git/robil/C11_OperatorControl$ rosnode info c11
--------------------------------------------------------------------------------
Node [/c11]
Publications: 
 * /rosout [rosgraph_msgs/Log]

Subscriptions: 
 * /clock [rosgraph_msgs/Clock]

Services: 
 * /C11/push_img
 * /C11
 * /c11/get_loggers
 * /c11/set_logger_level
 * /C11/push_occupancy_grid

contacting node http://localhost:53563/ ...
Pid: 1108
Connections:
 * topic: /clock
    * to: /gazebo (http://10.0.0.51:38299/)
    * direction: inbound
    * transport: TCPROS
osrf-migration commented 11 years ago

Original comment by John Hsu (Bitbucket: hsu, GitHub: hsu).


osrf-migration commented 11 years ago

Original comment by Hugo Boyer (Bitbucket: hugomatic, GitHub: hugomatic).


We're trying to reproduce the problem here but we don't see the behavior you describe. The only time we observe the error you describe is when the ROS_IP is unset at the moment we start the node (and setting it later does not fix anything).

Can you confirm that the problems persist if you restart your node? Also, can you reproduce the bug using only rostopic pub or is it related to your node?
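Since setting ROS_IP after the node has started does not help, one way to rule that case out is to refuse to launch when the variable is missing. This is only a sketch under that assumption; `with_ros_ip` is a hypothetical wrapper name, not a ROS tool.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command only if ROS_IP is already
# set in the environment, so the node never starts without it.
with_ros_ip() {
    if [ -z "${ROS_IP:-}" ]; then
        echo "ROS_IP is not set; refusing to start: $*" >&2
        return 1
    fi
    "$@"
}
```

For example: `with_ros_ip roslaunch C11_OperatorControl C11.launch`.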

osrf-migration commented 11 years ago

Original comment by michele hallak (Bitbucket: mhallak).


The only node that runs on the local OCU is our node. We do not run our node before setting up the environment (ros.sh), so ROS_IP and GAZEBO_IP are set beforehand.

I made more checks and I am wondering: all the nodes running on the field computer report that they can be contacted via the simulator ip:

For example, the node C21 runs on the field computer and the following is done on the field computer:

#!script

ubuntu@ip-10-0-0-52:~/integrator/robil/C0_Scripts$ rosnode info /C21_VisionAndLidar 
--------------------------------------------------------------------------------
Node [/C21_VisionAndLidar]
Publications: 
 * /C21/C22 [C21_VisionAndLidar/C21_C22]
 * /C21/right_camera/image [sensor_msgs/Image]
 * /rosout [rosgraph_msgs/Log]
 * /C21/left_camera/image [sensor_msgs/Image]

Subscriptions: 
 * /multisense_sl/camera/left/image_raw [sensor_msgs/Image]
 * /tf [tf/tfMessage]
 * /multisense_sl/camera/points2 [sensor_msgs/PointCloud2]
 * /multisense_sl/camera/right/image_raw [sensor_msgs/Image]
 * /clock [rosgraph_msgs/Clock]

Services: 
 * /C21/Panorama
 * /C21_VisionAndLidar/tf_frames
 * /C21_VisionAndLidar/set_logger_level
 * /C21/Pic
 * /C21_VisionAndLidar/get_loggers
 * /C21

contacting node http://10.0.0.52:34888/ ...
Pid: 1949
Connections:
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /C21/C22
    * to: /c22_groundReconition_and_mapping
    * direction: outbound
    * transport: TCPROS
 * topic: /C21/C22
    * to: /C25_GlobalPosition
    * direction: outbound
    * transport: TCPROS
 * topic: /clock
    * to: /gazebo (http://10.0.0.51:35307/)
    * direction: inbound
    * transport: TCPROS
 * topic: /multisense_sl/camera/left/image_raw
    * to: /gazebo (http://10.0.0.51:35307/)
    * direction: inbound
    * transport: TCPROS
 * topic: /multisense_sl/camera/right/image_raw
    * to: /gazebo (http://10.0.0.51:35307/)
    * direction: inbound
    * transport: TCPROS
 * topic: /multisense_sl/camera/points2
    * to: /multisense_sl/camera/stereo_proc (http://10.0.0.51:45902/)
    * direction: inbound
    * transport: TCPROS
 * topic: /tf
    * to: /multisense_sl_robot_state_publisher (http://10.0.0.51:46307/)
    * direction: inbound
    * transport: TCPROS
 * topic: /tf
    * to: /atlas_robot_state_publisher (http://10.0.0.51:56306/)
    * direction: inbound
    * transport: TCPROS

But I guess that it is normal since the master is 10.0.0.51

Note that when I am running the node at the local OCU, it prints the following: (Note the localhost)

#!script

michele@darpaws5:~/trio/router_cxe378f80a$ roslaunch C11_OperatorControl C11.launch 
... logging to /home/michele/.ros/log/b55b5af2-8556-11e2-bd49-129f0dc065c6/roslaunch-darpaws5-3455.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://localhost:45983/

SUMMARY
========

PARAMETERS
 * /rosdistro
 * /rosversion

NODES
  /
    c11 (C11_OperatorControl/c11)

ROS_MASTER_URI=http://10.0.0.51:11311

core service [/rosout] found
Exception AttributeError: AttributeError("'_DummyThread' object has no attribute '_Thread__block'",) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
process[c11-1]: started with pid [3466]

Actually, I ran different nodes on the local OCU and I get the same result consistently: the nodes running locally are not accessible from the field and simulator computers.

For fun, I ran the OCU local node on the field computer, and it displays the following (note that the server is not localhost here):

#!script

ubuntu@ip-10-0-0-52:~/integrator/robil/C0_Scripts$ roslaunch C11_OperatorControl C11.launch 
... logging to /home/ubuntu/.ros/log/b55b5af2-8556-11e2-bd49-129f0dc065c6/roslaunch-ip-10-0-0-52-2084.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://10.0.0.52:40430/

SUMMARY
========

PARAMETERS
 * /rosdistro
 * /rosversion

NODES
  /
    c11 (C11_OperatorControl/c11)

ROS_MASTER_URI=http://10.0.0.51:11311

core service [/rosout] found
Exception AttributeError: AttributeError("'_DummyThread' object has no attribute '_Thread__block'",) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
process[c11-1]: started with pid [2095]
osrf-migration commented 11 years ago

Original comment by Carlos Agüero (Bitbucket: caguero, GitHub: caguero).


Hi Michele,

could you try to set ROS_IP environment variables with the proper IPs in all the machines involved?

osrf-migration commented 11 years ago

Original comment by michele hallak (Bitbucket: mhallak).


Hi Carlos. Isn't that already the case? Local OCU:

#!script

ROS_MASTER_URI=http://10.0.0.51:11311
ROS_HOSTNAME=localhost
ROS_DISTRO=fuerte
ROS_IP=11.8.0.2
ROS_ETC_DIR=/opt/ros/fuerte/etc/ros

Field computer:

#!script

ROS_MASTER_URI=http://10.0.0.51:11311
ROS_DISTRO=fuerte
ROS_IP=10.0.0.52
ROS_ETC_DIR=/opt/ros/fuerte/etc/ros

What should I change?

osrf-migration commented 11 years ago

Original comment by michele hallak (Bitbucket: mhallak).


While answering you, I realized that ROS_HOSTNAME=localhost is set on the local OCU. So I changed it with export ROS_HOSTNAME=11.8.0.2 and now it works fine! So my advice is to add

#!script
export ROS_HOSTNAME=11.8.0.2

to the ros.sh

osrf-migration commented 11 years ago

Original comment by Carlos Agüero (Bitbucket: caguero, GitHub: caguero).


Great!

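ROS_IP and ROS_HOSTNAME [1] are mutually exclusive; when both are set, ROS_HOSTNAME takes precedence, which is why your node advertised localhost. I'd suggest removing the export of ROS_HOSTNAME from your configuration scripts, or unsetting ROS_HOSTNAME with: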

#!script
unset ROS_HOSTNAME

[1] http://www.ros.org/wiki/ROS/EnvironmentVariables#ROS_IP.2BAC8-ROS_HOSTNAME
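
To see which address a node would advertise for a given environment, the precedence rule can be sketched as below. `advertised_host` is a hypothetical helper for illustration, not a ROS command, and it does not model what ROS does when neither variable is set.

```shell
#!/bin/sh
# Sketch of the precedence rule: ROS_HOSTNAME wins over ROS_IP when both
# are set. advertised_host is a hypothetical helper, not part of ROS.
advertised_host() {
    if [ -n "${ROS_HOSTNAME:-}" ]; then
        printf '%s\n' "$ROS_HOSTNAME"
    elif [ -n "${ROS_IP:-}" ]; then
        printf '%s\n' "$ROS_IP"
    else
        # Neither variable is set: ROS picks an address on its own,
        # which this sketch does not try to reproduce.
        return 1
    fi
}
```

With the OCU environment from this thread (ROS_HOSTNAME=localhost, ROS_IP=11.8.0.2) it prints localhost; after `unset ROS_HOSTNAME` it prints 11.8.0.2.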

osrf-migration commented 11 years ago

Original comment by Carlos Agüero (Bitbucket: caguero, GitHub: caguero).


Solved