openbmc / phosphor-rest-server

REST server that transposes dbus interfaces to REST
Apache License 2.0

RestAPI curl couldn't connect to host after changing IP #24

Closed Kenthliu closed 8 years ago

Kenthliu commented 8 years ago

At first, the REST API worked normally. We then changed the IP address with this curl command:

curl -b cjar -c cjar -k -H "Content-Type: application/json" -X POST -d '{"data": ["eth0","ip","netmask","gateway"]}' https://bmcip/org/openbmc/NetworkManager/Interface/action/SetAddress4

After the change, the BMC still responds to ping, but REST API requests fail with "couldn't connect to host". We had to wait 10 minutes for the REST API to return to normal. [screenshot: 1467182732082]

williamspatrick commented 8 years ago

@bradbishop - Do you think this is a problem with the rest server specifically or the bmc networking in general?

williamspatrick commented 8 years ago

@Kenthliu - Your picture shows a ping to 10.32.8.227 but a curl to 10.32.9.33. I'm not sure which is intended to be the "new" and "old" ip address. Can you confirm that you can ping to the new address before attempting a REST operation?

DSTeddys commented 8 years ago

@williamspatrick - We found a problem on BMC version 1.01. The steps to reproduce are as follows:

  1. Log in to the BMC through the REST API with a curl command.
  2. Change the BMC IP address from DHCP to static. (The curl command did not respond, but the BMC applied the change.)
  3. Use a curl command to log in again through the new IP address. The response is "couldn't connect to host".
  4. In the BMC console, kill the obmc-rest process and wait for it to restart.
  5. Retry the curl login through the new IP address; it now succeeds.

[screenshots: bmc-issue-01, bmc-issue-02, bmc-issue-03]

williamspatrick commented 8 years ago

@bradbishop - We aren't binding to a particular IP address, are we? Any ideas what is going on here?

bradbishop commented 8 years ago

We bind to 0.0.0.0. @Kenthliu are you able to ssh to the bmc after changing the IP address?

DSTeddys commented 8 years ago

@bradbishop Yes, I can ssh to BMC after changing the IP address.

bradbishop commented 8 years ago

I haven't been able to reproduce this:

[toshiba:build]$ date
Mon Jul 11 09:19:52 EDT 2016
[toshiba:build]$ curl -X POST -d '{"data": ["root", "0penBmc"]}' -H"Content-Type:application/json" https://192.168.253.97/login
{
  "data": "User 'root' logged in",
  "message": "200 OK",
  "status": "ok"
}
[toshiba:build]$ curl -X POST https://192.168.253.97/org/openbmc/NetworkManager/Interface/action/SetAddress4 -H "Content-Type:application/json" -d '{"data": ["eth0", "192.168.253.98", "255.255.252.0", "192.168.252.1"]}'
^C
[toshiba:build]$ date
Mon Jul 11 09:20:22 EDT 2016
[toshiba:build]$ curl -X POST -d '{"data": ["root", "0penBmc"]}' -H"Content-Type:application/json" https://192.168.253.98/login
{
  "data": "User 'root' logged in",
  "message": "200 OK",
  "status": "ok"
}
[toshiba:build]$
bradbishop commented 8 years ago

@Kenthliu @DSTeddys do you have any additional information about the test case that might help me reproduce this?

DSTeddys commented 8 years ago

@bradbishop We found that this issue occurs when changing from DHCP mode to static. The test sequence is as follows:

$ /usr/local/bin/curl -c cjar -b cjar -k -H "Content-Type: application/json" -X POST https://10.32.9.116/login -d "{\"data\": [ \"root\", \"0penBmc\" ] }"
{
"data": "User 'root' logged in",
"message": "200 OK",
"status": "ok"
}
$ /usr/local/bin/curl -c cjar -b cjar -k -H "Content-Type: application/json" -X POST https://10.32.9.116/org/openbmc/NetworkManager/Interface/action/EnableDHCP -d "{\"data\": [\"eth0\"] }"
Killed
$ /usr/local/bin/curl -c cjar -b cjar -k -H "Content-Type: application/json" -X POST https://10.32.9.114/login -d "{\"data\": [ \"root\", \"0penBmc\" ] }"
{
"data": "User 'root' logged in",
"message": "200 OK",
"status": "ok"
}
$ /usr/local/bin/curl -c cjar -b cjar -k -H "Content-Type: application/json" -X POST https://10.32.9.114/org/openbmc/NetworkManager/Interface/action/SetAddress4 -d "{\"data\": [\"eth0\",\"10.32.9.154\",\"255.255.254.0\",\"10.32.8.1\"] }"
Killed
$ /usr/local/bin/curl -c cjar -b cjar -k -H "Content-Type: application/json" -X POST https://10.32.9.154/login -d "{\"data\": [ \"root\", \"0penBmc\" ] }"
curl: (7) couldn't connect to host
$ ping 10.32.9.154
PING 10.32.9.154 (10.32.9.154): 56 data bytes
64 bytes from 10.32.9.154: seq=0 ttl=64 time=1.525 ms
64 bytes from 10.32.9.154: seq=1 ttl=64 time=0.844 ms

Thanks for your help.

williamspatrick commented 8 years ago

@DSTeddys , @Kenthliu - This debug sequence you give still doesn't convince us that the ip-address change has happened BEFORE you attempt to connect to the REST server. Also, the 'ping' shows that eventually the ip is active, but we don't know how many dropped packets there were before the ping was successful.

Test sequence should be:

  1. Use REST to change IP address.
  2. ping new address until address is active.
  3. Use REST on new IP address to confirm it works.

All of your examples of failure show steps 2 and 3 swapped. We know there is a certain amount of time to get a new address set up, so these failure outputs don't prove anything to us.

@bradbishop has tried this test sequence (1, 2, 3) and found it to be successful every time on our network.
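
For anyone repeating the test, here is a minimal sketch of step 2 of that sequence. The helper polls the new address with ping and returns success only once a reply arrives, so the REST call in step 3 is never attempted before the address is active. The name wait_for_host, the retry count, and the placeholder addresses in the comments are assumptions for illustration; -W 1 is the reply timeout in seconds as understood by Linux ping (iputils and BusyBox), and other platforms may interpret it differently.

```shell
# wait_for_host HOST [TRIES]
# Ping HOST once per attempt until it replies.
# Returns 0 on the first reply, 1 after TRIES failed attempts (default 60).
wait_for_host() {
    host=$1
    tries=${2:-60}
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
    return 1
}

# Intended use (OLD_IP and NEW_IP are placeholders):
#   1. curl ... https://OLD_IP/org/openbmc/NetworkManager/Interface/action/SetAddress4 ...
#   2. wait_for_host NEW_IP 120 || echo "NEW_IP never answered ping"
#   3. curl ... https://NEW_IP/login ...
```

Recording the time before and after the wait_for_host call also shows how long the address took to come up, which is the information missing from the failure outputs so far.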

johnhcwang commented 8 years ago

Hi, we think the load on the REST service is the key point. The issue was reported by the RMC team, and as far as I know there are many sessions from the RMC to the BMC REST server. We tried to simulate this condition with the steps below, so that you can recreate the issue easily.

Step 1: Ping the BMC IP 10.32.9.105 and make sure we get a response.

Step 2: Log in with the curl command.

Step 3: Create 5 sessions that continually get the inventory, and make sure all of them get the information correctly.

~$ for x in `seq 1 100`;do curl -b cjar -k https://10.32.9.105/org/openbmc/inventory/system/chassis/motherboard/enumerate ;done &
~$ for x in `seq 1 100`;do curl -b cjar -k https://10.32.9.105/org/openbmc/inventory/system/chassis/motherboard/enumerate ;done &
~$ for x in `seq 1 100`;do curl -b cjar -k https://10.32.9.105/org/openbmc/inventory/system/chassis/motherboard/enumerate ;done &
~$ for x in `seq 1 100`;do curl -b cjar -k https://10.32.9.105/org/openbmc/inventory/system/chassis/motherboard/enumerate ;done &
~$ for x in `seq 1 100`;do curl -b cjar -k https://10.32.9.105/org/openbmc/inventory/system/chassis/motherboard/enumerate ;done &
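
The five identical background loops above can also be written as one parameterized helper, which makes it easier to vary the number of sessions while trying to reproduce the problem. This is only a sketch: start_load is a hypothetical name, and the cookie jar file cjar is assumed to already hold a valid login session from Step 2.

```shell
# start_load SESSIONS REQUESTS URL
# Start SESSIONS background loops; each one issues REQUESTS sequential
# GET requests to URL, reusing the cookie jar "cjar" from the login step.
start_load() {
    sessions=$1
    requests=$2
    url=$3
    s=0
    while [ "$s" -lt "$sessions" ]; do
        (
            r=0
            while [ "$r" -lt "$requests" ]; do
                curl -s -b cjar -k "$url" >/dev/null
                r=$((r + 1))
            done
        ) &
        s=$((s + 1))
    done
}

# Equivalent to the five loops above:
#   start_load 5 100 https://10.32.9.105/org/openbmc/inventory/system/chassis/motherboard/enumerate
#   ... change the IP address while the loops run ...
#   wait    # block until all background loops have finished
```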

Step 4: Change the BMC IP with the curl command while the above sessions are still running.

journal log during the IP change ===
Jul 21 00:41:13 barreleye systemd[1]: systemd-hwdb-update.service: Cannot add dependency job, ignoring: Unit systemd-hwdb-update.service is masked.
Jul 21 00:41:13 barreleye systemd[1]: Starting Network Service...
Jul 21 00:41:14 barreleye systemd[1]: systemd-hwdb-update.service: Cannot add dependency job, ignoring: Unit systemd-hwdb-update.service is masked.
Jul 21 00:41:14 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:14 barreleye systemd-networkd[1092]: Enumeration completed
Jul 21 00:41:14 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:14 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:14 barreleye systemd[1]: Stopped Network Service.
Jul 21 00:41:14 barreleye systemd[1]: Starting Network Service...
Jul 21 00:41:15 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:15 barreleye systemd-networkd[1095]: Enumeration completed
Jul 21 00:41:15 barreleye systemd[1]: Started Network Service.
Jul 21 00:41:15 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:15 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:15 barreleye systemd-timesyncd[887]: No network connectivity, watching for changes.
Jul 21 00:41:15 barreleye systemd-networkd[1095]: eth0: Configured

Step 5: Ping the new BMC IP and make sure we get a response.

Step 6: Log in with the curl command using the new IP => Fail

journal log while the login fails (no log entries except timesyncd) ===
Jul 21 00:41:57 barreleye systemd-timesyncd[887]: Timed out waiting for reply from 216.239.35.0:123 (time1.google.com).
Jul 21 00:42:08 barreleye systemd-timesyncd[887]: Timed out waiting for reply from 216.239.35.4:123 (time2.google.com).
Jul 21 00:42:18 barreleye systemd-timesyncd[887]: Timed out waiting for reply from 216.239.35.8:123 (time3.google.com).
Jul 21 00:42:28 barreleye systemd-timesyncd[887]: Timed out waiting for reply from 216.239.35.12:123 (time4.google.com).

Step 7: Restart obmc-rest.service (it takes a long time to restart). After that we can log in through REST with the new IP.

journal log while restarting the obmc-rest service ===
Jul 21 00:47:23 barreleye systemd[1]: Stopping Phosphor OpenBMC DBus REST daemon...
Jul 21 00:47:25 barreleye systemd-timesyncd[887]: Timed out waiting for reply from 216.239.35.8:123 (time3.google.com).
Jul 21 00:47:35 barreleye systemd-timesyncd[887]: Timed out waiting for reply from 216.239.35.12:123 (time4.google.com).
Jul 21 00:48:54 barreleye systemd[1]: obmc-rest.service: State 'stop-sigterm' timed out. Killing.
Jul 21 00:48:54 barreleye systemd[1]: obmc-rest.service: Main process exited, code=killed, status=9/KILL
Jul 21 00:48:54 barreleye systemd[1]: Stopped Phosphor OpenBMC DBus REST daemon.
Jul 21 00:48:54 barreleye systemd[1]: obmc-rest.service: Unit entered failed state.
Jul 21 00:48:54 barreleye systemd[1]: obmc-rest.service: Failed with result 'signal'.
Jul 21 00:48:54 barreleye systemd[1]: Started Phosphor OpenBMC DBus REST daemon.
Jul 21 00:50:47 barreleye systemd[1]: Starting Cleanup of Temporary Directories...
Jul 21 00:50:47 barreleye systemd-tmpfiles[1124]: Failed to open directory /var/tmp: Too many levels of symbolic links
Jul 21 00:50:47 barreleye systemd[1]: Started Cleanup of Temporary Directories.
anoo1 commented 8 years ago

I'm able to recreate and get this in the journal:

Jul 19 22:38:24 barreleye systemd[1]: Stopped Phosphor OpenBMC DBus REST daemon.
Jul 19 22:38:24 barreleye systemd[1]: Started Phosphor OpenBMC DBus REST daemon.
TCP: request_sock_TCP: Possible SYN flooding on port 443. Dropping request.  Check SNMP counters.
Jul 19 22:39:17 barreleye kernel: TCP: request_sock_TCP: Possible SYN flooding on port 443. Dropping request.  Check SNMP counters.
Jul 19 22:39:17 barreleye kernel[941]: TCP: request_sock_TCP: Possible SYN flooding on port 443. Dropping request.  Check SNMP counters.

Norm mentions that the kernel thinks there is a DoS attack. I tried increasing this value with sysctl -w net.ipv4.tcp_max_syn_backlog=2048, but the issue still recreates.
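
The "Check SNMP counters" hint in the kernel message points at the TcpExt counters that Linux exposes in /proc/net/netstat. As a rough sketch, the helper below reads a single counter by name; tcpext_counter is a hypothetical name, while ListenOverflows and ListenDrops are standard Linux counter names. The optional second argument exists only so the parser can be tried against a saved copy of the file.

```shell
# tcpext_counter NAME [FILE]
# Print the value of the TcpExt counter NAME from /proc/net/netstat
# (or from FILE). Prints nothing if the counter is not present.
tcpext_counter() {
    name=$1
    file=${2:-/proc/net/netstat}
    awk -v name="$name" '
        # First "TcpExt:" line holds the counter names.
        $1 == "TcpExt:" && !header_seen {
            for (i = 2; i <= NF; i++) if ($i == name) col = i
            header_seen = 1
            next
        }
        # Second "TcpExt:" line holds the values in the same order.
        $1 == "TcpExt:" {
            if (col) print $col
            exit
        }
    ' "$file"
}

# Example on the BMC:
#   tcpext_counter ListenOverflows
#   tcpext_counter ListenDrops
```

Sampling these two counters before and after a failed login would show whether connections are being dropped at the listening socket. Note also that net.ipv4.tcp_max_syn_backlog only sizes the queue of half-open connections; the accept queue is bounded by the backlog the server passes to listen(), capped by net.core.somaxconn, so raising the first value alone may not change the behaviour.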

williamspatrick commented 8 years ago

@shenki - We are still investigating overall, but do you think we should set CONFIG_SYN_COOKIES=y in the kernels?
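
A quick way to see what a running BMC kernel currently does is to read the tcp_syncookies sysctl. The sketch below assumes that the sysctl entry is only present when the kernel was built with CONFIG_SYN_COOKIES=y, so a missing file is taken as a hint that the option is off; that inference is an assumption and worth confirming against the kernel config. The name syncookies_state is hypothetical.

```shell
# syncookies_state [FILE]
# Print the current tcp_syncookies setting, or "unavailable" when the
# sysctl file cannot be read (assumed to mean SYN cookies are not built in).
syncookies_state() {
    file=${1:-/proc/sys/net/ipv4/tcp_syncookies}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
    fi
}

# Example on the BMC:
#   syncookies_state
```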

anoo1 commented 8 years ago

As a workaround for now, we'll add a change to restart the rest-server when networkd restarts (when the network config is changed).
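
One way such a coupling could be expressed in systemd is a drop-in that marks obmc-rest.service as PartOf systemd-networkd.service, so that a stop or restart of networkd is propagated to the REST daemon. This is only an illustration of the idea and not necessarily how the actual change was made; the function name, the drop-in file name, and the default directory are assumptions.

```shell
# write_restart_dropin [SYSTEMD_DIR]
# Write a drop-in for obmc-rest.service under SYSTEMD_DIR
# (default /etc/systemd/system) that ties it to systemd-networkd.
write_restart_dropin() {
    systemd_dir=${1:-/etc/systemd/system}
    dropin_dir=$systemd_dir/obmc-rest.service.d
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/restart-with-networkd.conf" <<'EOF'
[Unit]
# Propagate stop/restart of systemd-networkd to this unit.
PartOf=systemd-networkd.service
EOF
}

# After writing the drop-in on a real system:
#   systemctl daemon-reload
```

PartOf= only propagates stop and restart requests issued through systemd, so it covers the networkd restart seen in the journal above but would not react to an address change that leaves networkd running.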

williamspatrick commented 8 years ago

Fix should be present in openbmc/openbmc v1.0 and master branches. We have openbmc/openbmc#501 to track a better solution in master branch.