sctplab / usrsctp

A portable SCTP userland stack
BSD 3-Clause "New" or "Revised" License

usrsctp QNX porting #264

Open Sadrieh opened 6 years ago

Sadrieh commented 6 years ago

Hi,

I am trying to port the usrsctp library to QNX700. The port builds, but the test applications do not behave as expected. I used the daytime_server and client examples: the client receives the data from the server and displays it on the screen, but it doesn't shut down the connection gracefully. When running the binaries on two VMware machines and capturing the VMware virtual network with Wireshark, it seems the COOKIE_ACK packet is not being recognized: a 50-byte packet is transferred, but Wireshark doesn't identify it as a COOKIE_ACK. The tcpdump is attached. We need to use native SCTP; the encapsulated version works perfectly fine.

QNX support says their network stack is based on NetBSD, so at build time I tried to mimic the NetBSD flags. I am cross-compiling on a Linux machine to produce QNX binaries. The attached diffs contain the changes required for porting to QNX:

  1. The IPv6 structure is defined differently in QNX
  2. CMSG_ALIGN(n) is defined as __CMSG_ALIGN(n) in QNX
  3. The min and max macros are already defined in QNX
  4. The LIST_FOREACH_SAFE macro must be defined for QNX
  5. The uio_seg and uio_rw enums are not defined in QNX
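Items 2 through 5 are the kind of differences a small compatibility header can absorb. The sketch below is hypothetical: it is not the attached diff, the guard macros are assumptions, and item 1 is left out because the exact QNX IPv6 structure differences are not shown in the thread. It assumes a BSD-style <sys/queue.h> that provides LIST_FIRST and LIST_NEXT; demo_purge() exists only to exercise the LIST_FOREACH_SAFE definition.

```c
#include <stdlib.h>
#include <sys/queue.h>
#include <sys/socket.h>

/* 2. QNX spells the control-message alignment macro __CMSG_ALIGN. */
#if !defined(CMSG_ALIGN) && defined(__CMSG_ALIGN)
#define CMSG_ALIGN(n) __CMSG_ALIGN(n)
#endif

/* 3. Only define min/max when the platform headers have not already done so. */
#ifndef min
#define min(a, b) (((a) < (b)) ? (a) : (b))
#endif
#ifndef max
#define max(a, b) (((a) > (b)) ? (a) : (b))
#endif

/* 4. BSD-style safe traversal: tvar holds the successor, so var may be freed. */
#ifndef LIST_FOREACH_SAFE
#define LIST_FOREACH_SAFE(var, head, field, tvar)       \
	for ((var) = LIST_FIRST((head));                     \
	     (var) && ((tvar) = LIST_NEXT((var), field), 1); \
	     (var) = (tvar))
#endif

/* 5. Enums that BSD systems provide; the guard is an assumption. */
#if defined(QNX) || defined(__QNX__)
enum uio_seg { UIO_USERSPACE, UIO_SYSSPACE };
enum uio_rw { UIO_READ, UIO_WRITE };
#endif

/* Hypothetical demo: build a list 0..count-1, then delete entries while iterating. */
struct node {
	int value;
	LIST_ENTRY(node) link;
};
LIST_HEAD(node_list, node);

int demo_purge(int count) {
	struct node_list head;
	struct node *n, *tmp;
	int remaining = 0, ok = 1;

	LIST_INIT(&head);
	for (int i = 0; i < count; i++) {
		if ((n = malloc(sizeof(*n))) == NULL) { ok = 0; break; }
		n->value = i;
		LIST_INSERT_HEAD(&head, n, link);
	}
	/* Remove even values mid-iteration; safe because tmp already points past n. */
	LIST_FOREACH_SAFE(n, &head, link, tmp) {
		if (n->value % 2 == 0) {
			LIST_REMOVE(n, link);
			free(n);
		} else {
			remaining++;
		}
	}
	/* Release the survivors. */
	LIST_FOREACH_SAFE(n, &head, link, tmp) {
		LIST_REMOVE(n, link);
		free(n);
	}
	return ok ? remaining : -1;
}
```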

I used the following commands to build the library (qcc is the QNX equivalent of gcc; for QNX700 it is based on gcc 5.4):

  1. ./bootstrap
  2. CC=qcc ./configure
  3. make all CFLAGS="-DQNX -UNetBSD -D__Userspace_os_NetBSD"

and these two commands to build the test apps:

  1. qcc -DQNX -UNetBSD -D__Userspace_os_NetBSD -o daytime_server-qcc.bin daytime_server.o ../usrsctplib/.libs/libusrsctp.a -lsocket -Wl,-rpath -Wl,/home/afs/workspace/SCTP/usrsctp.qcc/build/usrsctplib/.libs
  2. qcc -DQNX -UNetBSD -D__Userspace_os_NetBSD -o client-qcc.bin client.o ../usrsctplib/.libs/libusrsctp.a -lsocket -Wl,-rpath -Wl,/home/afs/workspace/SCTP/usrsctp.qcc/build/usrsctplib/.libs

The diff of the changed files is attached.

Any help you provide is highly appreciated.

diff.txt sctp_dump.txt

tuexen commented 6 years ago

Thanks for the patch. I think we could integrate a variant of it in the source code using something like __Userspace_os_QNX... Is a QNX system available for free download so that we could use it for testing? Regarding your tracefile: can you provide it in .pcap or .pcapng format? That would be much easier for me to handle...
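A minimal sketch of the kind of gating suggested here, assuming qcc predefines __QNX__ (check the compiler's predefined macros to confirm). The point is that the platform macro is derived from the compiler instead of being faked with -D__Userspace_os_NetBSD on the command line. The userspace_os() helper is purely illustrative and not part of usrsctp.

```c
/* Derive a QNX platform macro from the compiler (assumption: qcc defines __QNX__). */
#if defined(__QNX__) && !defined(__Userspace_os_QNX)
#define __Userspace_os_QNX
#endif

/* Hypothetical helper: reports which platform branch this build would take. */
const char *userspace_os(void) {
#if defined(__Userspace_os_QNX)
	return "QNX";
#elif defined(__Userspace_os_NetBSD)
	return "NetBSD";
#elif defined(__linux__)
	return "Linux";
#elif defined(__FreeBSD__)
	return "FreeBSD";
#else
	return "unknown";
#endif
}
```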

Sadrieh commented 5 years ago

Thanks for the quick reply. It would be great if usrsctp supported QNX out of the box. Below is the link for evaluation downloads. Note that QNX700 is not offered for evaluation (I don't know why), but hopefully 650 is close enough to 700.

http://www.qnx.com/download/group.html?programid=16780

GitHub didn't allow me to attach the pcap dump, so a zipped version is attached:

sctp_dump.zip

tuexen commented 5 years ago

@weinrank Have a look at the download link above.

@Sadrieh The problem with fragmented packets is most likely a byte ordering problem when setting ip->ip_off in sctp_output.c. I'm not sure which code path your code currently uses (the one with htons() or the one without), but please try the other.
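To illustrate the byte-order point in isolation: IP_DF is 0x4000 in the 16-bit ip_off field. Stored with htons(), it reaches the wire as bytes 40 00. Stored in host order on a little-endian machine, it goes out as 00 40, which a receiver reads as DF cleared and a fragment offset of 64 × 8 = 512 bytes. That would plausibly make Wireshark treat the packet as a trailing fragment instead of dissecting a COOKIE_ACK (an inference, not confirmed in the thread). Whether a raw socket with IP_HDRINCL expects ip_off in host or network order has differed between BSD-derived stacks and Linux, which is why both code paths exist. The helpers and SKETCH_IP_DF are hypothetical names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IP_DF 0x4000u /* "don't fragment" bit of the 16-bit ip_off field */

/* Copies ip_off into a 2-byte buffer exactly as it sits in memory,
 * i.e. as a raw socket with IP_HDRINCL would put it on the wire. */
void store_ip_off(uint8_t wire[2], uint16_t ip_off) {
	memcpy(wire, &ip_off, sizeof(ip_off));
}

/* Decodes the wire bytes the way a receiver (or Wireshark) does: network order. */
void decode_ip_off(const uint8_t wire[2], int *df, int *frag_offset_bytes) {
	uint16_t v = (uint16_t)((wire[0] << 8) | wire[1]);
	*df = (v & SKETCH_IP_DF) != 0;
	*frag_offset_bytes = (v & 0x1FFF) * 8; /* offset is counted in 8-byte units */
}
```

On any host, store_ip_off(w, htons(SKETCH_IP_DF)) decodes as DF set with offset 0; the bytes 00 40 produced by the host-order path on a little-endian machine decode as DF clear with offset 512.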

Sadrieh commented 5 years ago

Thanks a lot @tuexen. It worked for me. Please find the more recent diff attached to this comment: diff_sctp.txt

Sadrieh commented 5 years ago

Hi @tuexen, me again... I'm facing a similar problem. When connecting from a QNX machine to a Linux machine using echo_server and client, I get a checksum mismatch, and the only difference between the two checksums is their byte order. I enabled logging on both client and echo_server; see the attachments. I have also attached a tcpdump of this transaction (just change the extension from txt to pcap).

Any help is highly appreciated.

Regards, Afshin client.log echo_server.log dump.txt
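To compare the checksum seen in the logs with an independently computed one, a bitwise CRC32c (the checksum SCTP uses: Castagnoli polynomial 0x1EDC6F41, reflected form 0x82F63B78) plus a byte-swap test is enough. This is a sketch with hypothetical function names; a real stack computes the CRC over the whole SCTP packet with the checksum field set to zero, and a table-driven version is much faster.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c: slow but easy to audit. */
uint32_t crc32c(const uint8_t *buf, size_t len) {
	uint32_t crc = 0xFFFFFFFFu;
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Reverses the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v) {
	return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
	       ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* 1 when the two checksums differ only in byte order: the symptom described above. */
int is_byte_swapped(uint32_t received, uint32_t computed) {
	return received != computed && received == swap32(computed);
}
```

The standard CRC32c check value, crc32c over the nine bytes "123456789", is 0xE3069283, which makes a quick sanity test for any implementation.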

tuexen commented 5 years ago

Is your host byte order little or big endian?

tuexen commented 5 years ago

I guess that BYTE_ORDER, BIG_ENDIAN, and LITTLE_ENDIAN are undefined. Then sctp_crc32.c:562, for example, assumes that your platform is big endian. If it is actually little endian, then you end up in the problem above. Can you add appropriate defines to your build environment?
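The reason an undefined macro silently picks big endian is a preprocessor rule: in an #if, an undefined identifier evaluates to 0, so a test like `#if BYTE_ORDER == BIG_ENDIAN` becomes `0 == 0` and is true. A minimal sketch of the missing defines, assuming the compiler is GCC-based (qcc for QNX700 is built on gcc 5.4) and therefore predefines __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__. Where this would live in the build (a header included before sctp_crc32.c, or -D flags) is an assumption; the two check functions are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#if !defined(BYTE_ORDER)
#  if !defined(__BYTE_ORDER__)
#    error "Byte order unknown: define BYTE_ORDER, LITTLE_ENDIAN and BIG_ENDIAN"
#  endif
#  ifndef LITTLE_ENDIAN
#    define LITTLE_ENDIAN __ORDER_LITTLE_ENDIAN__ /* 1234 */
#  endif
#  ifndef BIG_ENDIAN
#    define BIG_ENDIAN __ORDER_BIG_ENDIAN__ /* 4321 */
#  endif
#  define BYTE_ORDER __BYTE_ORDER__
#endif

/* What the preprocessor, and hence any #if BYTE_ORDER branch, believes. */
int compiled_little_endian(void) {
#if BYTE_ORDER == LITTLE_ENDIAN
	return 1;
#else
	return 0;
#endif
}

/* What the hardware actually does: look at the first byte of a known value. */
int runtime_little_endian(void) {
	uint16_t probe = 0x0102;
	uint8_t first_byte;
	memcpy(&first_byte, &probe, 1);
	return first_byte == 0x02;
}
```

If the two functions disagree in a QNX build, the CRC code is taking the wrong branch.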

tcf8461 commented 3 years ago

Hello,

Afshin left us for a great job at Google. I would like to say that we fixed the endianness issue and it's been compiling well for us. I do have a question about client configuration for a multi-homed setup. I am not able to get rehoming working despite reading the RFCs and trying a number of different ways of configuring it. Can I ask questions here, or is there a forum somewhere?

Thank you, Tim

tuexen commented 3 years ago

You can ask questions here...

tcf8461 commented 3 years ago

Thank you. I will try to keep this brief. I am attempting to get rehoming working. To test this out I have two VMs, both with two virtual interfaces that are on separate networks, this is from the client side displaying routes to the server:

admin@tcf-VirtualBox ~/ $ ip route get 192.168.1.10
192.168.1.10 dev enp0s10 src 192.168.1.2

admin@tcf-VirtualBox ~/ $ ip route get 10.40.53.91
10.40.53.91 dev enp0s9 src 10.40.54.54

I am setting up what I think is a valid one-to-many configuration. I have a simple SCTP server that does a bindx() to its two IP addresses (10.40.53.91 and 192.168.1.10) and then starts accepting. The client does a bindx() to 192.168.1.2 and 10.40.54.54, then a connect() to the two server addresses. If I do not set a peer on the client, it normally works off the 192.168.1.2 address. When I try to set 10.40.54.54 as the peer, it seems to ignore it. I never see a heartbeat on the 10.40.54.54 IP.
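For reference, usrsctp_bindx() and usrsctp_connectx() take a packed array of sockaddr structures plus a count, so the client setup described above needs two such arrays. A related detail from RFC 6458 that may matter here: SCTP_PRIMARY_ADDR takes one of the peer's addresses (for example 10.40.53.91), while SCTP_SET_PEER_PRIMARY_ADDR asks the peer to prefer one of the local addresses (such as 10.40.54.54) and relies on the ADD-IP extension. The helper below is a hypothetical sketch; HAVE_SIN_LEN stands in for whatever macro the build uses to detect a sin_len member (BSD-derived stacks, possibly QNX), and the usrsctp calls appear only in a comment with hypothetical variable names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fills out[] with one sockaddr_in per dotted-quad string, packed back to back.
 * Returns the number of entries written, or -1 on a malformed address. */
int fill_ipv4_addrs(struct sockaddr_in *out, const char *const *ips, int n, uint16_t port) {
	for (int i = 0; i < n; i++) {
		memset(&out[i], 0, sizeof(out[i]));
#ifdef HAVE_SIN_LEN
		out[i].sin_len = sizeof(out[i]);
#endif
		out[i].sin_family = AF_INET;
		out[i].sin_port = htons(port);
		if (inet_pton(AF_INET, ips[i], &out[i].sin_addr) != 1)
			return -1;
	}
	return n;
}

/* Client-side usage against usrsctp (not compiled here; error handling omitted;
 * sock, assoc_id and SERVER_PORT are hypothetical):
 *   struct sockaddr_in local[2], remote[2];
 *   const char *mine[]   = { "192.168.1.2", "10.40.54.54" };
 *   const char *theirs[] = { "192.168.1.10", "10.40.53.91" };
 *   fill_ipv4_addrs(local, mine, 2, 0);
 *   fill_ipv4_addrs(remote, theirs, 2, SERVER_PORT);
 *   usrsctp_bindx(sock, (struct sockaddr *)local, 2, SCTP_BINDX_ADD_ADDR);
 *   usrsctp_connectx(sock, (struct sockaddr *)remote, 2, &assoc_id);
 */
```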

If I use iptables or 'ifconfig 192.168.1.2 down' to stop traffic on the 192.168.1.2 interface, it doesn't switch over to the 10.40 network. If I use one-to-one connections on each network, they heartbeat and send data just fine. I have mucked around with a lot of settings: heartbeat parameters for all the IPs, CMT settings, etc.

I have attached a pcapng file of a quick run in which I downed the 192.168.1.2 interface about a minute and a half in. It retransmits a few times, then aborts. Restarting that interface restarts the process. I expected it to switch over to the 10.40.x.x interface.
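A simplified model of the RFC 4960 failure-detection rules (sections 8.1 and 8.2) shows how "retransmits a few times, then aborts" can happen without any failover: each retransmission timeout counts against both the path in use and the association; a path goes inactive once its count exceeds Path.Max.Retrans, the association aborts once its count exceeds Association.Max.Retrans, and data may only move to an alternate destination that has been confirmed (section 5.4). If no HEARTBEAT-ACK ever came back on the 10.40 path, that destination would stay unconfirmed, which would be consistent with the trace (an inference, not a diagnosis). The function is an illustration, not usrsctp's code; a real stack also retransmits to alternate addresses before the primary is marked inactive (section 6.4), which this model ignores.

```c
/* Returns 1 if data eventually gets through, 0 if the association is aborted.
 * pmr = Path.Max.Retrans, amr = Association.Max.Retrans (RFC 4960 defaults 5 and 10). */
int association_survives(int pmr, int amr, int alt_confirmed, int alt_reachable) {
	int assoc_errors = 0, primary_errors = 0;
	int on_alternate = 0;

	for (;;) {
		if (on_alternate && alt_reachable)
			return 1; /* a SACK arrives and the association error count resets */
		/* Another retransmission timeout on the path currently in use. */
		assoc_errors++;
		if (!on_alternate)
			primary_errors++;
		if (assoc_errors > amr)
			return 0; /* ABORT: Association.Max.Retrans exceeded */
		if (!on_alternate && primary_errors > pmr && alt_confirmed)
			on_alternate = 1; /* primary marked inactive; fail over */
	}
}
```

With the defaults, an unconfirmed alternate means the association aborts after 11 timeouts; the same happens if Path.Max.Retrans is not smaller than Association.Max.Retrans, because the abort check fires before the primary is ever marked inactive.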

Any help is appreciated :) Thank you, Tim

tcf8461 commented 3 years ago

Correction to the above: the client does a bindx() to 192.168.1.2 and 10.40.54.54, then a connectx() (not connect()) to the two server addresses. test_data.zip

tuexen commented 3 years ago

So the client owns 192.168.1.2 and 10.40.54.54, and the server owns 192.168.1.10 and 10.40.53.91. I don't see any packet in the tracefile from 10.40.54.54 to 10.40.53.91. Can you provide the output of ifconfig on the client side? I would like to see the prefix/subnet mask. What happens when you run ping 10.40.53.91 on the client side? Can you provide a .pcap for that?
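The prefix matters because 10.40.54.54 and 10.40.53.91 differ in the third octet: 54 is 00110110 and 53 is 00110101 in binary, so they share only the first 22 bits and are on the same subnet only with a /22 or shorter prefix. A small hypothetical helper makes that check explicit.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Returns 1 if a and b share their first prefix_len bits, 0 if not, -1 on bad input. */
int same_subnet(const char *a, const char *b, int prefix_len) {
	struct in_addr x, y;
	if (prefix_len < 0 || prefix_len > 32)
		return -1;
	if (inet_pton(AF_INET, a, &x) != 1 || inet_pton(AF_INET, b, &y) != 1)
		return -1;
	/* Avoid the undefined 32-bit shift when prefix_len is 0. */
	uint32_t mask = prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
	return (ntohl(x.s_addr) & mask) == (ntohl(y.s_addr) & mask);
}
```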

tcf8461 commented 3 years ago

client_side server_side

Thank you for having a look 👍

Here are the client/server side ifconfig outputs. I will get the pcap as soon as I can and attach it as well.

tcf8461 commented 3 years ago

server_ping

Client side to server side ping output

tcf8461 commented 3 years ago

ping_53_91_host.zip

Ping of .53.91 from .54.54

tcf8461 commented 3 years ago

Your comment about not seeing any traffic from 10.40.54.54 to 10.40.53.91 is where I am stumped as well. Since I am binding to both the 192.168 and 10.40 addresses on both sides, I was expecting heartbeats to be generated to and from both sides, and I never see any from 10.40.54.54. I believe there were a small number from 192.168.1.10 -> 10.40.54.54, but the two networks are isolated from each other (this is a replica of an environment I am forced to work in as a third-party interface), so those will not result in an ACK.

Again, I really appreciate you having a look, Tim

tuexen commented 3 years ago

OK. Is it possible for you to configure non-private addresses in the VMs? Something like 1.1.1.1/24 and 2.2.2.1/24 on the client side and 1.1.1.2/24 and 2.2.2.2/24 on the server side?
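All four addresses in the original setup fall into RFC 1918 private ranges (10/8, 172.16/12, 192.168/16). The thread does not say why non-private addresses were requested; one possible reason (an assumption) is that SCTP stacks derived from FreeBSD, as usrsctp is, apply address-scoping rules to private IPv4 addresses when deciding which addresses to list in INIT and INIT-ACK. A hypothetical helper for classifying addresses:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Returns 1 if ip is in an RFC 1918 private range, 0 if not, -1 if it does not parse. */
int is_rfc1918(const char *ip) {
	struct in_addr a;
	if (inet_pton(AF_INET, ip, &a) != 1)
		return -1;
	uint32_t h = ntohl(a.s_addr);
	return (h & 0xFF000000u) == 0x0A000000u ||  /* 10.0.0.0/8 */
	       (h & 0xFFF00000u) == 0xAC100000u ||  /* 172.16.0.0/12 */
	       (h & 0xFFFF0000u) == 0xC0A80000u;    /* 192.168.0.0/16 */
}
```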

tcf8461 commented 3 years ago

I will try to set it up next week. Have a nice weekend :)

tcf8461 commented 3 years ago

I removed all networks from two Linux VMs and added two "host-only" networks to each. After starting the VMs, I set the IPs using ifconfig as requested. I was able to ping both .2 addresses from the .1 side and vice versa. The issue still persists, with one side not getting heartbeat acknowledgements. using_publics.zip

My offsider is getting some hardware we are putting together for production installations and we are going to use them as guinea pigs. More in a day or two or three on that.

tcf8461 commented 3 years ago

Hello, I have spent a day and a half or so mucking around with two new hosts I was able to get for a few days; they run QNX 7, which is what we are using on this project. I also mucked about in the client and discard server code (I have missed straight C; I did it for many years from the mid-80s to the early 90s :]) so that they follow the same basic sequence of events as my larger apps, which means I can share them if you need them. Both have full usrsctp logging turned on. I am attaching a document that outlines what I saw as the stopper. Let me know what you think.

Test Setup.pdf

test.zip

tcf8461 commented 3 years ago

Hello, just wondering if there is anything else I can provide to assist with this issue?

tcf8461 commented 3 years ago

I hope you had a great holiday season :) Can I possibly get some help with this?

tuexen commented 3 years ago

Can you try not binding to explicit addresses, but just use 0.0.0.0 or :: and see if that has an impact?
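For reference, the wildcard variant suggested here replaces the bindx() on explicit addresses with a single bind to 0.0.0.0, so the stack itself decides which local addresses belong to the endpoint. A sketch: wildcard_v4() is a hypothetical helper, HAVE_SIN_LEN stands for whatever macro the build uses to detect a sin_len member, and the usrsctp_bind() call is shown only in a comment.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Builds an IPv4 wildcard address (0.0.0.0) for the given port. */
struct sockaddr_in wildcard_v4(uint16_t port) {
	struct sockaddr_in sin;
	memset(&sin, 0, sizeof(sin));
#ifdef HAVE_SIN_LEN
	sin.sin_len = sizeof(sin);
#endif
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	return sin;
}

/* Usage (not compiled here; sock is hypothetical):
 *   struct sockaddr_in any = wildcard_v4(0);
 *   usrsctp_bind(sock, (struct sockaddr *)&any, sizeof(any));
 */
```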

tcf8461 commented 3 years ago

Thank you for the reply. We moved offices over Christmas and I have to sort out the equipment we used for this test and get it connected to the new network. I should have results sometime later today or tomorrow.

tcf8461 commented 3 years ago

I tried it and it made no difference