pion / sctp

A Go implementation of SCTP
https://pion.ly/
MIT License

Pion <-> usrsctp under 10% packet loss gets an abort #104

Closed Sean-Der closed 4 years ago

Sean-Der commented 4 years ago
Reass 30000007,CI:ffffffff,TSN=1e5e52d3,SID=0002,FSN=1e5e52d3,SSN:0000

This abort is generated here in usrsctp. I am not sure why yet though, I need to read more about SCTP to fully understand.

Sean-Der commented 4 years ago

This seems to be triggered by the size of the messages we are sending. If I drop to a smaller size, the abort goes away.

The problem seems to be around DATA chunk fragmenting. When we do messages @ ~3500 bytes I don't see the abort anymore.

Sean-Der commented 4 years ago

This is the full logic of the abort we are hitting

                  if ((tsn == (asoc->cumulative_tsn + 1)) && (asoc->idata_supported == 0)) {
                          /* Ok we created this control and now
                           * lets validate that its legal i.e. there
                           * is a B bit set, if not and we have
                           * up to the cum-ack then its invalid.
                           */
                          if ((chk->rec.data.rcv_flags & SCTP_DATA_FIRST_FRAG) == 0) {
                                  sctp_abort_in_reasm(stcb, control, chk,
                                                      abort_flag,
                                                      SCTP_FROM_SCTP_INDATA + SCTP_LOC_7);
                                  return;
                          }
                  }
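
In Go terms (the language of pion/sctp), the check above boils down to something like the following sketch. The names `fragment` and `violatesReassembly` are hypothetical, chosen only to illustrate the condition:

```go
package main

import "fmt"

// fragment is a hypothetical stand-in for a received DATA chunk.
type fragment struct {
	tsn       uint32
	beginning bool // the B bit: set on the first fragment of a message
}

// violatesReassembly mirrors the quoted usrsctp condition: if a chunk sits
// directly after the cumulative TSN ack point (so no earlier fragment can
// still arrive) and its B bit is unset, the message can never be
// reassembled, and the association is aborted.
func violatesReassembly(f fragment, cumulativeTSN uint32, idataSupported bool) bool {
	return f.tsn == cumulativeTSN+1 && !idataSupported && !f.beginning
}

func main() {
	// A middle fragment right at cum-ack+1 triggers the abort path.
	fmt.Println(violatesReassembly(fragment{tsn: 101, beginning: false}, 100, false))
	// A first fragment at the same position is legal.
	fmt.Println(violatesReassembly(fragment{tsn: 101, beginning: true}, 100, false))
}
```

(Real usrsctp uses serial-number arithmetic for TSN comparisons; plain `+1` is enough for the illustration.)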

It seems to be saying that once a chunk arrives directly after the cumulative TSN ack point, it must carry the B (first fragment) bit; otherwise the message can never be reassembled, so the association is aborted.

I will work on getting a capture of stuff we are sending and see what I can find

Sean-Der commented 4 years ago

Here is a pcap with this message Protocol Violation: Reass 30000007,CI:ffffffff,TSN=0bf4ed02,SID=0002,FSN=bf4ed02,SSN:0000

pion.tar.gz

Sean-Der commented 4 years ago

@enobufs this looks promising! This happens right after a storm of Forward TSN packets. Then a DATA packet right after causes the issue. Should we maybe not be sending a Forward TSN in the middle of a fragmented message?

Sean-Der commented 4 years ago

2020-02-01-014412_761x319_scrot

2020-02-01-014420_844x326_scrot

Sean-Der commented 4 years ago

I think the new Forward TSN flushes any caching in usrsctp.

Then when the next DATA chunk comes in, usrsctp is confused: it does a lookup and finds a chunk whose B bit is unset.

Sean-Der commented 4 years ago

I am figuring out the best way to make sure that when abandoned is set, it takes the whole fragmented message into account.

checkPartialReliabilityStatus is called on a single chunk. Maybe we can update it to take a payloadQueue + an index. Then it can handle iterating and making sure the whole fragmented message is marked abandoned.

enobufs commented 4 years ago

@Sean-Der I think you are on the right track! Indeed, forwarding the TSN to a fragment with (B,E)=(0,0) doesn't make sense; it should forward to the last fragment, the one whose E bit is 1.

Sean-Der commented 4 years ago

@enobufs That's good to hear :) Any suggestions on best way to refactor/change the API?

This is what I have so far

diff --git a/association.go b/association.go
index e47195d..a870687 100644
--- a/association.go
+++ b/association.go
@@ -623,7 +623,7 @@ func (a *Association) gatherOutboundFrastRetransmissionPackets(rawPackets [][]by
                        fastRetransSize += dataChunkSize
                        a.stats.incFastRetrans()
                        c.nSent++
-                       a.checkPartialReliabilityStatus(c)
+                       a.checkPartialReliabilityStatus(c, a.inflightQueue)
                        toFastRetrans = append(toFastRetrans, c)
                        a.log.Tracef("[%s] fast-retransmit: tsn=%d sent=%d htna=%d",
                                a.name, c.tsn, c.nSent, a.fastRecoverExitPoint)
@@ -1715,7 +1715,7 @@ func (a *Association) movePendingDataChunkToInflightQueue(c *chunkPayloadData) {
        c.since = time.Now() // use to calculate RTT and also for maxPacketLifeTime
        c.nSent = 1          // being sent for the first time

-       a.checkPartialReliabilityStatus(c)
+       a.checkPartialReliabilityStatus(c, a.inflightQueue)

        a.log.Tracef("[%s] sending tsn=%d ssn=%d sent=%d len=%d",
                a.name, c.tsn, c.streamSequenceNumber, c.nSent, len(c.userData))
@@ -1837,7 +1837,7 @@ func (a *Association) sendPayloadData(chunks []*chunkPayloadData) error {
 }

 // The caller should hold the lock.
-func (a *Association) checkPartialReliabilityStatus(c *chunkPayloadData) {
+func (a *Association) checkPartialReliabilityStatus(c *chunkPayloadData, q chunkQueue) {
        if !a.useForwardTSN {
                return
        }
@@ -1858,6 +1858,28 @@ func (a *Association) checkPartialReliabilityStatus(c *chunkPayloadData) {
                        }
                }
                s.lock.RUnlock()
+
+               // From RFC 3758 Sec 3.6 A3:
+               // When a TSN is "abandoned", if it is part of a fragmented message,
+               // all other TSN's within that fragmented message MUST be abandoned
+               // at the same time.
+               if !c.abandoned || c.beginningFragment {
+                       return
+               }
+
+               currentTSN := c.tsn + 1
+               for {
+                       fragment, ok := q.get(currentTSN)
+                       if !ok {
+                               a.log.Warnf("[%s] attempted to abandon chunk, but missing fragment tsn=%d ", a.name, currentTSN)
+                               return
+                       } else if fragment.beginningFragment {
+                               return
+                       }
+
+                       fragment.abandoned = true
+                       currentTSN++
+               }
        }
 }

@@ -1894,7 +1916,7 @@ func (a *Association) getDataPacketsToRetransmit() []*packet {

                c.nSent++

-               a.checkPartialReliabilityStatus(c)
+               a.checkPartialReliabilityStatus(c, a.inflightQueue)

                a.log.Tracef("[%s] retransmitting tsn=%d ssn=%d sent=%d", a.name, c.tsn, c.streamSequenceNumber, c.nSent)
enobufs commented 4 years ago

Ah... one thing I realize may make things complicated: the rest of the fragments that also need to be abandoned may still be in the pendingQueue.

When sendPayloadData() is called, the chunks passed are first written into the pendingQueue. Chunks in the pending queue do not have a TSN assigned yet, so that urgent messages can be reordered (which is why the pendingQueue has ordered and unordered queues).

Pulling a chunk from the pendingQueue is done by peek(), then pop(). When peek() returns a chunk that is a fragment of a message (E bit being 0), the following peek() calls stick to whichever queue (ordered or unordered) the earlier chunk came from, until the last fragment is popped (using the pendingQueue.selected flag in the code). So this concern is taken care of. This is done because all TSNs of the same group of fragments must be consecutive.

When there is an available congestion window, chunks are moved from the pendingQueue to the inflightQueue, and they have their TSN assigned at that point. (see: movePendingDataChunkToInflightQueue())

So, the challenging part is that we currently do not have an easy way to set the abandoned flag to true for chunks that are still in the pendingQueue.

What I would do (not entirely sure if it works): when moving chunks from the pendingQueue to the inflightQueue, check a.cumulativeTSNAckPoint against the newly assigned TSN. If the TSN is equal to or less than a.cumulativeTSNAckPoint, I believe we should mark the chunk as abandoned at that time, to make sure it won't be sent out.
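
The idea can be sketched in a few lines of Go. Everything here (`chunk`, `moveToInflight`, the parameters) is a hypothetical simplification, not pion/sctp's real API; it only shows that TSN assignment is the first point where the comparison against the cumulative TSN ack point becomes possible:

```go
package main

import "fmt"

// chunk is a hypothetical, simplified stand-in for chunkPayloadData.
type chunk struct {
	tsn       uint32
	abandoned bool
}

// moveToInflight sketches the suggestion above: TSNs are assigned only
// when a chunk leaves the pending queue, so that is the first moment we
// can compare against the cumulative TSN ack point and mark late
// fragments abandoned instead of sending them out.
func moveToInflight(c *chunk, nextTSN *uint32, cumulativeTSNAckPoint uint32) {
	c.tsn = *nextTSN
	*nextTSN++
	if c.tsn <= cumulativeTSNAckPoint {
		// The peer was already told (via FORWARD-TSN) to skip past this
		// TSN; sending it now would only confuse reassembly.
		c.abandoned = true
	}
}

func main() {
	nextTSN := uint32(10)
	cumAck := uint32(11) // a FORWARD-TSN already advanced past TSN 11
	for i := 0; i < 3; i++ {
		c := &chunk{}
		moveToInflight(c, &nextTSN, cumAck)
		fmt.Printf("tsn=%d abandoned=%v\n", c.tsn, c.abandoned)
	}
}
```

Real TSN comparisons need serial-number arithmetic, which this sketch omits.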

enobufs commented 4 years ago

There is one thing in my mind that concerns me a bit about the protocol violation error.

Say we sent chunks with TSN 1(B=1), 2, 3, 4, 5(E=1). The sender forwards the TSN to 3 and stops retransmitting 4 and 5 (which is what we are trying to do), but the chunks with TSN 4 and 5 might still be in flight (reordering delay) and later be received by the remote.

So I am not 100% sure the workaround would work perfectly. usrsctp maybe should have just ignored the DATA chunk silently instead of sending the error. Or there may be other causes of the violation error..

Sean-Der commented 4 years ago

oh that is a really good point.

I think that is worth asking @tuexen. I can file an issue on the usrsctp bug tracker and see if he has any ideas!

Sean-Der commented 4 years ago

I am just walking out the door, but I can file a ticket right when I get back! Or feel free to do it if you are bored/want to get into it :)

enobufs commented 4 years ago

I won't be available for this issue (as the main assignee) due to other commitments right now but will try my best to support your work like this. It is great to have another eye on sctp also!

tuexen commented 4 years ago

Is it possible to get

Then I can try to reproduce the issue.

Sean-Der commented 4 years ago

Fantastic! Thank you @tuexen

The sender is here

The abort comes from usrsctp here

You can reproduce by doing

Reass 30000007,CI:ffffffff,TSN=1e5e52d3,SID=0002,FSN=1e5e52d3,SSN:0000

I am happy to answer any questions about our SCTP implementation! You can easily run it by doing this

I also documented everything as I went in this ticket. This wasn't filed by a user, so everything should be relevant! Thank you so much for the help :)

tuexen commented 4 years ago

Could you provide a tracefile from the usrsctp side which contains the packets received and sent? Right now it only contains one direction.

One note about your configuration: a single user message needs 42 fragments, each carried in its own packet. Using a packet loss rate of 10%, this gives a chance of (9/10)^42 that a user message is received completely, which is approximately 1%. Not sure if this is what you intended to test, just making sure we know what we should observe: for every 100 messages you send, about 1 should be received successfully.

mturnshek commented 4 years ago

@tuexen can you elaborate a bit on that derivation for a noob? 50000 bytes message = 42 fragments in SCTP, which are one packet each?

Sean-Der commented 4 years ago

@tuexen here is another one! Pion got Abort chunk, with following errors: (Protocol Violation: Reass 30000007,CI:ffffffff,TSN=cf800b55,SID=0002,FSN=cf800b55,SSN:0000)

fromChrome.tar.gz intoChrome.tar.gz


The 10% packet loss is just for an easy repro. @mturnshek is still hitting this in production though. My goal is just to have everything be as resilient as possible/not making users debug things. I don't have any expectations over how much data is actually transferred and at what speed.


@mturnshek yep exactly! Each SCTP packet carries at most a 1200-byte data fragment, so the message has to be split across all those packets.

Each of those 42 fragments, in its own packet, gets its own TSN (SCTP's unique sequence ID). If any of those individual packets is lost, the whole larger message needs to be discarded as well.

tuexen commented 4 years ago

@mturnshek @Sean-Der gives the correct explanation. If you are sending a user message of 50000 bytes and only 1200 bytes fit in a packet, then considering 50000/1200 = 41 + 2/3, you need 42 packets. Since the number of retransmissions is 0, a single dropped fragment requires the receiver to drop all fragments of the user message.
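
A one-liner confirms the fragment count, assuming a 1200-byte fragment payload:

```go
package main

import "fmt"

func main() {
	const messageSize, fragmentSize = 50000, 1200
	// Integer ceiling division: 50000/1200 = 41 remainder 800, so 42 packets.
	fragments := (messageSize + fragmentSize - 1) / fragmentSize
	fmt.Println(fragments) // 42
}
```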

tuexen commented 4 years ago

@Sean-Der Can't you provide a single tracefile showing the packets processed by the usrsctp stack within Chrome? I need the correct timing between received packets and sent packets. Firefox has this capability and I think Chrome has it too. Something like https://github.com/nplab/WebRTC-Data-Channel-Playground#trace-with-wireshark

I need to understand which packet sequence sent by the peer triggers the ABORT. Either the packet sequence sent by the peer violates the specification (which means the ABORT is fine and the peer stack needs to be fixed) or the code detecting the violation is wrong (which means the ABORT is wrong and we need to fix usrsctp). Therefore I need the packet sequence sent/received by usrsctp with the correct timing.

If you can't get the logging from Chrome, maybe you can try with Firefox. It also uses the usrsctp stack...

Sean-Der commented 4 years ago

Unfortunately enabling the logging in Chromium/FireFox stops me from getting the abort sent back :(

Let me keep trying different combinations!

tuexen commented 4 years ago

OK, that is strange since it only changes the timing and I don't expect a race condition to be a problem here. How are you taking the traces if not via logging from usrsctp?

Sean-Der commented 4 years ago

I have been capturing them going in/out of the Go SCTP implementation.

Just adding it to the Read/Write functions.

If I put them in the same pcap would it be helpful? I will keep trying with the browsers though, or maybe just write a little C program and get browsers out of the equation.

Thanks again for all the help

tuexen commented 4 years ago

I see.

It would be best to see the perspective of the usrsctp stack, since Go's perspective is different: packets that are sent to the Go implementation get dropped along the way. I can try to look at these files; at least I can see whether the Go implementation behaves as expected.

Sean-Der commented 4 years ago

@tuexen Do you think it is possible that the cleaning up of reassembly queues from the forward TSN can cause this?

When getting a FORWARD-TSN the code cleans up some of the control stuff here

Is it possible that when we get another fragment we enter the block for a new control here

Then we abort because we have a !B for the first fragment we have ever seen here

Also ping @lgrahl if you have any bandwidth/debug ideas I am out of them :( maybe I can spin up a rawrtc instance and repro/log easier.

lgrahl commented 4 years ago

Even though it's usrsctp-based, RAWRTC often behaves very differently from browsers. So I can't provide a definitive answer, but it's probably worth a try? Keep in mind that Chrome uses an outdated usrsctp version and IIRC they don't even handle abandoned messages, so you'll likely end up with garbled messages sometimes. Anyway, I can give you pointers if needed, but atm I don't know what you need. :slightly_smiling_face:

mturnshek commented 4 years ago

Unfortunately enabling the logging in Chromium/FireFox stops me from getting the abort sent back :(

Let me keep trying different combinations!

@Sean-Der Does this mean the data channels don't close when testing with Chrome with webrtc debug logs enabled?

tuexen commented 4 years ago

@Sean-Der I took a look at the trace files you provided.

One strange thing I observed: the file intoChrome.pcap, which I think should contain all packets sent by your implementation, only contains every other (or so) DATA chunk.

Looking into the concrete problem (only using the last 4 decimal digits for the TSNs):

Your implementation is sending at the end of the tracefile:

For whatever reason, your implementation is sending a FORWARD-TSN chunk (the last packet in your tracefile) using 5221 as the new cumulative TSN ACK. This violates the specification, since that is the middle of the user message spanning TSN 5186 ... TSN 5227. See A3 of RFC 3758. Therefore, I think the SCTP stack used by Chrome is sending the ABORT correctly.

Please correct me, if I'm wrong.

mturnshek commented 4 years ago

I don't know how helpful this will be for resolving, but I've collected some stats on when data channel closing seems to occur to find out if there was some set of supported conditions our production use can lean on.

I tested locally using modified versions of the gists Sean shared in the OP, using comcast to introduce varying packet loss and ~150ms latency on loopback.

Data points where I did not experience any ABORTs:

Packet loss | Message size | Rate
1%          | 2 KB         | 10 Hz
1%          | 800 bytes    | 100 Hz

For the first case, maybe I didn't wait long enough. I saw ~10+ minutes without a failure with that data point, and I saw a ~1 minute failure with 4% 2KB 20Hz.

Some examples of conditions when ABORT occurs eventually:

Packet loss | Message size | Rate
0.50%       | 20 KB        | 10 Hz
1%          | 4 KB         | 10 Hz
1%          | 2 KB         | 100 Hz

Increasing the rate or message size increases the likelihood of an ABORT per unit time and reduces the proportion of messages received by the other side.

Within constant conditions, the ABORT seems to follow a Poisson process (just as likely to fail from one moment to the next, with constant probability per unit time), or is at least semi-random, rather than building up and failing at a consistent time, from what I have observed.

The fact that 1% packet loss with 2KB at 100Hz issues an ABORT while 800-byte messages do not looks to be in line with the current theory under discussion, though there may be other reasons why 2 fragments would cause an issue where 1 fragment doesn't.

It's important to note that the ABORT never appears to happen when there is no packet loss during these local tests. We have also experienced this issue more frequently when CPU usage on the Pion side is very high due to competing processes.

tuexen commented 4 years ago

Assuming my analysis is correct and there is a bug in the Pion SCTP stack as I described, you will not observe it for small messages. The larger the message size, the more likely you run into this issue. I'm also not sure all instances of such a bug will be caught by usrsctp. If needed, I can check how pedantic usrsctp (or the FreeBSD implementation) is by writing some packetdrill scripts and testing it. But I think there should be enough information to look at the Pion SCTP stack and figure out whether the abandoning of fragmented user messages is implemented correctly.

The test case is a good one!

enobufs commented 4 years ago

Thank you @tuexen. Now I feel more confident that fixing the FWD-TSN to point to the end of the fragments would fix the issue. (I wasn't fully aware that A3 of RFC 3758 https://tools.ietf.org/html/rfc3758#section-3.5 gives guidance on how the TSN is forwarded.)

@Sean-Der I have some ideas for the fix. Would you like me to take this over, though it would need to be this weekend? Please let me know.

Sean-Der commented 4 years ago

Oh that is fantastic, thank you so much @tuexen! I didn't even question the TSN that the forward TSN was picking.

@enobufs if you have the bandwidth! Would you be ok with me just updating the code in FORWARD TSN that picks the TSN? For now I am going to make it only select a DATA chunk that is a beginning fragment.

Whenever you have time we can do a deeper fix :) but please no rush!

Sean-Der commented 4 years ago

I can just add an if !beginningFragment check here

enobufs commented 4 years ago

Of course - no problem. I will catch up with you this w/e!

enobufs commented 4 years ago

With the current master branch, I was able to repro the Protocol Violation error with Chrome, packet loss ratio ~ 4%, 32KB per message.

I tried 585060f, but it stalls after about 10 seconds. By not advancing advancedPeerTSNAckPoint, it stops generating Fwd TSN packets, which leaves abandoned chunks in the inflight queue forever: a SACK will never be received for chunks left in the queue unless we send Fwd TSN packets (or actually retransmit them).

enobufs commented 4 years ago

This made me realize that advancing the TSN to the end of the last fragment of a large message is not practical, because the message size could be much larger than the current cwnd (which could be as small as 1 MSS, 1280 B); the rest of the fragments may not even have a TSN assigned yet.

I also noticed that, way before the abort occurs, we had been sending Fwd TSN chunks that advance the TSN to a chunk in the middle of a message, without getting the violation error. There must be some "other" reason causing the Abort.

So I am going to step back and try to find other causes of the Abort. The weird thing I saw was that while I was browsing chrome://webrtc-internals/, the Abort never occurred. (A race or something?) - need to double-check.

enobufs commented 4 years ago

Let me take that back. The Fwd TSN does not need to point to the TSN at the end of the message. It can point to the last fragment that has a TSN assigned (it does not have to go beyond the inflightQueue, because no one knows where the end will be). The important thing (I believe) is to make sure the residual fragments in the pending queue are removed as well, so that pion/sctp does not send trailing chunks that have no "beginning" chunk. <= This could be the reason for the protocol violation error.
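
A sketch of that rule, with a hypothetical `inflight` type standing in for pion/sctp's inflightQueue: walk consecutive TSNs past the cumulative ack point and stop at the last abandoned chunk that actually has a TSN assigned.

```go
package main

import "fmt"

// inflight is a hypothetical map from TSN to "abandoned" status,
// standing in for the inflightQueue.
type inflight map[uint32]bool

// advancedAckPoint walks consecutive TSNs past the cumulative ack point
// and stops at the last abandoned chunk in the queue. The end of the
// message may not be in flight yet, so the ack point can legitimately
// stop mid-message, as long as the trailing fragments still in the
// pending queue are abandoned too and never sent.
func advancedAckPoint(q inflight, cumAck uint32) uint32 {
	p := cumAck
	for {
		abandoned, ok := q[p+1]
		if !ok || !abandoned {
			return p
		}
		p++
	}
}

func main() {
	// TSNs 101 and 102 are abandoned; 103 is still live.
	q := inflight{101: true, 102: true, 103: false}
	fmt.Println(advancedAckPoint(q, 100)) // 102
}
```

As before, real code would use serial-number arithmetic for the TSN increments.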

enobufs commented 4 years ago

I am still working on this, but I am seeing Chrome stop advancing the cumulative TSN, and the advertised RWND drops to zero!

I am using partial reliability, unordered, with the retransmit limit ("remix") set to 0 (no retransmits), forcibly dropping packets at 4%. Each message is 32KB, always fragmented into 28 chunks.

sctp TRACE: 22:14:55.128202 association.go:2077: [0xc000330000] updated cwnd=1228 ssthresh=4912 inflight=2400 (RTO)
sctp DEBUG: 22:14:55.128265 association.go:2080: [0xc000330000] T3-rtx timed out: nRtos=3 cwnd=1228 ssthresh=4912
sctp DEBUG: 22:14:55.128295 association.go:2081:    - advancedPeerTSNAckPoint=2935603294
sctp DEBUG: 22:14:55.128312 association.go:2082:    - cumulativeTSNAckPoint=2935603294
sctp DEBUG: 22:14:55.128344 association.go:2087:    - [0] tsn=2935603295 acked=false abandoned=false (false,false) len=1200
sctp DEBUG: 22:14:55.128375 association.go:2087:    - [1] tsn=2935603296 acked=false abandoned=false (false,false) len=1200
sctp TRACE: 22:14:55.128419 association.go:1861: [0xc000330000] final (abandoned) tsn=2935603295 (remix: 4)
sctp TRACE: 22:14:55.128448 association.go:1909: [0xc000330000] retransmitting tsn=2935603295 ssn=1 sent=4
sctp TRACE: 22:14:55.128497 association.go:449: [0xc000330000] sending a packet
sctp TRACE: 22:14:55.361747 association.go:1394: [0xc000330000] SACK: cumTSN=2935603294 a_rwnd=0
sctp TRACE: 22:14:55.361824 association.go:1482: [0xc000330000] T3-rtx timer start (pt3)
sctp TRACE: 22:14:59.329558 association.go:947: [0xc000330000] chunkHeartbeat
sctp WARNING: 2020/02/09 22:14:59 [0xc000330000] forcibly droping a packet
sctp TRACE: 22:15:03.132927 association.go:2077: [0xc000330000] updated cwnd=1228 ssthresh=4912 inflight=2400 (RTO)
sctp DEBUG: 22:15:03.133006 association.go:2080: [0xc000330000] T3-rtx timed out: nRtos=4 cwnd=1228 ssthresh=4912
sctp DEBUG: 22:15:03.133044 association.go:2081:    - advancedPeerTSNAckPoint=2935603294
sctp DEBUG: 22:15:03.133071 association.go:2082:    - cumulativeTSNAckPoint=2935603294
sctp DEBUG: 22:15:03.133122 association.go:2087:    - [0] tsn=2935603295 acked=false abandoned=false (false,false) len=1200
sctp DEBUG: 22:15:03.133168 association.go:2087:    - [1] tsn=2935603296 acked=false abandoned=false (false,false) len=1200
sctp TRACE: 22:15:03.133240 association.go:1861: [0xc000330000] final (abandoned) tsn=2935603295 (remix: 5)
sctp TRACE: 22:15:03.133284 association.go:1909: [0xc000330000] retransmitting tsn=2935603295 ssn=1 sent=5
sctp TRACE: 22:15:03.133407 association.go:449: [0xc000330000] sending a packet
sctp TRACE: 22:15:03.361400 association.go:1394: [0xc000330000] SACK: cumTSN=2935603294 a_rwnd=0
sctp TRACE: 22:15:03.361474 association.go:1482: [0xc000330000] T3-rtx timer start (pt3)
sctp TRACE: 22:15:19.137241 association.go:2077: [0xc000330000] updated cwnd=1228 ssthresh=4912 inflight=2400 (RTO)
sctp DEBUG: 22:15:19.137358 association.go:2080: [0xc000330000] T3-rtx timed out: nRtos=5 cwnd=1228 ssthresh=4912
sctp DEBUG: 22:15:19.137405 association.go:2081:    - advancedPeerTSNAckPoint=2935603294
sctp DEBUG: 22:15:19.137429 association.go:2082:    - cumulativeTSNAckPoint=2935603294
sctp DEBUG: 22:15:19.137475 association.go:2087:    - [0] tsn=2935603295 acked=false abandoned=false (false,false) len=1200
sctp DEBUG: 22:15:19.137519 association.go:2087:    - [1] tsn=2935603296 acked=false abandoned=false (false,false) len=1200
sctp TRACE: 22:15:19.137615 association.go:1861: [0xc000330000] final (abandoned) tsn=2935603295 (remix: 6)
sctp TRACE: 22:15:19.137677 association.go:1909: [0xc000330000] retransmitting tsn=2935603295 ssn=1 sent=6
sctp TRACE: 22:15:19.137913 association.go:449: [0xc000330000] sending a packet
sctp TRACE: 22:15:19.369791 association.go:1394: [0xc000330000] SACK: cumTSN=2935603294 a_rwnd=0
sctp TRACE: 22:15:19.369863 association.go:1482: [0xc000330000] T3-rtx timer start (pt3)
sctp TRACE: 22:15:35.002331 association.go:947: [0xc000330000] chunkHeartbeat
sctp WARNING: 2020/02/09 22:15:35 [0xc000330000] forcibly droping a packet

@tuexen Do you have any thoughts?

The above log shows two chunks in the inflight queue (TSN=2935603295 and TSN=2935603296). On T3-rtx timeout, pion retransmits 2935603295. A SACK comes back, but with the older TSN (Chrome did receive the chunk but did not accept it because its receive buffer is full, a_rwnd=0). My app in the Chrome browser does not notice anything, still waiting for the onmessage callback on the data channel...

Chrome I am using: Version 79.0.3945.130 (Official Build) (64-bit) On macOS.

enobufs commented 4 years ago

@tuexen @Sean-Der I think I found the cause of the a_rwnd=0 I mentioned above, which could be the root cause of the protocol violation we saw earlier.

A Forward TSN chunk carries the new cumulative TSN ack point, as well as a set of stream identifier (SI) / stream sequence number (SSN) pairs (RFC 3758 sec 3.2).

I noticed that the Fwd-TSNs pion/sctp sends occasionally have the SI/SSN field empty. This was caused by a race condition between the createForwardTSN() and handleSack() routines running simultaneously. With that fixed, the stall at a_rwnd=0 stopped happening, and the partially-reliable traffic flows between pion and Chrome without a problem.

(@tuexen usrsctp could send a protocol violation error in this case rather than quietly stopping to handle inbound data - might it be deadlocked?)

@Sean-Der Finally I see the way out, but I need to fix a bunch of tests before I can push a PR. Please give me a few moments to do so.
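
A minimal sketch of the kind of fix described. The types here (`assoc`, `handleSack`, `createForwardTSN`) are simplified stand-ins, not pion/sctp's real signatures; the point is only that both routines take the same lock, so the FORWARD-TSN is never built from a half-updated abandoned set:

```go
package main

import (
	"fmt"
	"sync"
)

// assoc is a toy stand-in for the association. createForwardTSN reads
// the abandoned-chunk state that handleSack mutates, so both must run
// under the same lock, or the FORWARD-TSN can be assembled from a
// half-updated queue (e.g. with an empty SI/SSN list).
type assoc struct {
	mu        sync.Mutex
	abandoned map[uint32]uint16 // TSN -> SID of abandoned ordered chunks
}

func (a *assoc) handleSack(ackedTSN uint32) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.abandoned, ackedTSN)
}

func (a *assoc) createForwardTSN() []uint16 {
	a.mu.Lock()
	defer a.mu.Unlock()
	streams := make([]uint16, 0, len(a.abandoned))
	for _, sid := range a.abandoned {
		streams = append(streams, sid)
	}
	return streams
}

func main() {
	a := &assoc{abandoned: map[uint32]uint16{7: 2}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); a.handleSack(6) }()
	go func() { defer wg.Done(); _ = a.createForwardTSN() }()
	wg.Wait()
	fmt.Println("streams:", a.createForwardTSN())
}
```

Running a sketch like this with `go run -race` is a quick way to confirm the lock actually covers both paths.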

tuexen commented 4 years ago

@enobufs For unordered messages you don't fill in the SID/SSN fields, since the SSN is irrelevant for unordered messages.

enobufs commented 4 years ago

@tuexen Ah right, I was using 'ordered' just for now because the problem happens right away there.

Even though the SSN is irrelevant for 'unordered', the SI is useful when handing the FwdTSN to the "stream" layer.

In usrsctp, are you sure it does not use the SI value for 'unordered'? I saw the Chrome side stall with a_rwnd=0 with unordered chunks as well. I can say "a_rwnd=0" happens right away (more quickly) with 'ordered' traffic when the SID/SSN fields are empty.

   Stream Sequence-N: 16 bit u_int

    This field holds the sequence number associated with the stream
    that was skipped.  The stream sequence field holds the largest
    stream sequence number in this stream being skipped.  The receiver
    of the FWD-TSN's can use the Stream-N and Stream Sequence-N fields
    to enable delivery of any stranded TSN's that remain on the stream
    re-ordering queues.  This field MUST NOT report TSN's corresponding
    to DATA chunks that are marked as unordered.  For ordered DATA
    chunks this field MUST be filled in.

"This field MUST NOT report TSN's corresponding to DATA chunks that are marked as unordered." - this is not clear to me, but I would assume you still want at least one SID/SSN field, with the SSN simply not used when "unordered"... What do you think?

tuexen commented 4 years ago

I agree that the text is not optimal... However, the intention is that for ordered messages you list the SID/SSN pair, and for unordered messages nothing. It has to be like this because the receiver has no idea what was sent ordered or unordered, so it can't know whether only an SID or an SID/SSN pair is listed.

See the loop at sctp_output.c:11052. Please note that for I-DATA, things are different.

enobufs commented 4 years ago

because the receiver has no idea what was sent ordered or unordered

Yes, it can (pion/sctp currently works that way). If you have the SID, the stream layer for that SID knows which one it is expecting.

It's too bad; I thought always sending a SID/SSN pair really helps forward the FwdTSN to the appropriate stream layer, and it still makes sense to me. But @tuexen, you are the co-author of both the RFC and usrsctp, so pion/sctp needs to change how it handles this regardless...

I saw this line generating SCTP_CAUSE_PROTOCOL_VIOLATION during the handling of fwdtsn.

    asoc->cumulative_tsn = new_cum_tsn;
    if (gap >= m_size) {
        if ((long)gap > sctp_sbspace(&stcb->asoc, &stcb->sctp_socket->so_rcv)) {
            struct mbuf *op_err;
            char msg[SCTP_DIAG_INFO_LEN];

            /*
             * out of range (of single byte chunks in the rwnd I
             * give out). This must be an attacker.
             */
            *abort_flag = 1;
            snprintf(msg, sizeof(msg),
                     "New cum ack %8.8x too high, highest TSN %8.8x",
                     new_cum_tsn, asoc->highest_tsn_inside_map);
            op_err = sctp_generate_cause(SCTP_CAUSE_PROTOCOL_VIOLATION, msg);
            stcb->sctp_ep->last_abort_code = SCTP_FROM_SCTP_INDATA + SCTP_LOC_36;
            sctp_abort_an_association(stcb->sctp_ep, stcb, op_err, SCTP_SO_NOT_LOCKED);
            return;
        }

I am having a hard time deciphering the condition and wondering if you could share some insight... Do you think it could be related to the error we saw?

Also, is there a good way to map a Chrome version to usrsctp revision?

enobufs commented 4 years ago

Sorry... never mind. The error code was SCTP_FROM_SCTP_INDATA + SCTP_LOC_7 (30000007). So, that is not it. Now, in my fix, FwdTSN always points to a chunk with B = 1. I no longer see the error.

enobufs commented 4 years ago

I have double-checked: with an unordered data channel, if the sender (pion) sends a fwdtsn chunk without the SID/SSN pair, Chrome stops processing inbound data and its receive buffer gets exhausted (a SACK with a_rwnd = 0 comes back). With an SID/SSN pair, it flows with no problem.

tuexen commented 4 years ago

because the receiver has no idea what was sent ordered or unordered

Yes, it can (as pion/sctp currently works that way). If you have SID, the stream layer with the SID would know which one it is expecting.

This is only true for the use case of WebRTC... In SCTP, the ordered / unordered property is not a property of the stream, but a property of the user message. Therefore, an SCTP receiver doesn't know. Only WebRTC (an upper layer from the perspective of the SCTP stack) sends all user messages of the same stream with the same property.

It's too bad, I thought sending a SID/SSN pair always really helps to forward the fwdtsn to appropriate stream layer, and it still makes sense. @tuexen you are the co-author of the RFC, and usrsctp, pion/sctp needs to modify how to handle it regardless...

As I said: with I-DATA you get message identifiers and reports for both ordered and unordered messages in I-FORWARD-TSN chunks.

I saw this line generating SCTP_CAUSE_PROTOCOL_VIOLATION during the handling of fwdtsn.

Having a hard time deciphering the condition and wondering if you could give some of your insight... Do you think it could be related to the error we saw?

If you have a gap of TSNs which is larger than the receive buffer size (each TSN contains at least one byte of user data), then the peer is not honouring our receive buffer size, so we send an ABORT.
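
A simplified illustration of that check (`exceedsRwnd` is a made-up name; real usrsctp computes the gap against its TSN mapping array and uses serial arithmetic):

```go
package main

import "fmt"

// exceedsRwnd sketches the explanation above: every TSN carries at
// least one byte of user data, so a new cumulative TSN that jumps
// further ahead of the highest TSN seen than the receive buffer could
// ever hold cannot be honest, and the receiver aborts the association.
func exceedsRwnd(newCumTSN, highestSeenTSN uint32, rcvBufBytes int64) bool {
	gap := int64(newCumTSN - highestSeenTSN)
	return gap > rcvBufBytes
}

func main() {
	// A 64 KB receive buffer cannot account for a 100000-TSN jump.
	fmt.Println(exceedsRwnd(200000, 100000, 65536))
	// A small jump well within the buffer is fine.
	fmt.Println(exceedsRwnd(100010, 100000, 65536))
}
```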

Also, is there a good way to map a Chrome version to usrsctp revision?

No. That is up to Google. I know they haven't updated it for some time.

tuexen commented 4 years ago

Sorry... never mind. The error code was SCTP_FROM_SCTP_INDATA + SCTP_LOC_7 (30000007). So, that is not it. Now, in my fix, FwdTSN always points to a chunk with B = 1. I no longer see the error.

Shouldn't the New Cumulative TSN you put in the FORWARD-TSN be the TSN of the chunk with the E bit set (the last fragment of the abandoned user message)?

tuexen commented 4 years ago

I have double-checked, with unordered data channel, if the sender (pion) sends fwdtsn chunk without SID/SSN pair, Chrome stops processing inbound data and its receive buffer gets exhausted (SACK with a_rwnd = 0 comes back). With an SID/SSN pair, it flows no problem.

Please set the New Cumulative TSN correctly (see my earlier response). If the problem persists, can you provide a .pcapng file showing the SCTP packets?

yutakasg commented 4 years ago

Oops, yes I meant E=1. I want to check against packets on the wire too. I will try to get a pcap and share it with you. Thanks @tuexen for your support, really appreciated.