nasa-jpl / ION-DTN

NASA Open Source ION Software implementation of Delay Tolerant Networking. ION development is managed by the Jet Propulsion Lab; regression testing and code management are provided by Ohio University.
https://ion-dtn.readthedocs.io/en/latest/
Other
26 stars 6 forks source link

CFDP issue #31

Open dz-dtn opened 8 months ago

dz-dtn commented 8 months ago

CFDP reception issue. If the final segment is shorter than the rest of the segments and it is received prior to the next to last segment then the file gets corrupted. Depending on the relative lengths, the last segment can be partially corrupted or the file can end up longer than the original file. I ran into this issue while using LTP with dropped packets causing numerous cancelled sessions. It is easier to recreate if configured for 1 bundle per LTP session.

It looks like the code was written to support duplicate and overlapping segments of various sizes so the snippet below attempts to continue that tradition.

In libcfdpP.c:

4169     /*  Now skip over all data that were written when
4170      *  this extent was posted.       */
4171 
4172     //BUG:   bytesToSkip = nextExtent.length;
4173     bytesToSkip = bytesToWrite;
4174     if (bytesRemaining > bytesToWrite) {
4175         if ((bytesRemaining - bytesToWrite) <= nextExtent.length) {
4176             // the remainder is fully contained within the next extent
4177             bytesToSkip = bytesRemaining;
4178         } else {
4179             // this segment extends past the nextExtent - is that possible?
4180             bytesToSkip += nextExtent.length;
4181         }
4182     } 
4183 
4184     segmentOffset += bytesToSkip;
4185     cursor += bytesToSkip;
4186     bytesRemaining -= bytesToSkip;
type6six commented 8 months ago

Hi, I tried to recreate the problem you are reporting by making the final segments arrive out of order.

For example ....segment n-3, n-2, n-1, n, EOF

I send ....segment n-3, n-2, n, n-1, EOF

Is this correct? The file I received appears to be error-free.

Can you indicate the signs of CFDP failure? Was it a log message, or did you check the file, and it failed checksum? You indicated that the file can sometimes be larger than expected, which I did not see in my test.

I did not use LTP; instead, I simply used UDPCLA to make it easier to manipulate the arrival order of the CFDP segments. It would be very helpful if you could provide additional information on how ION reports the error.

Also if you can tell us the ION version and BP version used that would be great! Thanks!

type6six commented 8 months ago

I have also tried to swap the EOF arrival order this way:

....segment n-3, n-2, n-1, EOF, n

So far this does not produce any error, I will continue testing.

dz-dtn commented 8 months ago

Hello,

I run md5sum on the original file and then again on the received file to verify it was received correctly.

What is the size of the file and the CFDP segment size you are using?

I will get more info for you.

Thanks,

DZ

From: type6six @.> Sent: Monday, March 11, 2024 8:47 PM To: nasa-jpl/ION-DTN @.> Cc: dz-dtn @.>; Author @.> Subject: Re: [nasa-jpl/ION-DTN] CFDP issue (Issue #31)

Hi, I tried to recreate the problem you are reporting by making the final segments arrive out of order.

For example ....segment n-3, n-2, n-1, n, EOF

I send ....segment n-3, n-2, n, n-1, EOF

Is this correct? The file I received appears to be error-free.

Can you indicate what are the signs of CFDP failure? Was it a log message, or did you check the file and it failed checked sum? You indicated the file can sometimes be larger too.

I did not use LTP, instead I simply use UDPCLA to make it easier to manipulate the UDP packet arrival order. If you can provide additional information on how ION reports the error, that will be helpful. Thanks!

— Reply to this email directly, view it on GitHub https://github.com/nasa-jpl/ION-DTN/issues/31#issuecomment-1989766800 , or unsubscribe https://github.com/notifications/unsubscribe-auth/ATDYK7W3PMBDINTHVRLWCVDYXZUDPAVCNFSM6AAAAABDVYZOI6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBZG43DMOBQGA . You are receiving this because you authored the thread. https://github.com/notifications/beacon/ATDYK7SIKXX5VSDSXREQ7JDYXZUDPA5CNFSM6AAAAABDVYZOI6WGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTTWTFXJA.gif Message ID: @. @.> >

type6six commented 8 months ago

I also used md5sum to check, and they seem to match. I sent a 100,000-byte file.

The data segment is configured to be 1200 bytes of file data using the cfdprc file. Adding the CFDP header and bundle header takes it up to 1296 according to wireshark.

The last data segment is 400 bytes; adding CFDP and bundle header goes up to 539 bytes.

dz-dtn commented 8 months ago

Good morning,

I added some debug output to libcfdpP.c and used a 1600 byte file so there are only 2 File Data PDUs when using 1200 byte segments.

Below is the output for receiving the segments in order and out of order:

[2024/03/14-12:57:18] at line 3893 of cfdp/library/libcfdpP.c, dzdebug - cfdp got segment (0)

[2024/03/14-12:57:18] at line 4229 of cfdp/library/libcfdpP.c, dzdebug - write final hunk: (SegOffset: 1200 BytesToWrite: 1200 FileSize: 1200)

[2024/03/14-12:57:18] at line 3893 of cfdp/library/libcfdpP.c, dzdebug - cfdp got segment (1200)

[2024/03/14-12:57:18] at line 4229 of cfdp/library/libcfdpP.c, dzdebug - write final hunk: (SegOffset: 1600 BytesToWrite: 400 FileSize: 1600)

[2024/03/14-12:57:18] at line 3695 of cfdp/library/libcfdpP.c, dzdebug - ======================= completed file =======================

[2024/03/14-12:57:48] at line 3893 of cfdp/library/libcfdpP.c, dzdebug - cfdp got segment (1200)

[2024/03/14-12:57:48] at line 4229 of cfdp/library/libcfdpP.c, dzdebug - write final hunk: (SegOffset: 1200 BytesToWrite: 400 FileSize: 1600)

[2024/03/14-12:57:48] at line 3893 of cfdp/library/libcfdpP.c, dzdebug - cfdp got segment (0)

[2024/03/14-12:57:48] at line 4183 of cfdp/library/libcfdpP.c, dzdebug - in loop write: (SegOffset: 0 BytesToWrite: 1200 nextExtent.length: 400)

[2024/03/14-12:57:48] at line 4229 of cfdp/library/libcfdpP.c, dzdebug - write final hunk: (SegOffset: 400 BytesToWrite: 800 FileSize: 2000)

(The file is not repositioned after writing out the 1200 bytes so when it writes an additional 800 bytes it writes over the last segment and extends the file in this case)

[2024/03/14-12:57:52] [?] Late arrival of Metadata, so checksums will not match: /dtn_xfer/iss/file_1600_32774_xfer

[2024/03/14-12:57:52] at line 3695 of cfdp/library/libcfdpP.c, dzdebug - ======================= completed file =======================

Please let me know if you would like additional info.

Thanks,

DZ

From: type6six @.> Sent: Wednesday, March 13, 2024 12:18 PM To: nasa-jpl/ION-DTN @.> Cc: dz-dtn @.>; Author @.> Subject: Re: [nasa-jpl/ION-DTN] CFDP issue (Issue #31)

I also used md5sum to check, and they seem to match. I sent a 100,000-byte file.

The data segment is configured to be 1200 bytes of file data using the cfdprc file. Adding the CFDP header and bundle header takes it up to 1296 according to wireshark.

The last data segment is 400 bytes; adding CFDP and bundle header goes up to 539 bytes.

— Reply to this email directly, view it on GitHub https://github.com/nasa-jpl/ION-DTN/issues/31#issuecomment-1995054894 , or unsubscribe https://github.com/notifications/unsubscribe-auth/ATDYK7WDWJOWP5TGXSCBBL3YYCC4TAVCNFSM6AAAAABDVYZOI6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJVGA2TIOBZGQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/ATDYK7VY27LEIGKXW7GFCH3YYCC4TA5CNFSM6AAAAABDVYZOI6WGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTTW5IPS4.gif Message ID: @. @.> >

type6six commented 8 months ago

D,

I have read through the code to the point where you posted the bug fix. And I agree with your fix.

The bytesToSkip only filled the extension of the previous extent; it has not been written to the "next extent" yet. Your additional checks make sense since we need to know if the segment can go beyond the nextExtent in the more generic case.

That probably won't happen if the sender side is ION, since we limit the segment size. However, your fix should handle the more generic case where the sender may decide to send a much larger segment to cover multiple gaps all at once. Perhaps it's a bit wasteful, but it should work.

I have not tried to replicate your new setup but will do so soon and test the fix, which I expect will take care of the issue.

J

dz-dtn commented 8 months ago

J,

I agree that with the current implementation there will only ever be the one write to fill the data up to the next extent and then break out of the loop with nothing left to write.

Thanks,

DZ

From: type6six @.> Sent: Thursday, March 14, 2024 4:15 PM To: nasa-jpl/ION-DTN @.> Cc: dz-dtn @.>; Author @.> Subject: Re: [nasa-jpl/ION-DTN] CFDP issue (Issue #31)

D,

I have read through the code to the point where you posted the bug fix. And I agree with your fix.

The bytesToSkip only filled the extension of the previous extent, it has not written to the "nextExtent" yet. Your additional checks makes sense since we need to know if the segment can somehow go beyond the nextExtent itself, in the more generic case.

That probably won't happen since on ION's transmit side we limit the size of the segment. However, your fix should handle the more generic case where the sender may decide to send a much larger segment to cover multiple gaps all at once? Perhaps a bit wasteful but should work.

I have not try to replicate your new setup but will do so and test if the fix takes care of, which I expect will.

J

— Reply to this email directly, view it on GitHub https://github.com/nasa-jpl/ION-DTN/issues/31#issuecomment-1998495157 , or unsubscribe https://github.com/notifications/unsubscribe-auth/ATDYK7RRHYU5CGSSQN5Z36DYYIHOFAVCNFSM6AAAAABDVYZOI6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJYGQ4TKMJVG4 . You are receiving this because you authored the thread. https://github.com/notifications/beacon/ATDYK7X764ZSQUG2OFXMONDYYIHOFA5CNFSM6AAAAABDVYZOI6WGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTTXD2O3K.gif Message ID: @. @.> >

type6six commented 7 months ago

D,

I posted a patch for this issue. The fix is not the same as you recommended. I noticed a few nuances about how "segmenOffset" was updated when computing different checksum types, so I have adjusted what you proposed to fit that.

Based on my preliminary tests, the patch fixed both failures of either of md5sum or a longer file than expected. I've attached a patch based on the 4.1.2 release.

patch-4.1.2-cfdp-github-issue-31.txt

We will try to spend some cycles testing CFDP after 4.1.3 comes out and then release the patch as part of 4.1.4.

Thanks for your valuable support!!

J

iondev33 commented 5 months ago

Hearing no further error report, I am closing this ticket. The patch will be part of 4.1.4 release, scheduled by end of summer.

iondev33 commented 3 weeks ago

This issue is re-opened due to detected failure. Additional debugging will be performed before 4.1.4 release. The previously issued patch should be ignored.