nasa / bplib

Apache License 2.0

bpcat file transfer issue after about 400MB #146

Closed jphickey closed 2 years ago

jphickey commented 2 years ago

The "bpcat" test tool transfers data between nodes using BPv7 with custody transfer. When testing with larger files, the transfer starts out fine and runs for a while (about 400MB), but eventually hits a decode error on the receiver:

https://github.com/nasa/bplib/blob/b3f9f7d53212ab6a228fc7afcd075725a28ce1bd/lib/v7_cla_api.c#L160

The hypothesis is that an out-of-memory condition on the sender side was not handled properly. That is, the sender was unable to hold the whole bundle in RAM, so the bundle may have been sent in a truncated or otherwise corrupted state instead of being dropped and retried later.

Also, once this condition is hit, the transfer does not seem to recover: data flow never resumes, which also suggests that something is not being cleaned up properly after the error.
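
To make the hypothesis concrete, here is a minimal sketch of the "all or nothing" behavior one would expect when the pool runs dry while a bundle is being copied into it. This is not bplib code: `pool_t`, `block_t`, `bundle_store`, and the sizes are hypothetical names and values chosen only for illustration. On allocation failure the partially built chain is handed back to the pool and the caller gets `NULL`, so nothing truncated can be sent and the pool is left in a usable state for a later retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE  64  /* hypothetical payload bytes per pool block */
#define POOL_BLOCKS 8   /* hypothetical pool capacity, in blocks */

typedef struct block {
    struct block *next;
    size_t used;                     /* valid bytes in data[] */
    unsigned char data[BLOCK_SIZE];
} block_t;

typedef struct {
    block_t storage[POOL_BLOCKS];
    block_t *free_list;
    size_t free_count;
} pool_t;

/* Put every block of the fixed-size pool on the free list. */
void pool_init(pool_t *pool)
{
    assert(pool != NULL);
    pool->free_list = NULL;
    pool->free_count = 0;
    for (size_t i = 0; i < POOL_BLOCKS; ++i) {
        pool->storage[i].next = pool->free_list;
        pool->free_list = &pool->storage[i];
        pool->free_count++;
    }
}

/* Take one block from the pool, or return NULL when it is exhausted. */
static block_t *pool_alloc(pool_t *pool)
{
    block_t *b = pool->free_list;
    if (b != NULL) {
        pool->free_list = b->next;
        pool->free_count--;
        b->next = NULL;
        b->used = 0;
    }
    return b;
}

/* Return a whole chain of blocks to the pool. */
static void pool_release_chain(pool_t *pool, block_t *head)
{
    while (head != NULL) {
        block_t *next = head->next;
        head->next = pool->free_list;
        pool->free_list = head;
        pool->free_count++;
        head = next;
    }
}

/*
 * Copy len bytes into a chain of pool blocks (a zero-length bundle
 * still takes one block). Returns the chain head, or NULL if the pool
 * cannot hold the whole bundle. In the NULL case every block taken so
 * far is released again, so no partial bundle is left behind.
 */
block_t *bundle_store(pool_t *pool, const unsigned char *src, size_t len)
{
    block_t *head = NULL;
    block_t **tail = &head;
    size_t offset = 0;

    do {
        block_t *b = pool_alloc(pool);
        if (b == NULL) {
            pool_release_chain(pool, head);  /* roll back the partial copy */
            return NULL;                     /* caller drops and retries later */
        }
        size_t chunk = len - offset;
        if (chunk > BLOCK_SIZE) {
            chunk = BLOCK_SIZE;
        }
        memcpy(b->data, src + offset, chunk);
        b->used = chunk;
        offset += chunk;
        *tail = b;
        tail = &b->next;
    } while (offset < len);

    return head;
}
```

In this sketch, a failed `bundle_store` leaves `free_count` exactly where it was before the call. If the real sender path lacks an equivalent rollback, that could explain both symptoms: a truncated bundle on the wire, and a pool that never frees up enough for data flow to resume.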

jphickey commented 2 years ago

The design change described in issue #108 may also help here, as it would reduce pool use by no longer temporarily storing bundle CBOR data in RAM.

We would still need to identify why this issue occurs (it could still be there after that change, just less likely to trigger) and why data flow doesn't recover after the error, so it may be worth finding the root cause here before implementing that change.