mar-file-system / erasureUtils

Erasure coding utilities intended for the marfs multicomponent DAL. These service the creation, retrieval, and maintenance of erasure-coded data stripes spread across multiple files.

Read can fail with a specific sequence of read sizes when a block file is truncated #5

Closed wfvining closed 7 years ago

wfvining commented 7 years ago

It is possible for ne_read() to fail if a stripe has been corrupted in a particular way and reads of varying sizes are issued in a particular (as yet unidentified) order. The corruption seems to be one block file truncated to a smaller size, but I cannot guarantee that is the only case. The ordering seems to be a small read that does not cover a full stripe (or the corrupted block), followed by a larger read that does cover the corrupted block.

The error appears to be caused by corruption of the llcounter variable, which leads to an unreasonably large seek in all the blocks when ne_read() restarts the read to recover from reading too little from one block file.

The failure is very difficult to trigger. Currently I am using random sequences of read sizes to trigger it, but once I have a better idea of the read sizes that cause it I can create a better reproducer.
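
A minimal sketch of a random read-size driver of the kind described above. It is not the actual test harness. The `read_fn` callback type, `fuzz_read_sizes()`, and `mock_read()` are hypothetical names introduced for illustration; in a real run the callback would wrap ne_read() on an open handle, and its exact signature is not assumed here. The mock's failure rule (a read shorter than one stripe followed by a read of at least one stripe) is only a stand-in for the suspected ordering, not the confirmed trigger. The sizes come from a seeded generator and are recorded, so a failing sequence can be replayed and trimmed down.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical callback: stands in for a wrapper around ne_read().
 * Returns the number of bytes read, or a negative value on failure. */
typedef ssize_t (*read_fn)(void *ctx, void *buf, size_t nbytes, off_t offset);

/* Deterministic xorshift32 generator, so a failing seed can be replayed. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Issue up to `nreads` sequential reads with sizes drawn from [1, max_size]
 * (max_size must be nonzero, and `buf` must hold max_size bytes).
 * Every size issued is recorded in `sizes`. Returns the index of the first
 * failing read, or -1 if every read succeeded. */
static int fuzz_read_sizes(read_fn do_read, void *ctx, uint32_t seed,
                           size_t max_size, int nreads,
                           size_t *sizes, void *buf)
{
    uint32_t state = seed ? seed : 1u; /* xorshift state must be nonzero */
    off_t offset = 0;

    for (int i = 0; i < nreads; i++) {
        size_t n = (size_t)(next_rand(&state) % max_size) + 1;
        sizes[i] = n;
        ssize_t got = do_read(ctx, buf, n, offset);
        if (got < 0)
            return i; /* sizes[0..i] is the sequence to replay */
        offset += got;
    }
    return -1;
}

/* Illustrative stand-in for a corrupted stripe (an assumption, not the
 * real bug): fails when a read shorter than `stripe` is followed by a
 * read of at least `stripe` bytes. Initialise `prev` to a value >= stripe
 * to mean "no short read seen yet". */
struct mock_ctx {
    size_t prev;   /* size of the previous read */
    size_t stripe; /* stripe width in bytes */
};

static ssize_t mock_read(void *ctx, void *buf, size_t nbytes, off_t offset)
{
    struct mock_ctx *m = ctx;
    (void)buf;
    (void)offset;
    int fail = (m->prev < m->stripe && nbytes >= m->stripe);
    m->prev = nbytes;
    return fail ? -1 : (ssize_t)nbytes;
}
```

Running this over many seeds and logging `sizes` up to the returned index gives a concrete sequence that can be shortened by hand into a fixed reproducer.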

wfvining commented 7 years ago

The commit above resolves this issue.