Erasure coding utilities intended for the marfs multicomponent DAL. These service the creation, retrieval, and maintenance of erasure-coded data stripes spread across multiple files.
Read can fail with a specific sequence of read sizes when a block file is truncated #5
`ne_read()` can fail if a stripe has been corrupted in a particular way and reads of varying sizes are issued in a particular (as yet unidentified) order. So far I have only seen this when one block file is truncated to a smaller size, but I cannot guarantee that is the only trigger. The failing order seems to be a small read that does not cover a full stripe (or the corrupted block), followed by a larger read that does cover the corrupted block.
The error appears to be caused by corruption of the `llcounter` variable, leading to an unreasonably large seek in all the blocks when attempting to restart the read and recover from reading too little from one block file.
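As a diagnostic aid while the root cause is unknown, a range check on the offset just before each block seek could make the corruption visible at the point it happens. This is only a sketch: `seek_offset_is_sane()` and `expected_block_len` are hypothetical names, not part of the library, and the check assumes the expected block length comes from stripe metadata rather than the (possibly truncated) file on disk.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a library function): returns 1 if `offset` is a
 * plausible seek target for a block of `expected_block_len` bytes, else 0.
 * The bound is taken from metadata because a truncated block file would
 * report a size that is too small. */
static int seek_offset_is_sane(int64_t offset, int64_t expected_block_len)
{
    if (offset < 0 || offset > expected_block_len) {
        fprintf(stderr,
                "suspicious seek: offset=%lld expected_block_len=%lld\n",
                (long long)offset, (long long)expected_block_len);
        return 0;
    }
    return 1;
}
```

Calling this on the value derived from `llcounter` before each seek would log the first read in a sequence that produces an out-of-range offset.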
The failure is very difficult to trigger. Currently I am using random sequences of read sizes to trigger it, but once I have a better idea of the read sizes that cause it I can create a better reproducer.
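One way to make the random approach easier to narrow down is to derive each sequence of read sizes from a seed and record it, so that a failing sequence can be replayed exactly and then shortened by hand. The sketch below is an assumption about how such a harness might look, not the reproducer I am actually using: `read_fn`, `run_sequence()`, and `fake_read()` are hypothetical, and a real harness would replace `fake_read()` with a wrapper around `ne_read()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: perform one read of `size` bytes at `offset`.
 * Returns 0 on success and nonzero on failure. */
typedef int (*read_fn)(uint64_t offset, size_t size, void *ctx);

/* Small deterministic PRNG (xorshift64): the seed fully determines the
 * sequence, so any failing run can be replayed from its seed. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Issue sequential reads with random sizes in [1, max_size] until
 * `total_len` bytes are covered, `max_reads` is reached, or a read fails.
 * Every size is recorded in sizes_out (which must hold max_reads entries)
 * and *count_out is set to the number of reads issued.
 * Returns the index of the failing read, or -1 if none failed. */
static long run_sequence(uint64_t seed, size_t max_size, uint64_t total_len,
                         size_t *sizes_out, size_t max_reads,
                         size_t *count_out, read_fn do_read, void *ctx)
{
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL; /* state must be nonzero */
    uint64_t offset = 0;
    size_t n = 0;

    *count_out = 0;
    if (max_size == 0)
        return -1;

    while (n < max_reads && offset < total_len) {
        size_t size = (size_t)(xorshift64(&state) % max_size) + 1;
        if ((uint64_t)size > total_len - offset)
            size = (size_t)(total_len - offset); /* clamp the final read */

        sizes_out[n++] = size;
        *count_out = n;
        if (do_read(offset, size, ctx) != 0)
            return (long)(n - 1);
        offset += size;
    }
    return -1;
}

/* Stand-in reader for exercising the harness: fails whenever a read
 * covers `bad_offset`. It does not model the actual bug. */
struct fake_file { uint64_t bad_offset; };

static int fake_read(uint64_t offset, size_t size, void *ctx)
{
    const struct fake_file *f = ctx;
    return (offset <= f->bad_offset && f->bad_offset - offset < (uint64_t)size) ? 1 : 0;
}
```

When a seed fails, the recorded sizes up to the returned index are the candidate sequence. Replaying with the same seed and a smaller `max_reads`, or editing individual sizes, should help isolate the small-then-large pattern described above.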