sccn / xdf

BSD 2-Clause "Simplified" License

Scan forward seeks a byte before #34

Closed: dojeda closed this 5 years ago

dojeda commented 5 years ago

Hi, I have been reading the load_xdf function with great interest. I would like to propose a couple of modifications to improve it, but I have found a strange inconsistency (or perhaps a misunderstanding on my part). Could someone shed some light on this?

I am interested in reading a file without decoding its contents. More precisely, I want to read the info header of each stream but not its data. While digging into the code, I found that when the data is corrupted, a _scan_forward function reads ahead until it finds a boundary chunk (https://github.com/sccn/xdf/wiki/Specifications#boundary-chunk). This is great for robustness, but when I read the code:

```python
def _scan_forward(f):
    """Scan forward through the given file object until after the next
    boundary chunk."""
    blocklen = 2**20
    signature = bytes([0x43, 0xA5, 0x46, 0xDC, 0xCB, 0xF5, 0x41, 0x0F,
                       0xB3, 0x0E, 0xD5, 0x46, 0x73, 0x83, 0xCB, 0xE4])
    while True:
        curpos = f.tell()
        block = f.read(blocklen)
        matchpos = block.find(signature)
        if matchpos != -1:
            f.seek(curpos + matchpos + 15)
            logger.debug('  scan forward found a boundary chunk.')
            break
        if len(block) < blocklen:
            logger.debug('  scan forward reached end of file with no match.')
            break
```

... I am confused to see that, when the pattern is found, the file pointer is moved to the match position + 15. The pattern is 16 bytes long, so an offset of 15 leaves the reader positioned on the last signature byte, 0xE4, which is then read a second time. In both places where this function is called, the next instruction is either a continue or the end of the loop, so the next operation is to read a variable-length integer with _read_varlen_int. That read fails, because the next byte, 0xE4, is not 0, 1, 4 or 8. This in turn logs the warning "got zero-length chunk, scanning forward to next boundary chunk", which is not actually an accurate message, and the reader scans forward again to the next boundary chunk.
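The misalignment can be reproduced on an in-memory buffer. This is a minimal sketch: read_varlen_int is a hypothetical stand-in for _read_varlen_int that assumes the spec's encoding (a 1-byte count of 1, 4 or 8, followed by a little-endian unsigned integer of that many bytes), and try_read_after_boundary is an illustrative helper, not a pyxdf function.

```python
import io
import struct

# 16-byte boundary-chunk signature, copied from _scan_forward above.
SIGNATURE = bytes([0x43, 0xA5, 0x46, 0xDC, 0xCB, 0xF5, 0x41, 0x0F,
                   0xB3, 0x0E, 0xD5, 0x46, 0x73, 0x83, 0xCB, 0xE4])


def read_varlen_int(f):
    """Hypothetical stand-in for _read_varlen_int: a 1-byte count
    (1, 4 or 8) followed by a little-endian unsigned integer."""
    nbytes = f.read(1)
    if nbytes == b"\x01":
        return f.read(1)[0]
    if nbytes == b"\x04":
        return struct.unpack("<I", f.read(4))[0]
    if nbytes == b"\x08":
        return struct.unpack("<Q", f.read(8))[0]
    if not nbytes:
        raise EOFError("end of file")
    raise RuntimeError(f"invalid length byte 0x{nbytes[0]:02X}")


def try_read_after_boundary(buf, offset):
    """Seek to (signature start + offset), then try to read a chunk length.

    Returns the length, or None if the variable-length read fails.
    """
    f = io.BytesIO(buf)
    matchpos = buf.find(SIGNATURE)
    if matchpos == -1:
        raise ValueError("no boundary signature in buffer")
    f.seek(matchpos + offset)
    try:
        return read_varlen_int(f)
    except RuntimeError as exc:
        print(f"offset {offset}: {exc}")
        return None


# Some junk bytes, the boundary signature, then a chunk length field (1 byte, value 6).
buf = b"\x00\xff\x13" + SIGNATURE + b"\x01\x06"
print(try_read_after_boundary(buf, 15))              # logs the 0xE4 error, then prints None
print(try_read_after_boundary(buf, len(SIGNATURE)))  # prints 6
```

Only the offset differs between the two calls: 15 leaves the reader on 0xE4, while len(SIGNATURE) lands on the next chunk's length field.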

As a result, the reader would stay in this error / find boundary / fail cycle until the whole file is consumed, skipping every chunk that lies between consecutive boundary chunks.
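A possible correction, sketched under the assumption that callers only need the file positioned just after the signature: seek by len(signature) instead of 15. The sketch also re-reads the last len(signature) - 1 bytes of each block, so a signature split across two 1 MiB blocks is not missed; that overlap and the boolean return value are assumptions of this sketch, not necessarily what the upstream fix does.

```python
import io
import logging

logger = logging.getLogger(__name__)

# 16-byte boundary-chunk signature, as in the original _scan_forward.
SIGNATURE = bytes([0x43, 0xA5, 0x46, 0xDC, 0xCB, 0xF5, 0x41, 0x0F,
                   0xB3, 0x0E, 0xD5, 0x46, 0x73, 0x83, 0xCB, 0xE4])


def scan_forward(f, blocklen=2**20):
    """Move f to the first byte after the next boundary-chunk signature.

    Returns True if a signature was found, False if EOF came first.
    """
    overlap = len(SIGNATURE) - 1
    if blocklen <= overlap:
        raise ValueError("blocklen must be larger than len(SIGNATURE) - 1")
    while True:
        curpos = f.tell()
        block = f.read(blocklen)
        matchpos = block.find(SIGNATURE)
        if matchpos != -1:
            # Skip all 16 signature bytes so the next read starts on a chunk.
            f.seek(curpos + matchpos + len(SIGNATURE))
            logger.debug("  scan forward found a boundary chunk.")
            return True
        if len(block) < blocklen:
            logger.debug("  scan forward reached end of file with no match.")
            return False
        # Step back 15 bytes so a signature straddling two blocks is still found.
        f.seek(curpos + len(block) - overlap)


if __name__ == "__main__":
    # The signature starts at byte 18, so with blocklen=20 it spans two blocks.
    f = io.BytesIO(b"x" * 18 + SIGNATURE + b"\x01\x06")
    print(scan_forward(f, blocklen=20), f.tell())  # prints: True 34
```

Returning a boolean lets the caller distinguish "found a boundary" from "hit end of file" instead of inspecting the file position afterwards.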

cbrnr commented 5 years ago

Fixed in https://github.com/xdf-modules/xdf-Python/pull/6.

cbrnr commented 5 years ago

@cboulay this can be closed.