HotPocketRemix closed this issue 4 years ago.
Padding based on the entire file read so far is a bit annoying, as you can't use an array, I assume. You're probably best off breaking up the parsing.
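```crystal
require "bindata"

class Header < BinData
  endian big
  uint8 :header_data
end

class Chunk < BinData
  endian big
  uint8 :struct_entries_etc
end

# `io` is the input IO; `calculate_padding` is your own alignment helper
header = io.read_bytes(Header)
chunks = [] of Chunk
# `io.closed?` does not signal end of data, so compare the position with the
# total size instead (assumes a sized IO such as IO::Memory or File)
while io.pos < io.size
  chunks << io.read_bytes(Chunk)
  # skip the alignment bytes that follow each chunk
  io.skip calculate_padding(io.pos)
end
```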
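The loop calls `calculate_padding`, which isn't defined in this thread. A minimal sketch of what it might look like, assuming the target is the next multiple of 16 measured from the start of the stream (the helper name and its `alignment` parameter are hypothetical, not part of BinData):

```crystal
# Hypothetical helper (not part of BinData): how many padding bytes follow
# absolute offset `pos` so that the next read starts on a multiple of `alignment`.
def calculate_padding(pos : Int, alignment : Int32 = 16) : Int32
  remainder = (pos % alignment).to_i32
  remainder == 0 ? 0 : alignment - remainder
end

# Offsets are measured from the start of the stream, so nesting depth doesn't matter.
{0, 1, 15, 16, 17, 40}.each do |pos|
  puts "offset #{pos}: #{calculate_padding(pos)} padding byte(s)"
end
```

Because it only looks at the absolute position, it accepts whatever `io.pos` returns (`Int32` for `IO::Memory`, `Int64` for `File`).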
You can still have a parent class that provides a nice interface.
```crystal
# continuing from the example above
file = io.read_bytes(StreamingFile)
file.header
file.chunks
```
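`StreamingFile` itself isn't shown. As a rough sketch of the parent-class idea that doesn't depend on the BinData shard: `IO#read_bytes(type, format)` calls `type.from_io(io, format)`, so any type defining `self.from_io` can be read this way. `Header`, `Chunk`, `StreamingFile` and their one-byte layouts are hypothetical stand-ins mirroring the example, and the end-of-data check assumes a sized IO such as `IO::Memory`.

```crystal
# Hypothetical stand-ins for the BinData classes above, so no shard is needed.
record Header, header_data : UInt8 do
  def self.from_io(io : IO, format : IO::ByteFormat)
    new(io.read_bytes(UInt8, format))
  end
end

record Chunk, struct_entries_etc : UInt8 do
  def self.from_io(io : IO, format : IO::ByteFormat)
    new(io.read_bytes(UInt8, format))
  end
end

# Hypothetical helper: pad to the next multiple of 16 from the stream start.
def calculate_padding(pos : Int, alignment : Int32 = 16) : Int32
  remainder = (pos % alignment).to_i32
  remainder == 0 ? 0 : alignment - remainder
end

# Parent class: parses the pieces separately but exposes one object.
class StreamingFile
  getter header : Header
  getter chunks = [] of Chunk

  def initialize(@header)
  end

  # Assumes a sized IO (e.g. IO::Memory) so the end of data can be detected.
  def self.from_io(io, format : IO::ByteFormat)
    file = new(io.read_bytes(Header, format))
    while io.pos < io.size
      file.chunks << io.read_bytes(Chunk, format)
      io.skip calculate_padding(io.pos)
    end
    file
  end
end

# Header at offset 0, chunks at offsets 1 and 16, zero padding elsewhere.
data = Bytes.new(32)
data[0] = 0xAA_u8
data[1] = 1_u8
data[16] = 2_u8

file = IO::Memory.new(data).read_bytes(StreamingFile, IO::ByteFormat::BigEndian)
puts file.header.header_data
puts file.chunks.map(&.struct_entries_etc)
```

Callers only see `file.header` and `file.chunks`; how the stream was split up stays an implementation detail of `from_io`.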
Here's an example where I break up the parsing: https://github.com/spider-gazelle/crystal-bacnet/blob/master/src/bacnet/secure_message.cr#L18
Unfortunately, the data is nested inside several other structures, so splitting the parsing in the middle would also be very difficult. Not the friendliest format I've had to deal with!
Basically, it's something like a RIFF structure, where chunks must be of even length (in my case, a multiple of 16 instead). Chunks can also have subchunks, which can have their own subchunks, and so on, so the total number of chunks may not be known ahead of time, yet each chunk is still padded. RIFF at least only pads each chunk's length to be even, rather than padding against the whole stream read so far.
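The nested case can still be handled with one recursive function, provided padding is always computed from the absolute stream position rather than from each chunk's own length. This sketch assumes a made-up chunk layout (a 1-byte kind where 1 means container, a 1-byte payload length that includes any padded subchunks, then the payload); `Node`, `parse_chunk` and `parse_all` are hypothetical names, not part of BinData or of the real format.

```crystal
# Hypothetical helper: padding up to the next multiple of `alignment`.
def calculate_padding(pos : Int, alignment : Int32 = 16) : Int32
  remainder = (pos % alignment).to_i32
  remainder == 0 ? 0 : alignment - remainder
end

# One parsed chunk: leaves carry data, containers carry subchunks.
class Node
  getter kind : UInt8
  getter data : Bytes
  getter children = [] of Node

  def initialize(@kind, @data = Bytes.empty)
  end
end

# Made-up layout: kind (1 = container), payload length, payload, then zero
# padding so the next chunk starts at a multiple of 16 from the stream start.
def parse_chunk(io : IO) : Node
  kind = io.read_byte || raise IO::EOFError.new
  len = io.read_byte || raise IO::EOFError.new
  payload_end = io.pos + len

  node = if kind == 1
           container = Node.new(kind)
           # Recurse until the container's payload is used up; depth is unbounded.
           while io.pos < payload_end
             container.children << parse_chunk(io)
           end
           container
         else
           bytes = Bytes.new(len)
           io.read_fully(bytes)
           Node.new(kind, bytes)
         end

  # Padding depends only on the absolute position, not on this chunk's length.
  io.skip calculate_padding(io.pos)
  node
end

# Assumes a sized IO such as IO::Memory, so the end of data can be detected.
def parse_all(io : IO::Memory) : Array(Node)
  nodes = [] of Node
  while io.pos < io.size
    nodes << parse_chunk(io)
  end
  nodes
end

# Container at offset 0 holding leaves "abc" (offset 2) and "xy" (offset 16),
# then a top-level leaf "z" at offset 32; 48 bytes in total.
buf = Bytes.new(48)
buf[0] = 1_u8
buf[1] = 30_u8
buf[3] = 3_u8
"abc".to_slice.copy_to(buf + 4)
buf[17] = 2_u8
"xy".to_slice.copy_to(buf + 18)
buf[33] = 1_u8
"z".to_slice.copy_to(buf + 34)

parse_all(IO::Memory.new(buf)).each do |n|
  puts "kind=#{n.kind} children=#{n.children.size} data=#{String.new(n.data).inspect}"
end
```

The number of chunks never has to be known up front: each call reads one chunk (and, recursively, its subchunks) and then realigns against `io.pos`.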
I'll see if I can make some assumptions to help split up the data so I don't have to do so much computation.
Would it be useful if `IO#pos` were passed to the length callbacks? i.e.

```crystal
bytes :padding_bytes, length: ->(io_pos : Int32) { calculate_padding(io_pos) }
```

That might be possible without it being a breaking change.
Actually, by pure coincidence, it looks like `io` is available within the callbacks, so you can do:

```crystal
bytes :padding_bytes, length: -> { calculate_padding(io.pos) }
```
Oh, I didn't even think to try that! That should work perfectly, thanks!
Is there a way to check the `pos` of the IO stream as part of a format specification? I didn't see it in the examples, but I'm relatively new to Crystal, so I may have just missed it. Consuming the remaining bytes isn't appropriate, because this is in the middle of the format, not at the end.

More concretely, a few formats that I'd like to parse require alignment after some of the data has been read. For example, after a data chunk has been read, there are padding bytes until the length of the stream so far reaches a multiple of 16. Normally I'd read a dummy array of the appropriate size and verify that all the entries are 0, but to compute the size of that array, I'd need to know the current position of the stream. I could compute it manually, but if there's a lot of data (especially nested data) before that point, it would be very difficult to keep track.
(To be clear, I can't just round up the size of the data chunk to the next multiple of 16, because the format pads against the length of the entire file read so far, not just the data chunk.)
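The dummy-buffer approach from the question can be sketched in plain Crystal by sizing the buffer from the absolute `io.pos` and checking that every byte is zero. `read_padding` and the 5-byte header / 20-byte chunk layout are hypothetical; the example shows why the two rules differ: padding against the whole stream needs 7 bytes here, while rounding the 20-byte chunk alone up to 32 would give 12.

```crystal
# Hypothetical helper: consume the zero bytes that pad the *absolute* stream
# position up to the next multiple of `alignment`, rejecting non-zero bytes.
def read_padding(io : IO, alignment : Int32 = 16) : Int32
  remainder = (io.pos % alignment).to_i32
  return 0 if remainder == 0
  padding = Bytes.new(alignment - remainder)
  io.read_fully(padding)
  unless padding.all?(&.zero?)
    raise "non-zero padding byte before offset #{io.pos}"
  end
  padding.size
end

# Hypothetical layout: a 5-byte header, a 20-byte data chunk, then padding.
io = IO::Memory.new
io.write(Bytes.new(5, 0xAA_u8))  # header
io.write(Bytes.new(20, 0xBB_u8)) # data chunk (ends at offset 25)
io.write(Bytes.new(7))           # zero padding up to offset 32
io.write_byte(0xCC_u8)           # next structure starts here
io.rewind

io.skip(5 + 20)                  # stand-in for parsing the header and chunk
consumed = read_padding(io)
puts consumed                    # 7; rounding the chunk alone (20 -> 32) would give 12
puts io.read_byte                # 204 (0xCC)
```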