jam1garner / binrw

A Rust crate for helping parse and rebuild binary data using ✨macro magic✨.
https://binrw.rs
MIT License
545 stars 35 forks source link

Remove `BinRead::after_parse` and add offset-table helpers #210

Closed csnover closed 10 months ago

csnover commented 1 year ago

This trait method exists pretty much exclusively to support deferred reads of FilePtr values but causes ongoing trouble elsewhere and the benefits do not seem to outweigh the problems:

  1. It requires T::Args to be cloneable in many cases where it should not be the case, which causes confusion, has a strong chance of causing accidental slowness, and makes it unnecessarily hard to move from imports;
  2. An analysis I ran of binrw users on GitHub showed that pretty much all cases of FilePtr were using FilePtr::parse or deref_now, so any potential performance benefit does not seem to be realised by real-world projects;
  3. Because there is no hard requirement to call after_parse and it mostly will not break anything, it is too easy for users to write custom implementations that do not do this and so are subtly broken. From the same GH analysis, there was only one case where I found someone who wrote a custom implementation that correctly called after_parse;
  4. Since after_parse does not build a stack of objects to revisit later, its ability to avoid non-linear reads of data is limited to at most one struct or enum at a time anyway.

Given these things (and probably others that I forget), IMO the existence of this feature is not justified. Instead, I think that a design that reads offsets into a Vec<{integer}> and then iterates over them later to convert into Vec<T> is preferable.

To that end, this patch set includes some new helper functions in file_ptr that enable efficient offset table pattern, with the added benefit of (I think?) solving the absolute/relative offset issue: now, a user puts the data field in the correct place in the struct and it Just Works, or else can use #[br(seek_before)] on the data field if the offsets are relative to a different position in the file, which means the offset position can be relative or absolute and there is no need for special offset plumbing.

I think this also suggests a path forward for offset table writing, where a temporary marker for the offset table can be created that zero-fills the space, and then the data field and the marker are passed to a helper function that writes the data and then writes out the offset table, but that isn’t something I’ve actually prototyped yet.

csnover commented 10 months ago

This is merged now.