lunixbochs / struc

Better binary packing for Go
MIT License

Ability to lazily pack/unpack data #82

Open lupine opened 4 years ago

lupine commented 4 years ago

Hey,

I'm handling some legacy game data using struc, and it's awesome :) - here's an example of usage: https://code.ur.gs/lupine/ordoor/src/branch/master/internal/maps/maps.go

Being able to represent the whole file (1.4MiB for the map) in a single struct is very pleasing :thumbsup:. I ran into a couple of things that already have open issues, but I found workarounds, and overall I'm really happy with how the conversion from encoding/binary turned out for this code.

Now I'm hoping to convert some code that processes image data. The most extreme example I have is a 400MiB file of animations, each of which is RLE-encoded. It's currently processed with encoding/binary and a lot of special-case code like this: https://code.ur.gs/lupine/ordoor/src/branch/master/internal/data/object.go

Even with modern computers, holding the whole 400MiB file in RAM at once isn't great - my low-end personal laptop has 2GiB RAM, and if I read + decode the whole file, it might take up 1GiB all by itself, when I'm only interested in a tiny amount of the total. The original game had system requirements of 16MiB RAM :sweat_smile:.

To convert this file to struc, I need to be able to specify that the data should be lazily read, which is what happens at the moment with the LoadObjectLazily method.

What do you think to an approach like:

type Sprite struct {
    // ... various header fields
    CompressedSize int `struc:"uint32,sizeof=Data"`
    // More headers
    Data io.Reader `struc:"lazy"`
}

?

Using it would look like:

sprite := Sprite{}
f, _ := os.Open("file.obj")
_ = struc.UnpackWithOrder(f, &sprite, binary.LittleEndian)
sprite.Data = newStreamingRleReader(sprite.Data)

The way I see this working is that Unpack* starts to take a ReadSeeker instead of a Reader. It can use that to build objects that will seek-then-read the data when used, and populate the struct with them. For backward compatibility, it could continue to take a Reader, but try to upgrade to a ReadSeeker if a field like this exists.

The readers would only be usable for as long as f is valid, and it would be up to the caller to arrange that. If we never read from sprite.Data, then it's never pulled into memory.
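To make the backward-compatibility part concrete, here's a minimal sketch of the upgrade check (UnpackMaybeLazy is a made-up name, and since the lazy path doesn't exist yet it just falls back to plain struc.Unpack on both branches):

func UnpackMaybeLazy(r io.Reader, data interface{}) error {
    if rs, ok := r.(io.ReadSeeker); ok {
        // Seekable source (e.g. *os.File): a "lazy" field could record the
        // current offset with rs.Seek(0, io.SeekCurrent) and then seek past
        // its payload, deferring the actual read until the caller asks for it.
        return struc.Unpack(rs, data) // placeholder for the proposed lazy path
    }
    // Plain reader: behave exactly as struc does today and read everything,
    // lazy fields included, into memory.
    return struc.Unpack(r, data)
}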

Even more ideally, I'd be able to tell struc to automatically do the "wrap it in RLE encoding" bit, but that might be too ambitious :sweat_smile:

The lazy part could also be useful for slice members generally, although we'd need to provide some way to prompt them to be filled. Maybe one for a follow-up.

On Pack*, we'd io.Copy data from the provided io.Reader into the file, then write the number of bytes into CompressedSize. This allows for efficient serialization, and if you don't read from a particular reader, the unpack-then-pack cycle is easy.

If you do read the values, you have to remember to replace the reader with one that has the content you read, which isn't totally ideal. A rewindable reader of some kind would paper over that, but maybe overcomplicate matters.
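For that last point, one caller-side workaround under the proposed API (out is an assumed io.Writer here, and struc doesn't pack io.Reader fields today) would be to buffer the payload once and hand struc a rewindable bytes.Reader:

payload, err := io.ReadAll(sprite.Data)
if err != nil {
    return err
}
sprite.CompressedSize = len(payload)   // the sizeof link would also derive this
sprite.Data = bytes.NewReader(payload) // rewindable: a second Pack still works
err = struc.PackWithOrder(out, &sprite, binary.LittleEndian)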

What do you think to the idea? Would you be interested in a PR implementing this behaviour, or is it too much of a change to how struc operates?

lunixbochs commented 4 years ago

You should be able to do this with a custom field type without any struc modifications, if you’re confident you will only ever unpack a valid ReaderAt + Seeker.

At this point in the custom type: https://github.com/lunixbochs/struc/blob/master/custom_test.go#L31

You can maybe type-assert the io.Reader to an io.ReadSeeker here, call Seek(0, io.SeekCurrent) to get the current position (https://stackoverflow.com/a/10901436), wrap the underlying ReaderAt in a SectionReader at that position with the field size and store it on the field, then seek forward by the field size on the ReadSeeker (so the parser will advance without actually reading the bytes).

Then when you want to read the field, have a Get() []byte method that just ioutil.ReadAll()s the field’s SectionReader.

You should also consider what this means for Packing. I’m assuming the trivial implementation of Pack for this field type would just be a deliberate panic.
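Roughly, such a custom type might look like the sketch below, written against the struc.Custom interface (Pack/Unpack/Size/String). The names are made up, the length prefix is read by the type itself rather than via a sizeof field, the underlying reader is assumed to be something like an *os.File, and Pack is the deliberate panic mentioned above.

package lazyblob

import (
    "encoding/binary"
    "errors"
    "io"

    "github.com/lunixbochs/struc"
)

// LazyBlob reads its own uint32 length prefix, remembers where the payload
// sits in the file, and skips over it instead of loading it into memory.
type LazyBlob struct {
    section *io.SectionReader
}

func (b *LazyBlob) Unpack(r io.Reader, length int, opt *struc.Options) error {
    rs, ok := r.(interface {
        io.ReaderAt
        io.Seeker
    })
    if !ok {
        return errors.New("lazyblob: reader must support ReadAt and Seek (e.g. *os.File)")
    }
    // Length prefix is consumed here instead of via a separate sizeof field.
    var size uint32
    if err := binary.Read(r, binary.LittleEndian, &size); err != nil {
        return err
    }
    off, err := rs.Seek(0, io.SeekCurrent) // current position = payload start
    if err != nil {
        return err
    }
    b.section = io.NewSectionReader(rs, off, int64(size))
    // Seek past the payload so struc keeps parsing without reading it.
    _, err = rs.Seek(int64(size), io.SeekCurrent)
    return err
}

// Get pulls the payload into memory on demand; the source must still be open.
func (b *LazyBlob) Get() ([]byte, error) {
    if b.section == nil {
        return nil, errors.New("lazyblob: not unpacked yet")
    }
    if _, err := b.section.Seek(0, io.SeekStart); err != nil {
        return nil, err
    }
    return io.ReadAll(b.section)
}

// Pack is a deliberate panic, as suggested above.
func (b *LazyBlob) Pack(p []byte, opt *struc.Options) (int, error) {
    panic("lazyblob: packing not supported")
}

// Size and String just round out the struc.Custom interface; this sketch
// doesn't support packing or Sizeof.
func (b *LazyBlob) Size(opt *struc.Options) int { return 0 }
func (b *LazyBlob) String() string              { return "LazyBlob" }

A field declared as Data LazyBlob and unpacked from an *os.File would then only touch the payload bytes when Data.Get() is called.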