lunixbochs / struc

Better binary packing for Go
MIT License

Ability to lazily pack/unpack data #82

Open lupine opened 4 years ago

lupine commented 4 years ago

Hey,

I'm handling some legacy game data using struc, and it's awesome :) - here's an example of usage: https://code.ur.gs/lupine/ordoor/src/branch/master/internal/maps/maps.go

Being able to represent the whole file (1.4MiB for the map) in a single struct is very pleasing :thumbsup:. I ran into a couple of things that are already covered by existing issues, but I found workarounds, and overall I'm really happy with how the conversion from encoding/binary turned out for this code.

Now I'm hoping to convert some code that processes image data. The most extreme example I have is a 400MiB file of animations, each of which is RLE-encoded. It's currently processed with encoding/binary and a lot of special-case code like this: https://code.ur.gs/lupine/ordoor/src/branch/master/internal/data/object.go

Even with modern computers, holding the whole 400MiB file in RAM at once isn't great - my low-end personal laptop has 2GiB RAM, and if I read + decode the whole file, it might take up 1GiB all by itself, when I'm only interested in a tiny amount of the total. The original game had system requirements of 16MiB RAM :sweat_smile:.

To convert this file to struc, I need to be able to specify that the data should be lazily read, which is what happens at the moment with the LoadObjectLazily method.

What do you think to an approach like:

type Sprite struct {
    // ... various header fields
    CompressedSize int `struc:"uint32,sizeof=Data"`
    // More headers
    Data io.Reader `struc:"lazy"`
}

?

Using it would look like:

sprite := Sprite{}
f, _ := os.Open("file.obj")
_ = struc.UnpackWithOrder(f, &sprite, binary.LittleEndian)
sprite.Data = newStreamingRleReader(sprite.Data)

The way I see this working is that Unpack* starts to take a ReadSeeker instead of a Reader. It can use that to build objects that will seek-then-read the data when used, and populate the struct with them. For backward compatibility, it could continue to take a Reader, but try to upgrade to a ReadSeeker if a field like this exists.

The readers would only be usable for as long as f is valid, and it would be up to the caller to arrange that. If we never read from sprite.Data, then it's never pulled into memory.
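
To make that concrete, here's a rough, untested sketch of the idea (the helper name is made up, and I'm assuming the lazy path also needs the source to be an io.ReaderAt so a SectionReader can be handed back, which *os.File satisfies):

```go
package lazydemo // hypothetical sketch, not part of struc

import (
	"bytes"
	"io"
)

// lazyField sketches the proposed behaviour: when the source supports seeking,
// skip over the field's bytes and return a reader for that region of the file;
// otherwise fall back to reading eagerly, as struc does today.
func lazyField(r io.Reader, length int64) (io.Reader, error) {
	rs, canSeek := r.(io.ReadSeeker)
	ra, canReadAt := r.(io.ReaderAt)
	if canSeek && canReadAt { // e.g. *os.File
		pos, err := rs.Seek(0, io.SeekCurrent)
		if err != nil {
			return nil, err
		}
		// Skip the data so parsing continues without touching the bytes.
		if _, err := rs.Seek(length, io.SeekCurrent); err != nil {
			return nil, err
		}
		return io.NewSectionReader(ra, pos, length), nil
	}
	// Backward-compatible path for a plain io.Reader: read the data now.
	buf := make([]byte, length)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return bytes.NewReader(buf), nil
}
```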

Ideally, I'd also be able to tell struc to automatically do the "wrap it in RLE encoding" bit, but that might be too ambitious :sweat_smile:

The lazy part could also be useful for slice members generally, although we'd need to provide some way to prompt them to be filled. Maybe one for a follow-up.

On Pack*, we'd io.Copy data from the provided io.Reader into the file, then write the number of bytes into CompressedSize. This allows for efficient serialization, and if you don't read from a particular reader, the unpack-then-pack cycle is easy.

If you do read the values, you have to remember to replace the reader with one that has the content you read, which isn't totally ideal. A rewindable reader of some kind would paper over that, but maybe overcomplicate matters.

What do you think to the idea? Would you be interested in a PR implementing this behaviour, or is it too much of a change to how struc operates?

lunixbochs commented 4 years ago

You should be able to do this with a custom field type without any struc modifications, if you’re confident you will only ever unpack a valid ReaderAt + Seeker.

At this point in the custom type: https://github.com/lunixbochs/struc/blob/master/custom_test.go#L31

You can maybe cast the io.Reader to an io.ReadSeeker here, and call Seek(0, io.SeekCurrent) to get the current position (https://stackoverflow.com/a/10901436). Then convert the stream into a SectionReader at the current position with the field size, store it on the field, and seek forward by the field size on the ReadSeeker (so the parser will advance without actually reading the bytes).

Then when you want to read the field, have a Get() []byte method that just ioutil.ReadAll()s the field’s SectionReader.

You should also consider what this means for Packing. I’m assuming the trivial implementation of Pack for this field type would just be a deliberate panic.
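
A rough, untested sketch of what that custom field type could look like (LazyBlob is a made-up name; it assumes the reader handed to Unpack is backed by something like an *os.File so the ReadSeeker/ReaderAt casts succeed, and that the length struc passes in reflects your size header via the sizeof/sizefrom tags):

```go
package sprites // hypothetical sketch, outside the struc package

import (
	"errors"
	"fmt"
	"io"
	"io/ioutil"

	"github.com/lunixbochs/struc"
)

// LazyBlob records where a field's bytes live in the file instead of reading them.
type LazyBlob struct {
	src    io.ReaderAt
	off    int64
	length int
}

// Unpack notes the current offset, then seeks past the data so parsing continues
// without pulling the bytes into memory.
func (b *LazyBlob) Unpack(r io.Reader, length int, opt *struc.Options) error {
	rs, canSeek := r.(io.ReadSeeker)
	ra, canReadAt := r.(io.ReaderAt)
	if !canSeek || !canReadAt {
		return errors.New("LazyBlob needs an io.ReadSeeker + io.ReaderAt (e.g. *os.File)")
	}
	pos, err := rs.Seek(0, io.SeekCurrent) // current position, no bytes read
	if err != nil {
		return err
	}
	b.src, b.off, b.length = ra, pos, length
	_, err = rs.Seek(int64(length), io.SeekCurrent) // skip the field
	return err
}

// Get pulls the bytes into memory on demand; only valid while the file is still open.
func (b *LazyBlob) Get() ([]byte, error) {
	return ioutil.ReadAll(io.NewSectionReader(b.src, b.off, int64(b.length)))
}

// Pack is deliberately unimplemented in this sketch.
func (b *LazyBlob) Pack(p []byte, opt *struc.Options) (int, error) {
	panic("LazyBlob: packing not supported")
}

func (b *LazyBlob) Size(opt *struc.Options) int {
	return b.length
}

func (b *LazyBlob) String() string {
	return fmt.Sprintf("LazyBlob(%d bytes at offset %d)", b.length, b.off)
}
```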