microsoft / igvm

MIT License

igvm: deduplicate file data on serialization #44

Closed chris-oo closed 5 months ago

chris-oo commented 5 months ago

Change the write_binary_data function to use a serializer that deduplicates data it has already seen, so that serialized headers refer to the offset of the earlier copy. This saves significant space in the binary file for some usages, such as when the same data is loaded multiple times at different GPAs across different compatibility masks.

This is implemented by keeping a lookup table of previously seen data. Doing so makes an additional copy of the imported data, but since this is used today only in applications such as build tooling, the extra memory usage seems acceptable for now.
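As a rough illustration of the lookup-table approach described above, here is a minimal sketch. It is not the code from this PR: the `DedupSerializer` type, its fields, and the `base_offset` parameter are hypothetical names chosen for the example, and the real crate's signatures and offset types may differ.

```rust
use std::collections::HashMap;

/// Hypothetical serializer (not the crate's actual type): appends data blobs
/// to a file-data section and returns the offset a header should reference.
struct DedupSerializer {
    /// Assumed offset in the final file at which the data section starts.
    base_offset: u32,
    /// Bytes of the data section accumulated so far.
    file_data: Vec<u8>,
    /// Lookup table from blob contents to the offset where they were written.
    /// Using the bytes as the key is what costs the extra copy of the data.
    seen: HashMap<Vec<u8>, u32>,
}

impl DedupSerializer {
    fn new(base_offset: u32) -> Self {
        Self {
            base_offset,
            file_data: Vec::new(),
            seen: HashMap::new(),
        }
    }

    /// Returns the file offset for `data`, appending it only if an identical
    /// blob has not been written before.
    fn write_binary_data(&mut self, data: &[u8]) -> u32 {
        // Identical contents were already serialized: reuse the earlier offset.
        if let Some(&offset) = self.seen.get(data) {
            return offset;
        }

        // First occurrence: append the bytes and remember where they landed.
        let offset = self.base_offset + self.file_data.len() as u32;
        self.file_data.extend_from_slice(data);
        self.seen.insert(data.to_vec(), offset);
        offset
    }
}
```

With this sketch, writing the same page contents for two headers (for example, the same data at two GPAs under different compatibility masks) returns the same offset both times, and the data section grows only once. Keying the table on a hash of the contents instead of the full bytes would avoid the extra copy, at the cost of having to handle hash collisions.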

This is a breaking change and requires a version bump to 0.2.x.