mitsuhiko / similar

A high level diffing library for rust based on diffs
https://insta.rs/similar
Apache License 2.0
974 stars, 32 forks

Suitable for large binary files? #67

Closed sgmihai closed 2 months ago

sgmihai commented 3 months ago

I want to make a patch file that converts an .mkv file I remuxed from .m2ts back into the original .m2ts. These files are 20-70 GB in size. Would similar work well on them? The data (the video and audio streams) is mostly identical, but the offset between the matching parts of the two files grows the further you go into them, because of the redundant packets in the MPEG stream. I tried xdelta3 and got good results up to the point where the window size was too small for it to keep detecting the identical parts, and there is no way to increase the window size beyond 2 GB.

If you think similar could work in this scenario, please give me some sample code showing how best to use it. Is there really no CLI tool based on it? Thanks.

mitsuhiko commented 2 months ago

You should be able to use it. Just use the underlying capturing diff functions on the byte vectors. See this example, for instance: https://github.com/mitsuhiko/similar/blob/main/examples/nonstring.rs