oll3 / bita

Differential file synchronization over http
https://crates.io/crates/bita
MIT License

FR: Embed `.bita` metadata into files themselves #49

Open Azathothas opened 1 week ago

Azathothas commented 1 week ago

Hi, this is probably a dumb question and I may be misunderstanding what this tool is supposed to do, so I apologize in advance.

I was trying to use bita as a zsync replacement for AppImages. For AppImages, the .zsync information is embedded into the AppImage without breaking the AppImage itself, and a .zsync file is also generated.

But bita currently creates a compressed archive, which breaks the appimage or any elf/binary file. If the .bita metadata could be embedded into the original files without breaking them (using a random section in the ELF file or some other trick), the use case of bita would be improved significantly.

Is such a feature in scope of your project? If not, no worries, this is still an amazing tool. Thanks for making it FOSS.

rminderhoud commented 1 week ago

Bita creates a special archive of blocks to use for its syncing. There's no need to modify the original file, and you could offer both at the same download location, e.g.

| - myfile
| - myfile.bita

I don't remember zsync needing to embed data in the original file, so that must be some custom functionality of AppImage. An advantage of zsync is that you don't need to create a separate archive; you can just create the zsync meta file and leave your original file in place. However, for zsync you need both the meta file and the original file, since the meta file only stores checksums and no actual data. Although I believe there's an option to compress files so it's easier to use for zsync.
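To illustrate "the meta file only stores checksums and no actual data", here is a minimal sketch of a checksum-only description of a file. It is not the real .zsync format: the block size, the `build_meta` helper, the dictionary layout and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def build_meta(name: str, data: bytes) -> dict:
    """Describe a file by its checksums only; no file content is stored."""
    return {
        "name": name,
        "length": len(data),
        "block_size": BLOCK_SIZE,
        "blocks": block_checksums(data),
    }


original = bytes(range(256)) * 64  # 16 KiB stand-in for the real file
meta = build_meta("myfile", original)
print(len(meta["blocks"]), "checksums describe", meta["length"], "bytes")
```

Because the metadata holds only digests, a client still has to fetch the actual bytes from the original file, which is why both files need to be hosted.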

Bita is more complex in that regard: the archive has all the chunked data, and the original file is not needed. The advantage of this is that it can do more advanced things like compressing each chunk, de-duplication, etc.

It might be worth experimenting with using uncompressed bita archives and generating a chunk dictionary from that. Maybe it would be possible to use an external chunk dictionary and use the original file as a source of chunks? Would need experimenting for sure, it's not something I've tried yet personally.
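A rough sketch of what "an external chunk dictionary that uses the original file as a source of chunks" could look like. This is not bita's API or archive format; `build_dictionary`, `rebuild`, the fixed-size chunking and the tiny chunk size are hypothetical choices made to keep the example short.

```python
import hashlib

CHUNK_SIZE = 8  # tiny, hypothetical chunk size so the example stays readable


def build_dictionary(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (order, locations) describing `data` without copying it.

    order     -- chunk hashes in the order needed to rebuild the file
    locations -- hash -> (offset, length) of the first occurrence in `data`
    """
    order, locations = [], {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        order.append(digest)
        # Duplicate chunks share a single dictionary entry (de-duplication).
        locations.setdefault(digest, (offset, len(chunk)))
    return order, locations


def rebuild(order, locations, source: bytes) -> bytes:
    """Reassemble the file by reading every chunk out of the original file."""
    out = bytearray()
    for digest in order:
        offset, length = locations[digest]
        out += source[offset:offset + length]
    return bytes(out)


original = b"AAAAAAAABBBBBBBBAAAAAAAACCCC"  # stand-in for myfile
order, locations = build_dictionary(original)
print(len(order), "chunks,", len(locations), "unique")
```

The dictionary stays small because it stores only hashes and offsets; the chunk data itself is read from the unmodified original file.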

caesay commented 1 week ago

For AppImage / zsync, my understanding is that you embed the location of your remote zsync files into the AppImage (e.g. http://my.website/updates). There is then support inside the various AppImage libraries to automatically check this location for zsync files and download/apply updates.

When creating zsync files, the library will read your existing AppImage and create a new .zsync file containing hashes for it, without modifying the AppImage. In fact, you can create a zsync of any file on your filesystem in the same way. For AppImages, shipping as two separate files (AppImage and .zsync) does make some sense, because being able to ship MyApp.AppImage works well for people downloading the first time, and MyApp.zsync has all the information in it needed to do a diff/delta sync for people updating.

Bita does not "break the appimage or any elf/binary file"; it's just something different. Bita takes your input file and creates a new compressed archive, with delta/diff information embedded. You could use this for AppImages just as easily as any other compressed archive; you'd just need to decompress/extract it after downloading.

oll3 commented 1 week ago

Hi @Azathothas,

As @rminderhoud said, after compressing a file with bita the metadata is contained together with the compressed and de-duplicated file data, all in the same archive. This is to make distribution simple, e.g. a single file to ship and validate, with no risk of mixing up the wrong metadata and source file. And having a custom archive format enables bita to do further compression and de-duplication on the source file.

The obvious drawback with this approach is that you can't make any use of this archive without bita.

You can probably use/modify the bita library to create a separate metadata file and do what I think you're asking for, but I'm guessing it won't provide much advantage over using zsync.

Hope this answers your questions!

Azathothas commented 1 week ago

@caesay

> Bita does not "break the appimage or any elf/binary file"

Hello, yes, sorry for the miscommunication. I meant the compressed archive that bita created. This was stupid of me.

> You could use this for AppImages just as easily as any other compressed archive; you'd just need to decompress/extract it after downloading.

steam.AppImage.bita was generated with: bita compress -i steam.AppImage -f steam.AppImage.bita

$ ls
-rw-r--r-- 1 runner runner 340M Oct 28 18:16 steam.AppImage
-rw-r--r-- 1 runner runner 340M Oct 29 08:15 steam.AppImage.bita
-rw-r--r-- 1 runner runner 340M Oct 28 08:47 steam.AppImage.old

As you can see, the file size is the same, since AppImages are already compressed. So to make these work with bita, the user/client would need bita on their system to decompress the archive back to the original, but they would still have downloaded the full file (since there was no saving in size). Another option is to offer both the regular file and file.bita, but as you can see, the size would be 2x for every file.
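The "no saving in size" effect is easy to reproduce in isolation. The sketch below uses zlib purely as a stand-in (it models neither the AppImage payload compression nor bita's own compressor), and the generated text is hypothetical sample data.

```python
import random
import zlib

rng = random.Random(7)
letters = b"abcdefghijklmnopqrstuvwxyz"
# Hypothetical compressible payload: text built from a 64-word vocabulary.
words = [bytes(rng.choice(letters) for _ in range(rng.randint(3, 9)))
         for _ in range(64)]
payload = b" ".join(rng.choice(words) for _ in range(40_000))

once = zlib.compress(payload)  # stands in for the already-compressed file
twice = zlib.compress(once)    # stands in for compressing it a second time

print("original:        ", len(payload), "bytes")
print("compressed once: ", len(once), "bytes")
print("compressed twice:", len(twice), "bytes")
```

The first pass removes most of the redundancy, so the second pass has almost nothing left to work with and the output stays at roughly the same size.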

> For AppImages, shipping as two separate files (AppImage and .zsync) does make some sense

It would also make sense for other formats that are already compressed. A simple dictionary-only file like .zsync would be immensely useful.

@oll3

> You can probably use/modify the bita library to create a separate metadata file and do what I think you're asking for, but I'm guessing it won't provide much advantage over using zsync.

Yes, this is what I wanted. It would still be better than the now aging and unmaintained zsync, and would make bita almost a drop-in replacement.

If you ever add this feature, do comment on the issue. Otherwise, I think since this is not in scope of this project, this issue could be considered resolved/closed. Thank you!

oll3 commented 1 week ago

> As you can see, the file size is the same, since AppImages are already compressed. So to make these work with bita, the user/client would need bita on their system to decompress the archive back to the original, but they would still have downloaded the full file...

Bita would download the full file if you have no previous version of that file locally to use as seed. When you have a file/files to use as seed it will only download what it can't find locally. Same as with zsync.
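The seed behaviour can be sketched as follows. This is an illustration under stated assumptions, not bita's implementation: the gear-style rolling hash, the `chunks`, `remote_index` and `bytes_to_fetch` helpers, and all constants are hypothetical, and random bytes stand in for real file content.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # per-byte random values
MIN_SIZE = 32        # minimum chunk size (assumed)
MASK = (1 << 6) - 1  # a boundary roughly every 64 eligible positions


def chunks(data: bytes):
    """Split data at content-defined boundaries using a simple gear hash."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_SIZE and ((h >> 20) & MASK) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def remote_index(data: bytes):
    """What a server could publish: (hash, length) per chunk, in file order."""
    return [(digest(c), len(c)) for c in chunks(data)]


def bytes_to_fetch(index, seed: bytes) -> int:
    """Sum the sizes of chunks that cannot be found in the local seed."""
    have = {digest(c) for c in chunks(seed)}
    return sum(length for h, length in index if h not in have)


old_version = bytes(_rng.getrandbits(8) for _ in range(32 * 1024))
new_version = b"new header bytes" + old_version  # small change at the start

index = remote_index(new_version)
print("no seed:  ", bytes_to_fetch(index, b""), "bytes")
print("with seed:", bytes_to_fetch(index, old_version), "bytes")
```

Without a seed every chunk has to be downloaded; with the previous version as seed, only the chunks around the changed region are missing locally.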

Though, as you seem to be aware, bita does not care about the format of the file it's compressing and hence does not understand that it's already compressed. Working with uncompressed files will typically be beneficial for bita, since compression will "obfuscate" the original file, and bita may have a harder time recognizing chunk boundaries. Hence it will find less data to reuse. However, it should still work.

zsync, on the other hand, has support for looking into files and understanding (deflate-based, gzip etc.) compression, and hence might be more efficient when you're working with some compressed files.

I think this could be an interesting feature for bita, though there are many compression algorithms in the wild, so limiting it to deflate might not cut it today; I don't know. This is not something I need, and it seems out of scope for bita today. But I'd be open to discussing it if anyone would like to experiment with it.
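A small experiment can make the "obfuscation" effect visible. It chunks two versions of a file, first raw and then after compression, and measures how much of the new version is made of chunks already present in the old one. Everything here is an assumption for illustration: zlib stands in for whatever compressed the file, the gear-style chunker is not bita's chunker, and the sample data is generated.

```python
import hashlib
import random
import zlib

rng = random.Random(42)
GEAR = [rng.getrandbits(32) for _ in range(256)]  # per-byte random values
MIN_SIZE, MASK = 32, (1 << 6) - 1                 # assumed chunker settings


def chunks(data: bytes):
    """Split data at content-defined boundaries using a simple gear hash."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_SIZE and ((h >> 20) & MASK) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


def reused_fraction(old: bytes, new: bytes) -> float:
    """Fraction of `new` (in bytes) made of chunks that also occur in `old`."""
    have = {hashlib.sha256(c).digest() for c in chunks(old)}
    hits = sum(len(c) for c in chunks(new)
               if hashlib.sha256(c).digest() in have)
    return hits / len(new)


# Compressible stand-in for a release: text built from a small vocabulary.
letters = b"abcdefghijklmnopqrstuvwxyz"
words = [bytes(rng.choice(letters) for _ in range(rng.randint(3, 9)))
         for _ in range(64)]
v1 = b" ".join(rng.choice(words) for _ in range(40_000))
# The next release: 48 new bytes at the front, everything else unchanged.
v2 = bytes(rng.getrandbits(8) for _ in range(48)) + v1

raw = reused_fraction(v1, v2)
packed = reused_fraction(zlib.compress(v1), zlib.compress(v2))
print(f"reusable when chunking raw data:        {raw:.1%}")
print(f"reusable when chunking compressed data: {packed:.1%}")
```

On the raw data, the chunk boundaries line up again shortly after the inserted bytes, so nearly everything can be reused. In the compressed streams the same small change alters the encoded bytes that follow it, so far fewer chunks match.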

> (since there was no saving in size). Another option is to offer both the regular file and file.bita, but as you can see, the size would be 2x for every file.

Yes, if you need to distribute both the original compressed AppImage and the bita image, it will typically use twice the disk space.

> If you ever add this feature, do comment on the issue. Otherwise, I think since this is not in scope of this project, this issue could be considered resolved/closed.

It's not anything I plan to implement at the moment, but I'd be open to discussing it further if anyone wants to develop a feature like this.