ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Use blockmap files to fetch data #9050

Closed RubenKelevra closed 2 years ago

RubenKelevra commented 2 years ago

Description

I noticed that the LBRY desktop app uses blockmap files, and I was curious whether it's technically possible to find the hashes of those blocks in IPFS or, the other way around, to supply a file together with its blockmap file so that the data is put into IPFS with the same block sizes.

I couldn't find much, as usual with Microsoft, but here are some docs:

https://docs.microsoft.com/en-us/uwp/schemas/blockmapschema/app-package-block-map

And here's an example release with blockmap files:

https://github.com/lbryio/lbry-desktop/releases/tag/v0.53.4

Edit: Not entirely sure why Microsoft specifies this as XML while the LBRY release uses zip-packed JSON, but the idea is the same: cut marks, a flat view of the file, and somehow-encoded SHA-256/SHA-384/SHA-512 sums per block. So as long as the blocks are smaller than 1 MB, this should work in theory. :)
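
To make the idea concrete, here is a minimal sketch of a blockmap-like structure: cut marks over a flat view of the file plus a SHA-256 digest per block. It does not reproduce the Microsoft XML schema or the LBRY release format; the function name `build_block_map`, the dictionary keys, and the default block size are hypothetical, and the 1 MB limit from the comment above is assumed to mean 1 MiB.

```python
import hashlib

# Assumption: the "1 MB" limit mentioned in the thread is treated as 1 MiB.
MAX_BLOCK_SIZE = 1024 * 1024


def build_block_map(data: bytes, block_size: int = 256 * 1024) -> list[dict]:
    """Split data into fixed-size blocks and record offset, size, and SHA-256.

    Hypothetical helper for illustration only; real blockmap files use
    their own formats and may use content-aware cut marks instead.
    """
    if block_size <= 0 or block_size > MAX_BLOCK_SIZE:
        raise ValueError("block_size must be between 1 byte and 1 MiB")

    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append({
            "offset": offset,            # cut mark: where the block starts
            "size": len(chunk),          # the last block may be shorter
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return blocks
```

A tool that wanted to reuse such a map would need the same cut marks when adding the file, which is why the block size limit matters.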

bertrandfalguiere commented 2 years ago

What is the hoped benefit of this approach?

RubenKelevra commented 2 years ago

Hey @bertrandfalguiere,

Well, the idea would be to import the blockmap file and the matching file into IPFS. To be useful for delta updates, the blockmap file has to be created with some knowledge of the internal data structure of the .exe/.zip/.dmg (or with a general-purpose rolling chunker).

If the blockmap files are shared anyway, IPFS could just use them too. IPFS Companion could detect them and offer to download the real file via IPFS instead of the blockmap file.

This avoids having to share the CID in addition to the blockmap files, and it keeps compatibility with other tools that can use them.

Jorropo commented 2 years ago

Sounds like a complicated UX. We could "just" make a chunker that knows how to calculate deltas (really close to the Rabin one, but it would also have access to the previous data in its calculations).

This would have the benefit of working with the current code (you only need to update the chunker; for everything else it's just UnixFS data).

Also, those files don't use IPLD formats and likely have blocks that are too big, with all the usual issues (slow to verify, non-parallelizable, non-seekable).

If someone wants to work on that, I don't think it's a bad idea, but IMHO adding deltas to Rabin sounds like a much better investment of your time.
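
As a rough illustration of why a content-defined chunker helps with deltas, the sketch below cuts data with a simple gear-style rolling hash and counts how many chunks of a new version already exist in the old one. This is not kubo's Rabin chunker and not the delta-aware chunker proposed above (which would also look at the previous data while cutting); the names `chunk` and `reusable_chunks`, the mask, and the size limits are all assumptions chosen for the example.

```python
import hashlib
import random

# Fixed seed so the cut points are reproducible across runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, mask: int = 0x3FF,
          min_size: int = 64, max_size: int = 8192) -> list[bytes]:
    """Cut data where a rolling hash of the recent bytes matches a pattern.

    The hash is kept to 32 bits and shifted left once per byte, so it only
    depends on roughly the last 32 bytes. Cut points therefore follow the
    content rather than absolute offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def reusable_chunks(old: bytes, new: bytes) -> tuple[int, int]:
    """Return (chunks of `new` already present in `old`, total chunks of `new`)."""
    old_hashes = {hashlib.sha256(c).digest() for c in chunk(old)}
    new_chunks = chunk(new)
    reused = sum(1 for c in new_chunks
                 if hashlib.sha256(c).digest() in old_hashes)
    return reused, len(new_chunks)
```

With fixed-size blocks, inserting a few bytes in the middle of a file shifts every later block; with content-defined cuts, the chunks after the edit usually line up again, so only the chunks around the change need to be transferred.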

Jorropo commented 2 years ago

Update: We are not interested in that. If you want to implement it and send it, we will probably review a plugin.

RubenKelevra commented 2 years ago

Alright... I just came across this and thought it might be interesting. But sure, if the blocks are too large, this wouldn't work anyway. :)