threefoldtech / 0-fs

A new filesystem for zero-os that leverages unionfs and a thin FUSE layer to fetch files on demand
Apache License 2.0

Saving blocks separately instead of final file, on the backend/cache #17

Open maxux opened 5 years ago

maxux commented 5 years ago

This issue is related and created because of:

If the final file is saved after a download and the same cache is later used with a new flist, files that are already in the cache won't be updated, even if their blocks have changed.

The only thing sure to be unique and unmodified is a block. Each file has one or more blocks, and each block comes with an integrity hash and an encryption key.

It would make sense to me to save each block (decrypted, to avoid decrypting it every time) and to serve the portion of the file requested by the system by reading from the blocks, without keeping the final file.

This way, we can always verify block integrity, and if a final file has changed, 0-fs will notice, because at least one hash won't be in the cache.
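A minimal sketch of that idea in Go, under several assumptions: the cache is modelled as an in-memory map keyed by block hash, and SHA-256 stands in for whatever hash the flist really uses. `BlockCache`, `hashBlock`, `Put` and `MissingBlocks` are hypothetical names, not part of 0-fs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// BlockCache is a hypothetical in-memory stand-in for the on-disk cache:
// it maps a block's hex-encoded hash to the decrypted block content.
type BlockCache map[string][]byte

// hashBlock returns the hex-encoded SHA-256 of a block. SHA-256 is an
// assumption made for this sketch, not necessarily what 0-fs uses.
func hashBlock(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Put stores a block under its own hash and returns that hash.
func (c BlockCache) Put(data []byte) string {
	h := hashBlock(data)
	c[h] = data
	return h
}

// MissingBlocks returns the hashes from want that are absent from the cache
// or whose cached content no longer matches its hash. A file whose content
// changed in a new flist shows up here because its new hashes are not cached.
func (c BlockCache) MissingBlocks(want []string) []string {
	var missing []string
	for _, h := range want {
		data, ok := c[h]
		if !ok || hashBlock(data) != h {
			missing = append(missing, h)
		}
	}
	return missing
}
```

With this layout, a new flist that lists a different hash for a file simply produces a non-empty `MissingBlocks` result, and only those blocks need to be downloaded again.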

Cc @zaibon for follow-up.

muhamadazmy commented 5 years ago

Actually, saving file blocks instead of the full file makes more sense and was considered before, except that it will give very bad performance. Also, the way the FUSE layer works now, when a file is accessed the open file descriptor is passed fully to the FUSE module, hence it works like a proxy.

If we start saving files as blocks, we will have to handle all read and seek operations ourselves, and all of them will have to go through the 0-fs process, reducing performance.

It can still be done, but we will have to do some benchmarking to see how much it affects performance.

zaibon commented 5 years ago

That's also my feeling about this. I think re-assembling blocks will make things way slower. That said, @maxux had a point when he said that the syscalls to read a file give the offset and size, so we could just serve the matching blocks.
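To illustrate the offset/size point, here is a rough Go sketch that maps a read request onto blocks. It assumes every block except the last has the same fixed size, which may not match the real flist layout; `BlockSpan`, `SpansFor` and `ReadAt` are hypothetical names used only for this example.

```go
package main

// BlockSpan describes the part of one block needed to serve a read.
type BlockSpan struct {
	Index int   // position of the block in the file's block list
	Start int64 // first byte to read inside that block
	Len   int64 // number of bytes to read from that block
}

// SpansFor maps a read of size bytes at offset onto fixed-size blocks.
// The read is clamped to fileSize, like a short read at end of file.
func SpansFor(offset, size, fileSize, blockSize int64) []BlockSpan {
	if blockSize <= 0 || offset < 0 || size <= 0 || offset >= fileSize {
		return nil
	}
	end := offset + size
	if end > fileSize {
		end = fileSize
	}
	var spans []BlockSpan
	for pos := offset; pos < end; {
		start := pos % blockSize
		n := blockSize - start // bytes left in the current block
		if pos+n > end {
			n = end - pos
		}
		spans = append(spans, BlockSpan{Index: int(pos / blockSize), Start: start, Len: n})
		pos += n
	}
	return spans
}

// ReadAt assembles the requested range from already decrypted blocks.
// It assumes all blocks except the last are exactly blockSize bytes long.
func ReadAt(blocks [][]byte, offset, size, blockSize int64) []byte {
	var fileSize int64
	for _, b := range blocks {
		fileSize += int64(len(b))
	}
	var out []byte
	for _, s := range SpansFor(offset, size, fileSize, blockSize) {
		out = append(out, blocks[s.Index][s.Start:s.Start+s.Len]...)
	}
	return out
}
```

Only the blocks touched by the request are read, so a small read never needs the whole file to be re-assembled. Whether this is fast enough compared to passing a file descriptor through is exactly what the benchmark would have to show.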

But indeed, benchmarking needs to be done to make sure we still get enough performance from this.

muhamadazmy commented 5 years ago

This commit, https://github.com/threefoldtech/0-fs/commit/19b8df07887397fde36bf6e72d4e33c42c5a8aa1, is already an improvement on this, but it doesn't fix the issue of files corrupted by disk failure or other causes once they have been downloaded. Hence a full cache check on boot should be implemented.

As mentioned above, storing the individual blocks can have a huge performance impact, but I can still give it a try. A check against the file hash on boot may be a better solution, but it will cause a boot delay ...