maidsafe / self_encryption

file self encryptor

Debug self-encryption of small files #385

Closed bochaco closed 1 month ago

bochaco commented 3 months ago

Currently, calling self-encryption on a small file leads to undesirable results. We should get just 1 chunk + data map for a small file; instead we get 3 chunks + data map. This comes from the get_num_chunks function inside self encryption.

Also, we used to have a method to encrypt small files separately, but it has since been removed. Maybe that is the cause? Either way, that functionality should move into the self-encryption crate.

Also add logging to self encryption.

bochaco commented 3 months ago

@RolandSherwin how exactly do you envision this working? When you say small files, what range of sizes do you actually mean? Is it between 3 * MIN_CHUNK_SIZE and 3 * MAX_CHUNK_SIZE? If so, for a file of size, let's say, 2 * MAX_CHUNK_SIZE we'd need to generate a data map + 2 chunks, right? That would be a different mechanism, as SE needs at least three chunks to work...

happybeing commented 3 months ago

I don't know if this is a good place to raise this, but it would be nice to have an API for self-encrypting a block of memory. Currently it only allows self-encryption of a file on disk, and it's not good to force data to be written to disk just for this purpose.

RolandSherwin commented 3 months ago

Hey @bochaco. Right, I see! Also, we were having trouble when actually trying to upload those extra chunks. We fetched the store cost, paid, and sent over chunk + payment, but they were never accepted. I can try to reproduce it locally.

happybeing commented 3 months ago

> I can try to reproduce it locally.

These may be two separate bugs.

Firstly, uploading individual small files does work (i.e. safe files upload -p <FILE>).

But uploading a directory (i.e. safe files upload -p <DIRECTORY>) fails with payments never being enough, and the wallet gets drained (logs posted on forum for Qi here).

bochaco commented 3 months ago

After digging in, the issue with small files ending up in 1 chunk (plus the data-map chunk) comes from the fact that files smaller than 3 * MAX_CHUNK_SIZE are split into 3 chunks. So if the content of the file happens to be something like "hello\nhello\nhello\n", it ends up being split into three equal chunks, each "hello\n", which then count as a single chunk. So that all seems OK to me, at least as long as we are OK with dealing with small files that way, as mentioned above.
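
To illustrate the effect described above, here is a minimal sketch in plain Rust. It is not the crate's code: `split_into_three` and `distinct_chunk_count` are hypothetical helpers, the split rule (first two parts get `len / 3` bytes, the last takes the remainder) is an assumption, and so is the idea that chunks with identical content collapse into one stored chunk. Encryption is left out entirely.

```rust
use std::collections::HashSet;

/// Hypothetical helper, not part of the self_encryption API.
/// Splits `data` into three parts: the first two get `len / 3` bytes
/// each and the last one takes whatever remains (assumed split rule).
fn split_into_three(data: &[u8]) -> [&[u8]; 3] {
    let part = data.len() / 3;
    let (first, rest) = data.split_at(part);
    let (second, third) = rest.split_at(part);
    [first, second, third]
}

/// Hypothetical helper: counts how many of the chunks differ in content.
/// Assumes chunks are identified by their content, so identical chunks
/// would only be stored once.
fn distinct_chunk_count(chunks: &[&[u8]]) -> usize {
    chunks.iter().copied().collect::<HashSet<&[u8]>>().len()
}
```

Under these assumptions, calling `split_into_three(b"hello\nhello\nhello\n")` yields three identical `"hello\n"` parts and `distinct_chunk_count` reports 1, whereas a file without that repetition would report 3.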

happybeing commented 3 months ago

@bochaco That sounds like the issue riddim isolated, as he was just adding the same text to a file to see what happened to the upload cost.

It doesn't explain the issue I was experiencing and reported to Qi Ma. Is anyone looking into that?

mickvandijke commented 1 month ago

Closing this because the behaviour is as intended.