sahib / brig

File synchronization on top of ipfs with git like interface & web based UI
https://brig.readthedocs.io
GNU Affero General Public License v3.0

feature: disable encryption for some files #48

Closed sahib closed 3 years ago

sahib commented 3 years ago

There should be a way to disable encryption for a selected set of files. Details TBD; maybe disable it per directory. Encryption should be opt-out, not opt-in, though.

evgmik commented 3 years ago

We can use extended filesystem attributes to relay this information to the FUSE layer. It sounds like a good idea to use this system internally to control encryption, compression, and other file statuses that we keep in the node structures.

On a regular file system it is quite easy:

setfattr -n user.brig.encryption -v 1 testfile
setfattr -n user.brig.compression -v 1 testfile
getfattr testfile

But we would need to add support for it to the FUSE layer.
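
A minimal sketch of how such attributes could be interpreted once the FUSE layer receives them. This is an illustration only: the `Setting` type and `parseXattr` function are hypothetical names, not part of brig, and the accepted values ("0" and "1") are an assumption based on the `setfattr` commands above.

```go
package main

import (
	"fmt"
	"strings"
)

// xattrPrefix is the attribute namespace used in the setfattr example above.
const xattrPrefix = "user.brig."

// Setting is a hypothetical per-file switch derived from an xattr.
type Setting struct {
	Name    string // "encryption" or "compression"
	Enabled bool
}

// parseXattr converts an xattr name/value pair into a Setting.
// It rejects attributes outside the brig namespace, unknown keys,
// and values other than "0" or "1" (an assumed encoding).
func parseXattr(name string, value []byte) (Setting, error) {
	if !strings.HasPrefix(name, xattrPrefix) {
		return Setting{}, fmt.Errorf("not a brig attribute: %q", name)
	}
	key := strings.TrimPrefix(name, xattrPrefix)
	switch key {
	case "encryption", "compression":
	default:
		return Setting{}, fmt.Errorf("unknown brig attribute: %q", key)
	}
	switch string(value) {
	case "1":
		return Setting{Name: key, Enabled: true}, nil
	case "0":
		return Setting{Name: key, Enabled: false}, nil
	}
	return Setting{}, fmt.Errorf("invalid value %q for %q", value, name)
}

func main() {
	s, err := parseXattr("user.brig.encryption", []byte("0"))
	fmt.Println(s, err)
}
```

A real FUSE handler would call something like this from its setxattr callback and then store the result with the node.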

sahib commented 3 years ago

I kinda like the idea, but I see two problems:

I think it's a good idea to make this available via FUSE, but not as the sole way. There should be an accompanying command in the CLI, i.e. something like this:

# `brig hint` would be a new command.
# Here it would cause the file to use the new algorithm on the next write.
$ brig hint encryption-algorithm none some-file.txt

# Here we set it for a complete directory.
# All files that will be created or modified in this directory will have the new algorithm
$ brig hint encryption-algorithm none /public

# If we allow this, this would disable encryption for all files.
# We should probably put a big warning sign around that one.
$ brig hint encryption-algorithm none /

Same could be done for compression. This logic would then hook in here:

https://github.com/sahib/brig/blob/45100d1358faa8e678f236a7ebe66279361d38a8/catfs/fs.go#L1014

Instead of guessing the encryption/compression algorithms, it should look up whether there are any hints for the file. If there aren't, we can continue guessing. Once that is possible, it should be easy internally to just add a hint when a FUSE user sets the xattrs you proposed. If we also want to re-encode immediately, we could also offer a command like this:

# Only takes the brig path to re-encode:
$ brig stage --re-encode /some/path
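
The hint lookup described above could work roughly like the following sketch. It assumes that a hint set on a directory applies to everything below it and that the closest hint wins; `Hint`, `HintStore` and `Lookup` are hypothetical names, not brig's actual API.

```go
package main

import (
	"fmt"
	"path"
)

// Hint is a hypothetical per-path setting; "none" disables an algorithm.
type Hint struct {
	Encryption  string
	Compression string
}

// HintStore maps absolute brig paths to explicitly set hints.
type HintStore map[string]Hint

// Lookup returns the hint of the closest path that has one, walking from p
// up to the root. The boolean is false if no hint applies, in which case
// the caller would fall back to guessing the algorithms.
func (hs HintStore) Lookup(p string) (Hint, bool) {
	p = path.Clean("/" + p)
	for {
		if h, ok := hs[p]; ok {
			return h, true
		}
		if p == "/" {
			return Hint{}, false
		}
		p = path.Dir(p)
	}
}

func main() {
	hs := HintStore{"/public": {Encryption: "none", Compression: "none"}}
	for _, p := range []string{"/public/a.txt", "/private/b.txt"} {
		h, ok := hs.Lookup(p)
		fmt.Println(p, h, ok)
	}
}
```

With this shape, a hint on `/` would match every path, which is why that case deserves the warning mentioned above.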

evgmik commented 3 years ago

I fear we would need to redesign more: if I understand correctly, the backend blob has information about the compression and encryption algorithm, i.e. the original content is 'contaminated' with this header.

The main use of unencrypted files (besides slightly faster I/O) would be to share a direct link to them as an IPFS hash. But this means that we need to strip the info from the header of the stream. We can keep this info in the node info structure, where it probably belongs anyway.

sahib commented 3 years ago

> The main use of unencrypted files (besides slightly faster I/O) would be to share a direct link to them as an IPFS hash. But this means that we need to strip the info from the header of the stream. We can keep this info in the node info structure, where it probably belongs anyway.

Is it? When you want to share files via IPFS, shouldn't you just add them to IPFS and share them that way? Can you demonstrate a use case where somebody needs to retrieve a file through ipfs cat and not through brig?

> I fear we would need to redesign more:

I don't think so.

> if I understand correctly, the backend blob has information about the compression and encryption algorithm, i.e. the original content is 'contaminated' with this header.

Not only the header is encoded in the backend blob, but also the whole container format that we need in order to understand compressed and encrypted streams. Without it we would not be able to read the stream anymore. It is similar to what video container formats are to video data.

Sure, for unencrypted and uncompressed streams those helpers are not really needed, but giving a guarantee to the outside that unencrypted & uncompressed streams (and only those!) would be readable via ipfs cat <hash> limits us in how we encode the data internally. What if we decide to do our own chunking, so that the backend blob only contains an index file of the chunks? Or if we implement some sort of packfiles?

evgmik commented 3 years ago

> > The main use of unencrypted files (besides slightly faster I/O) would be to share a direct link to them as an IPFS hash. But this means that we need to strip the info from the header of the stream. We can keep this info in the node info structure, where it probably belongs anyway.

> Is it? When you want to share files via IPFS, shouldn't you just add them to IPFS and share them that way? Can you demonstrate a use case where somebody needs to retrieve a file through ipfs cat and not through brig?

Sure.

  1. Suppose you want to share a file with a colleague who uses an unsupported OS.
  2. You may say that this colleague should use the brig web gateway, but then it would depend on whether your machine is online. If the file is IPFS-backed, one can use any public gateway to get the file.
  3. There are or will be other tools to share content with IPFS; this way we do not force stream unwrapping on the user. It would be a pain to run a utility just to cut a few header bytes if the rest is the same (in the unencrypted and uncompressed case).

> Not only the header is encoded in the backend blob, but also the whole container format that we need in order to understand compressed and encrypted streams. Without it we would not be able to read the stream anymore. It is similar to what video container formats are to video data.

> Sure, for unencrypted and uncompressed streams those helpers are not really needed, but giving a guarantee to the outside that unencrypted & uncompressed streams (and only those!) would be readable via ipfs cat <hash> limits us in how we encode the data internally. What if we decide to do our own chunking, so that the backend blob only contains an index file of the chunks? Or if we implement some sort of packfiles?

Here are my arguments for keeping the encryption and compression info in the node metadata:

  1. See the benefits in item 3 above: multiple ipfs or brig interfaces to the same file content.
  2. To decrypt we have to look at the node metadata anyway to obtain the key. We can store the encryption and compression algorithm info there as well. If we do our own chunking, it can still be stored in the metadata (besides, there is no upper limit on how much info fits there, while the header is limited).

sahib commented 3 years ago

Hmm, after sleeping on it a bit, I think it's a nice property to have. Though that will obviously only work for files that are not compressed or encrypted, which might lead to a few surprises on the user side. Users likely won't intuitively understand why ipfs cat outputs some files as garbage and others correctly. But that's more of a UI/UX problem, I guess...

Still, we should make encryption an opt-out feature, where the user explicitly asks for unencrypted files.

> Here are my arguments for keeping the encryption and compression info in the node metadata:

You might have a misunderstanding here. There's not just a header at the start of the file; brig implements a complete container format that allows random access in encrypted and compressed streams. Also, the header is designed in a way that allows it to be extended. From my point of view, all info that is required to read the stream should be in the stream [1]. This way we also guarantee that this information does not get out of sync and that files can also be recovered outside of brig. The only info that should be in the node is whether the stream in question has a header or not.

[1] Well, encryption key excluded. That is a special case for obvious reasons. :smile:
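
To illustrate the split proposed here (a single flag in the node, everything else in the stream), here is a rough sketch. The `Node` type, the `open` function and the `HDR1` marker are all made up for this example; brig's real container format is more involved and is not reproduced here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// Node is a stand-in for brig's file metadata. Only the single flag
// discussed above is modelled here.
type Node struct {
	HasHeader bool
}

// magic is an invented marker, NOT brig's real container format.
var magic = []byte("HDR1")

// open returns a reader for the payload. Raw streams are passed through
// untouched; wrapped streams must start with the magic marker, which is
// consumed before the reader is handed back.
func open(n Node, r io.Reader) (io.Reader, error) {
	if !n.HasHeader {
		return r, nil
	}
	buf := make([]byte, len(magic))
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}
	if !bytes.Equal(buf, magic) {
		return nil, errors.New("stream does not start with the expected header")
	}
	return r, nil
}

// readPayload is a small convenience wrapper for in-memory data.
func readPayload(n Node, data []byte) (string, error) {
	r, err := open(n, bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	fmt.Println(readPayload(Node{HasHeader: true}, []byte("HDR1payload")))
	fmt.Println(readPayload(Node{HasHeader: false}, []byte("payload")))
}
```

In this shape a stream without a header is byte-for-byte the original file, which is the property that would make it readable through ipfs cat.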


So here's what we need to do in order to support this:

All in all, this is not a critical task for me and counts as nice-to-have. Before we tackle this, we should implement the hint system described above to control which files get encrypted/compressed and which do not.

evgmik commented 3 years ago

I totally agree that the default should be to encrypt.

> You might have a misunderstanding here. There's not just a header at the start of the file; brig implements a complete container format that allows random access in encrypted and compressed streams.

Yes, I was unaware of the container feature.

Shall we move it? Roadmap 0.6 -> 0.7

sahib commented 3 years ago

Implemented by #90.