Closed: sahib closed this issue 3 years ago
We can use extended filesystem attributes to relay this information to the FUSE layer.
It sounds like a good idea to use this system internally to control encryption, compression, and other file statuses which we keep in the node structures.
On a regular filesystem this is quite easy:
setfattr -n user.brig.encryption -v 1 testfile
setfattr -n user.brig.compression -v 1 testfile
getfattr testfile
But we would need to add support for it to the FUSE layer.
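To make the xattr idea a bit more concrete, here is a minimal sketch in Python (not brig's actual Go code) of how a FUSE setxattr handler might translate the attributes from the commands above into internal settings. The function name `parse_brig_xattr` and the accepted values `0`/`1` are assumptions for illustration; only the `user.brig.encryption` and `user.brig.compression` names come from the proposal.

```python
from typing import Optional, Tuple

# Attribute namespace taken from the setfattr examples above.
XATTR_PREFIX = "user.brig."
# Settings we would accept; anything else under the prefix is rejected.
KNOWN_SETTINGS = {"encryption", "compression"}


def parse_brig_xattr(name: str, value: bytes) -> Optional[Tuple[str, bool]]:
    """Translate an xattr name/value pair into a (setting, enabled) tuple.

    Returns None for attributes outside the user.brig. namespace, so the
    caller can pass them through untouched. Hypothetical helper.
    """
    if not name.startswith(XATTR_PREFIX):
        return None

    setting = name[len(XATTR_PREFIX):]
    if setting not in KNOWN_SETTINGS:
        raise ValueError(f"unknown brig attribute: {name}")

    text = value.decode("utf-8").strip()
    if text not in ("0", "1"):
        raise ValueError(f"expected 0 or 1 for {name}, got {text!r}")

    return setting, text == "1"
```

A handler would call this on every setxattr request and store the result wherever the file's settings end up living.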
I kinda like the idea, but I see two problems:
I think it's a good idea to make this available via FUSE, but not as the sole way. There should be an accompanying command in the CLI, e.g. something like this:
# `brig hint` would be a new command.
# Here it would cause the file to use the new algorithm on the next write.
$ brig hint encryption-algorithm none some-file.txt
# Here we set it for a complete directory.
# All files that will be created or modified in this directory will have the new algorithm
$ brig hint encryption-algorithm none /public
# If we allow this, this would disable encryption for all files.
# We should probably put a big warning sign around that one.
$ brig hint encryption-algorithm none /
Same could be done for compression. This logic would then hook in here:
https://github.com/sahib/brig/blob/45100d1358faa8e678f236a7ebe66279361d38a8/catfs/fs.go#L1014
Instead of guessing the encryption/compression algorithms, it should first look up whether there are any hints for the path. If there aren't, we can continue guessing. Once that works, it should be possible internally to just add a hint when a FUSE user sets the xattrs you proposed. If we also want to re-encode immediately, we could also offer a command like this:
# Only takes the brig path to re-encode:
$ brig stage --re-encode /some/path
I fear we would need to redesign more: if I understand correctly, the backend blob carries information about the compression and encryption algorithm, i.e. the original content is 'contaminated' with this header.
The main use of unencrypted files (besides slightly faster I/O) would be to share a direct link to them as an IPFS hash. But this means that we need to strip the info from the header of the stream. We can keep this info in the node info structure, where it probably belongs anyway.
> The main use of unencrypted files (besides slightly faster I/O) would be to share a direct link to them as an IPFS hash. But this means that we need to strip the info from the header of the stream. We can keep this info in the node info structure, where it probably belongs anyway.
Is it? When you want to share files via IPFS, shouldn't you just add them to IPFS and share them that way?
Can you demonstrate a use case where somebody needs to retrieve a file through ipfs cat and not through brig?
> I fear we would need to redesign more:
I don't think so.
> if I understand correctly, the backend blob carries information about the compression and encryption algorithm, i.e. the original content is 'contaminated' with this header.
Not only the header is encoded in the backend blob, but also the whole container format that we need in order to understand compressed and encrypted streams. Without it we would not be able to read the stream anymore; that's similar to what video container formats are to video data.
Sure, for unencrypted and uncompressed streams those helpers are not really needed, but giving a guarantee to the outside that unencrypted & uncompressed files (and only those!) would be readable via ipfs cat <hash> limits us in how we encode the data internally. What if we decide to do our own chunking, so that the backend blob only contains an index file of the chunks? Or if we implement some sort of packfiles?
> > The main use of unencrypted files (besides slightly faster I/O) would be to share a direct link to them as an IPFS hash. But this means that we need to strip the info from the header of the stream. We can keep this info in the node info structure, where it probably belongs anyway.
> Is it? When you want to share files via IPFS, shouldn't you just add them to IPFS and share them that way? Can you demonstrate a use case where somebody needs to retrieve a file through ipfs cat and not through brig?
Sure. One could use the brig web gateway, but then it would depend on whether your machine is online or not. If the file is IPFS-backed, one can use any public gateway to get the file.
> Not only the header is encoded in the backend blob, but also the whole container format that we need in order to understand compressed and encrypted streams. Without it we would not be able to read the stream anymore; that's similar to what video container formats are to video data.
> Sure, for unencrypted and uncompressed streams those helpers are not really needed, but giving a guarantee to the outside that unencrypted & uncompressed files (and only those!) would be readable via ipfs cat <hash> limits us in how we encode the data internally. What if we decide to do our own chunking, so that the backend blob only contains an index file of the chunks? Or if we implement some sort of packfiles?
Here are my arguments for keeping encryption and compression info in the node metadata
Hmm, after sleeping on that a bit, I think it's a nice property to have. That will obviously only work for files that are neither compressed nor encrypted, though, which might lead to a few surprises on the user side. They likely won't intuitively understand why ipfs cat outputs some files as garbage and others correctly. But that's more of a UI/UX problem, I guess...
Still, encryption should be an opt-out feature, where the user explicitly chooses to use unencrypted files.
> Here are my arguments for keeping encryption and compression info in the node metadata
You might have a misunderstanding here. There's not just a header at the start of the file; brig implements a complete container format that allows us random access in encrypted and compressed streams. Also, the header is designed in a way that allows it to be extended. From my point of view, all info that is required to read the stream should be in the stream [1]. This way we also guarantee that this information does not get out of sync and that files can also be recovered outside of brig. The only info that should be in the node is whether the stream in question has a header or not.
[1] Well, the encryption key excluded. That is a special case for obvious reasons. :smile:
So here's what we need to do in order to support this:
- nodes.File: IsRaw: If true, the stream does not have a header; if false, further info can be found in the stream header.
- fs.Cat() should know about this field and use it accordingly.
- mio.Stream should learn that a file can be either compressed or encrypted or both. This needs the information to be encoded in the encryption header, which requires a small extension of the header.

All in all, this is not a critical task for me and also counts as nice-to-have. Before we tackle this we should implement the hint system described above to control which files get encrypted/compressed and which do not.
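As a rough illustration of the split proposed here (the node only records whether a header exists, everything else lives in the stream), consider the following Python sketch. The magic bytes, the flag layout, and the names `FileNode`, `encode_header`, and `describe_stream` are all invented for this example and do not reflect brig's real container format or Go types.

```python
from dataclasses import dataclass

# Made-up header layout: 4 magic bytes followed by one flag byte.
MAGIC = b"HDR1"
FLAG_COMPRESSED = 0b01
FLAG_ENCRYPTED = 0b10


@dataclass
class FileNode:
    # Counterpart of the proposed IsRaw field: True means "no header at all".
    is_raw: bool


def encode_header(compressed: bool, encrypted: bool) -> bytes:
    """Build the hypothetical header; both flags can be set independently."""
    flags = 0
    if compressed:
        flags |= FLAG_COMPRESSED
    if encrypted:
        flags |= FLAG_ENCRYPTED
    return MAGIC + bytes([flags])


def describe_stream(node: FileNode, blob: bytes) -> dict:
    """Decide how a blob has to be read, the way a Cat-like function might."""
    if node.is_raw:
        # Raw blobs are plain content, readable without any brig knowledge.
        return {"compressed": False, "encrypted": False, "payload": blob}

    if len(blob) < len(MAGIC) + 1 or blob[:len(MAGIC)] != MAGIC:
        raise ValueError("stream is not raw but has no valid header")

    flags = blob[len(MAGIC)]
    return {
        "compressed": bool(flags & FLAG_COMPRESSED),
        "encrypted": bool(flags & FLAG_ENCRYPTED),
        "payload": blob[len(MAGIC) + 1:],
    }
```

The point of the sketch is that the node carries a single bit, so the node and the stream cannot disagree about which algorithms were used.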
I totally agree that the default should be encrypt.
> You might have a misunderstanding here. There's not just a header at the start of the file; brig implements a complete container format that allows us random access in encrypted and compressed streams.
Yes, I was unaware of the container feature.
Shall we move it? Roadmap 0.6 -> 0.7
Implemented by #90.
There should be a way to disable encryption for a selected set of files. Details TBD, maybe disable it per directory. Encryption should be opt-out, not opt-in, though.