
ZFS Crypto support #494

Closed FransUrbo closed 7 years ago

FransUrbo commented 12 years ago

As of ZFS Pool Version 30, there is support for encryption. Unfortunately, this part is closed source, so an open-source implementation would be required. That means it would probably not be compatible with the Solaris version, 'but who cares' :).

Illumos is apparently working on this at https://www.illumos.org/projects/zfs-crypto. The source repository can be found at https://bitbucket.org/buffyg/illumos-zfs-crypto. Unfortunately, there have been no changes since the fork from illumos-gate. Should ZoL start thinking about this, or should we just take a back seat?

I don't know how big of a problem this would be, but 'copying' the way that LUKS (Linux Unified Key Setup) does it seems like a good place to start.

koraa commented 9 years ago

Since this seems to be going in the direction of our own implementation, I'd like to add two considerations (or rather feature requests) for a future cryptography subsystem in ZoL; I hope this is not misplaced in this thread :)

The first is the ability to have per-volume keys. I use ZFS on my laptop and do a lot of work for different organizations and projects. Among the data I store for those projects are quite a few cryptographic keys and a lot of source code. It would be quite a relief to only expose each project's data when I am actually working on it.

The second is the ability to create and receive encrypted zfs streams without providing the key for those streams. For instance, I could replicate my entire ZFS pool to my backup pool, encrypted. The backup pool wouldn't need any knowledge of the crypto keys and thus would pose a much smaller security risk than it does now. Yet if I lost my laptop, I could simply mount and decrypt my backed-up pools on the backup server and have instant access to my data.

FransUrbo commented 9 years ago

@koraa The Solaris implementation already does that. Any implementation MUST (IMO) have at least the exact same functionality, command-line options, etc., as the "original" one.

I would have preferred the option of a global pool key, but I ended up reusing the same key for all the volumes. Not optimal, but in the end it gave the same result.

ryao commented 9 years ago

Implementing an Illumos-compatible kernel crypto API by mapping the Linux functions to the Illumos functions is not possible because of GPL symbol exports. The SPL does not touch any of those, so it cannot be claimed that ZoL is circumventing them. What lundman implemented can only work by violating that; consequently, the project cannot use it. That said, we can always port the Illumos cryptographic routines. Our checksum calculations are currently inefficient because we use a generic sha256 implementation in place of the Illumos cryptographic API, so porting it should be done for more than just encryption support.

ryao commented 9 years ago

It should also be said that the bulk of the work is independent of implementing the Illumos crypto API, so whether it is used or not will not make this happen much faster or slower.

tcaputi commented 8 years ago

I have briefly looked at the Illumos crypto implementation (with the intention of porting it to a Linux kernel module) and it obviously involves quite a bit of code, which would make it a daunting task to port. It also seems to me that large pieces of a full port's infrastructure might go unused, since ZoL would only need aes-ccm and aes-gcm to have feature parity with Solaris. As far as hardware encryption providers go, I'm not sure we will be able to use those without the EXPORT_SYMBOL_GPL() functions from the Linux Crypto API, unless someone is willing and able to write a hardware module to plug into our port.

That said, I looked at the ZFS crypto implementation from OpenSolaris and noticed that it only seems to use six functions from the crypto API (crypto_encrypt(), crypto_digest(), crypto_decrypt(), crypto_create_ctx_template(), crypto_destroy_ctx_template(), and crypto_mech2id()). Perhaps we could put together a simpler implementation that exposes these calls, allowing us to expedite ZoL encryption. If someone in the future wanted to take the time to port the entire layer over, it could be dropped in as a replacement. Thoughts, anyone?
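
For reference, here is a sketch of what that minimal shim's header could look like, with the six entry points paraphrased from the Illumos sys/crypto headers; exact signatures should be double-checked against illumos-gate before relying on them:

#include <sys/crypto/common.h>  /* Illumos types: crypto_mechanism_t, etc. */

/* Map a mechanism name such as "CKM_AES_CCM" to an internal mechanism id. */
crypto_mech_type_t crypto_mech2id(char *name);

/* Pre-expand a key schedule so repeated per-block calls can skip that work. */
int crypto_create_ctx_template(crypto_mechanism_t *mech, crypto_key_t *key,
    crypto_ctx_template_t *tmpl, int kmflag);
void crypto_destroy_ctx_template(crypto_ctx_template_t tmpl);

/* Single-shot encrypt/decrypt/digest over crypto_data_t buffers. */
int crypto_encrypt(crypto_mechanism_t *mech, crypto_data_t *plaintext,
    crypto_key_t *key, crypto_ctx_template_t tmpl, crypto_data_t *ciphertext,
    crypto_call_req_t *cr);
int crypto_decrypt(crypto_mechanism_t *mech, crypto_data_t *ciphertext,
    crypto_key_t *key, crypto_ctx_template_t tmpl, crypto_data_t *plaintext,
    crypto_call_req_t *cr);
int crypto_digest(crypto_mechanism_t *mech, crypto_data_t *data,
    crypto_data_t *digest, crypto_call_req_t *cr);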

CMCDragonkai commented 8 years ago

I just wanted clarification on this: when ZFS is applied on top of LUKS, aren't things encrypted multiple times if there are multiple block devices in use? For example, in a mirrored ZFS pool with a mirrored ZIL and mirrored L2ARC, each partition/block device would be encrypted separately even when it holds the same data. If this is true, it sounds like a significant disadvantage of putting ZFS on top of LUKS.

kernelOfTruth commented 8 years ago

It would be the same with other filesystems, wouldn't it?

CMCDragonkai commented 8 years ago

It would be if it were (insert fs here) on top of LUKS, but LUKS on top of LVM might be able to avoid the duplicate encryption.

kernelOfTruth commented 8 years ago

Oh, I wasn't aware that LVM supported creating volumes that span several physically different underlying devices.

In that case, definitely.

I was thinking about physically different devices (and that LVM applies only to those), and contemplating how that could even be possible ;)

To your (previous) question: yes, that's quite some overhead.

Well, the crypto support of ZFS isn't there yet, and there are several ways to speed up LUKS (pcrypt, AVX, AVX2, SSSE3), so that's the only way for now, if I'm not missing something ...

tcaputi commented 8 years ago

I have been looking into Solaris 11's implementation of ZFS encryption to try to match its features. However, I found some issues that make me think we might want our own business logic for key management. Below are a few examples.

For the uninitiated: wrapping keys are cryptographic keys used to encrypt the actual data encryption keys on disk. The wrapping key can be given as raw bytes or derived from a passphrase via a PBKDF2 (password-based key derivation) function. That key can then be used to access the actual keys from the disk so data can be read and written. Having a wrapping key provides the ability to change the key used to access the disk without re-encrypting the entire dataset. Below, where referenced, the 'key' is the wrapping key for the dataset. A dataset can have multiple encryption keys, stored in an on-disk structure called a keychain.
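
To make the derivation step concrete, here is a minimal userspace sketch in C using OpenSSL's PBKDF2 (build with cc pbkdf2_demo.c -lcrypto). The passphrase, salt, and iteration count are illustrative placeholders, not the on-disk values ZFS uses; per the GRUB sources discussed later in this thread, Solaris reportedly used SHA1 with 1000 iterations:

#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>

int
main(void)
{
    const char *passphrase = "correct horse battery staple";
    /* Illustrative only; the real salt is random and stored per dataset. */
    const unsigned char salt[8] = { 0xde, 0xad, 0xbe, 0xef, 1, 2, 3, 4 };
    unsigned char wrapping_key[32]; /* 256-bit wrapping key */

    if (PKCS5_PBKDF2_HMAC(passphrase, strlen(passphrase), salt, sizeof (salt),
        1000 /* iterations */, EVP_sha1(), sizeof (wrapping_key),
        wrapping_key) != 1)
        return (1);

    /* wrapping_key would now unwrap the data encryption keys on disk. */
    for (size_t i = 0; i < sizeof (wrapping_key); i++)
        printf("%02x", wrapping_key[i]);
    printf("\n");
    return (0);
}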

inconsistent key dependency rules

root@tom-solaris:~# zfs create -o encryption=on rpool/test # create a dataset with encryption on
Enter passphrase for 'rpool/test': 
Enter again: 
root@tom-solaris:~# zfs create rpool/test/data # create child dataset. no prompt for passphrase, inherited from parent
root@tom-solaris:~# zfs key -u rpool/test # unload key from parent
root@tom-solaris:~# zfs get keystatus rpool/test/data # child key still available
NAME             PROPERTY   VALUE      SOURCE
rpool/test/data  keystatus  available  -
root@tom-solaris:~# 
root@tom-solaris:~# zfs key -u rpool/test/data # unload child key, both keys are now unloaded
root@tom-solaris:~# zfs key -l rpool/test/data # attempt to reload child key; fails because parent isn't loaded
cannot load key for 'rpool/test/data': zfs key -l rpool/test required.
root@tom-solaris:~# 

all encryption types take 32-byte raw keys (even the 128-bit and 192-bit variants)

root@tom-solaris:~# zfs create -o encryption=aes-128-ccm -o keysource=raw,prompt rpool/test
aaaaaaaaaaaaaaaa
cannot create 'rpool/test': key incorrect
root@tom-solaris:~# zfs create -o encryption=aes-128-ccm -o keysource=raw,prompt rpool/test
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
root@tom-solaris:~# 

key inherited despite different encryption key lengths

root@tom-solaris:~# zfs create -o encryption=aes-128-ccm rpool/test # create dataset with what should be a 128 bit key
Enter passphrase for 'rpool/test': 
Enter again: 
root@tom-solaris:~# zfs get salt rpool/test # the salt is a parameter of the PBKDF2 function that derives the wrapping key from a passphrase
NAME        PROPERTY  VALUE  SOURCE
rpool/test  salt      14.4E  local
root@tom-solaris:~# zfs create -o encryption=aes-256-ccm rpool/test/data # create a child dataset with 256 bit key
root@tom-solaris:~# zfs get salt rpool/test/data # the dataset somehow inherits the same wrapping key (which should be 128 bits too short); no salt for the new dataset
NAME             PROPERTY  VALUE  SOURCE
rpool/test/data  salt      0      -
root@tom-solaris:~# 

This goes against their docs:

"When a file system's keysource property values identifies passphrase, then the wrapping key is derived from the passphrase using PKCS#5 PBKD2 and a per file system randomly generated salt. This means that the same passphrase generates a different wrapping key if used on descendent file systems."

If I were to come up with my own set of rules for key inheritance, I would do this:

Do these rules make sense, or should there be different rules instead?

7Z0t99 commented 8 years ago

Of course I don't know what Solaris really does, but what you describe would make sense to me if the wrapping key were always 256 bits (= 32 bytes) long, independent of the encryption key. A new child dataset could then inherit the wrapping key while a new encryption key is generated as specified by -o encryption, with the wrapping key never being weaker/shorter than the encryption key.

tcaputi commented 8 years ago

That makes sense for the second example, and from looking at the GRUB sources it seems like it might be true. Perhaps, then, the first and third rules I proposed should be modified so that only specifying a local keysource on a child will cause it to become an "encryption root". If the other rules still make sense (and nobody has any other thoughts), I will use them.

7Z0t99 commented 8 years ago

Assuming you modify rules 1 to 3 as you said, that's exactly what Solaris is doing, right? Then rule 4 fixes what you called the inconsistent key dependency rules. What do @ryao, @behlendorf, or @FransUrbo think about that?

Anyway, I wasn't aware of how much work you have already done at https://github.com/tcaputi/zfs. Thanks! I don't know how much feedback you have gotten, but I see that you have already ported the Illumos crypto code. You should probably file a pull request for that once you feel comfortable (at least one as an RFC) ;-)

lundman commented 8 years ago

The ZIO part of crypto is pretty straightforward and not so hard to do if you want basic DMU-layer encryption, which I think ahrens discussed at one point. The tricky parts of the Solaris design are the keychain, as mentioned, and the re-keying ability, inheritance, etc.

Personally, since we will not be pool version=30 compatible anyway, I would suggest we steer away from Solaris's crypto, and just do an OpenZFS version.

At the moment, people are mostly surviving on LUKS, geli, or CoreStorage to make encrypted devices for ZFS to use, which (I assume) mostly use static keys without re-keying support. Perhaps, as a first implementation, this would be acceptable?

Are CCM-AES and GCM-AES still the best options to use? Is splitting the checksum (SHA256) in half and using the upper bits for the digest still desirable? Although I do like the idea that you can scrub a pool without needing the keys.

tcaputi commented 8 years ago

@7Z0t99 I think you are correct that the new rules would be the same as Solaris. As for a PR, I was originally going to submit the whole thing as one when it was done. The one issue is that the crypto functions are not yet usable in userspace. It will take a bit of work to make that happen because ZoL's userspace implementations of some kernel structs are not quite compatible with the SPL's. I currently just have some function stubs so that libzpool will compile without complaining. When it's ready, do you know how I would submit it as an RFC?

@lundman I already have the keystore logic mostly implemented. At this point, once I settle on an established set of rules for inheritance and cloning, I should be able to implement them fairly quickly. The keystore already supports loading, unloading, rewrapping, and adding encryption keys to keychains. The keychains themselves can be associated with multiple datasets and handle all of the reference counting that comes with that. It is also integrated enough with the rest of ZFS that datasets cannot be mounted or opened while their keychain isn't loaded.

Are CCM-AES and GCM-AES still the best options to use?

As far as encryption algorithms go, I have ported AES as a cipher, and the CBC, CCM, CTR, ECB, and GCM modes, from Illumos. I feel comfortable saying these algorithms are correct and secure because I didn't change any of the encryption logic. I can port another cipher from Illumos if we decide that's best, but I don't feel comfortable writing the algorithms themselves, since I am not an expert in that field. As far as modes go, I think CCM and GCM are good to start with, and we can add other algorithms later if we want. They are also the only authenticated encryption modes that Illumos offers.
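
For readers unfamiliar with authenticated modes, here is a minimal userspace AES-256-GCM sketch using OpenSSL's EVP API; this illustrates the mode itself, not the ported Illumos code, and key/IV handling is simplified. The 16-byte tag it produces is the MAC discussed below:

#include <openssl/evp.h>

/*
 * Encrypt ptlen bytes with AES-256-GCM; writes ptlen bytes of ciphertext
 * and a 16-byte authentication tag (the MAC). Returns 1 on success.
 */
int
encrypt_gcm(const unsigned char key[32], const unsigned char iv[12],
    const unsigned char *pt, int ptlen, unsigned char *ct,
    unsigned char tag[16])
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len, ok = 0;

    if (ctx == NULL)
        return (0);
    if (EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) == 1 &&
        EVP_EncryptUpdate(ctx, ct, &len, pt, ptlen) == 1 &&
        EVP_EncryptFinal_ex(ctx, ct + len, &len) == 1 &&
        EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag) == 1)
        ok = 1; /* ct holds the ciphertext, tag holds the MAC */
    EVP_CIPHER_CTX_free(ctx);
    return (ok);
}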

Is splitting the checksum (SHA256) in half and using the upper bits for the digest still desirable?

I personally am OK with doing this because the MAC and checksum are essentially designed to do the same thing: verify that the data is written and read correctly. To me, a bigger pain point is the IV that Solaris stores in DVA[2] of the blkptr_t, which reduces the amount of replication the user can have. We could eat into the padding of blkptr_t, since after all it's there for "future use", but I would want to make sure that it's OK to use half of it before proceeding. I have not gotten that far in the code yet, but I'm getting there pretty quickly. (It's next on my list after I'm happy with the keystore.)
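
For context, here is the block pointer layout in question, paraphrased from the illumos sys/spa.h header (comments mark the fields under discussion; verify field names against the real header):

/* Paraphrased from illumos sys/spa.h. */
typedef struct blkptr {
    dva_t       blk_dva[3];     /* data virtual addresses; Solaris crypto
                                   repurposes dva[2] to hold the IV */
    uint64_t    blk_prop;       /* size, compression, type, checksum */
    uint64_t    blk_pad[2];     /* reserved "for future use" -- the
                                   padding discussed above */
    uint64_t    blk_phys_birth; /* txg when block was allocated */
    uint64_t    blk_birth;      /* txg of logical birth */
    uint64_t    blk_fill;       /* fill count */
    zio_cksum_t blk_cksum;      /* 256-bit checksum (could hold a truncated
                                   checksum plus the MAC, as discussed) */
} blkptr_t;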

lundman commented 8 years ago

@tcaputi I already have the keystore logic mostly implemented.

I looked over some of the work, and it's clearly different from the Solaris 11 sources, so that is all great news. Not that you need my approval :)

tcaputi commented 8 years ago

@lundman @7Z0t99 I appreciate the encouragement. Thank you very much.

behlendorf commented 8 years ago

@tcaputi First off, thank you for tackling this project! The latest set of rules you've proposed for key management seems reasonable to me, so I'd say go with them for now.

and just do an OpenZFS version.

Definitely. You may want to draw some inspiration from what was done on Solaris, but there's no compelling reason to attempt to be compatible. It's far more desirable that this be done in a way that is workable for the other OpenZFS implementations, so I think it's good that you're building this on the Illumos crypto infrastructure.

To me a bigger pain point is the IV that Solaris stores in DVA[2] of the blkptr_t.

It may be worth using the reserved padding for this, but we'll want to talk that over with the rest of the OpenZFS developers. I can see why they decided to do this, since in practice very few blocks end up using that 3rd DVA. However, it's a shame to remove that optional redundancy, and it probably ends up complicating the code.

As for a PR, I was originally going to submit the whole thing as one when it was done.

When you have something you're happy with, definitely open a PR against the project. That'll make it easy to pull in additional reviewers from all the platforms and allow people to test it if they're so inclined.

tcaputi commented 8 years ago

@behlendorf Thanks for the input.

It may be worth using the reserved padding for this, but we'll want to talk that over with the rest of the OpenZFS developers.

How should we go about starting that conversation? I am hoping to have the keystore management finished by the end of the weekend, and after that the next step is to get the blocks encrypted. I can work on a few other things in the meantime (like getting libzpool to build without placeholders), but that decision would block a lot of the zio-layer work for me.

richardelling commented 8 years ago

FYI, Alex @ Nexenta explained how they were using the 3rd DVA for write performance enhancement.

My vote is that the DVA slots should contain DVAs; the hack from Sun was an unfortunate consequence of the left hand not knowing what the right was doing.

ryao commented 8 years ago

@CMCDragonkai Yes, it would be encrypted multiple times if you put LUKS under each vdev. Using LVM to avoid that should have the same reliability disadvantages as concatenation, while MD RAID would have performance issues. Until we actually have encryption, I am a fan of using ecryptfs on top of ZFS, which avoids the issues of an encrypting block device shim.

@tcaputi The more I learn about Solaris' ZFS encryption, the more I think shipping it that way was a mistake. I agree with you on not implementing v30 compatibility. Cannibalizing the DVAs was definitely bad, and we should avoid it if possible. It would be fine to consume padding in a development branch before the larger community makes a call on what should actually be done, especially since others could then see at least one way of doing things in action before making that call.

As for the rules, I think we need to clarify how inheriting descendants behave. Specifically, let's say that we have a dataset named tank/encrypted. If we do zfs create tank/encrypted/child, tank/encrypted/child presumably would also be encrypted. If we do zfs rename tank/unencrypted tank/encrypted/unencrypted, we do not want tank/encrypted/unencrypted to inherit encryption, because then only new writes would be encrypted when people would expect total encryption. I think encryption should only be configurable/inheritable at creation, like volblocksize on zvols.

Since we are not going to have any illusions of implementing v30 compatibility, I suggest that we disallow anything weaker than 256-bit AES to partially future-proof the encryption used.

Also, you might want to read the zfs_encrypt(1M) man page for some additional inspiration:

https://docs.oracle.com/cd/E26502_01/html/E29031/zfs-encrypt-1m.html#scrolltoc

One thing of interest there is how clones are handled in terms of how the source snapshots' keys are used. My initial instinct was that rekeying should always be done, but the key would need to be loaded into memory for the old data anyway, and we would lose the ability to efficiently deduplicate changes, so we might as well rekey during the clone operation only if requested via -K, which is what Solaris's man page specifies.

It is also probably worth noting that deduplication would likely still work even if two different datasets' data are stored with different keys, but not in a manner most people would consider to constitute working. Specifically, the checksums of the encrypted blocks would be compared, and if by some coincidence two different plaintext records encrypted with two different keys produced the same ciphertext, they would deduplicate, with the rate of deduplication being correspondingly low.

With that in mind, we could opt to use a different deduplication table for each key for efficiency purposes. We already have different deduplication tables out of necessity for different checksums (at least in illumos-gate), so making them differ based on the checksum and key would not be radically different from what we do now.

That said, I am happy to see someone taking this up. :)

ryao commented 8 years ago

A couple more thoughts that might be helpful.

When changing the wrapping key, we should keep some sort of log in the pool (not the ZIL) so that we can be certain the original keychain is explicitly zeroed or implicitly overwritten after a certain number of transaction group commits. That way, transaction rollback on import is still possible should something go wrong. When that number is reached, we could do a check to see if the key is in the spacemap. The number should be a tunable. We would need to handle something like 2 transaction groups or fewer as a special case, though, as I think that is about how many transaction groups must occur before the code will allow reallocation.

When unloading a key, we should also zero the memory before freeing it so that it is not leaked to userspace via a later memory allocation. We should also write a shutdown script to unload as many keys as possible during the shutdown/reboot process so that they are not accidentally leaked to userspace via memory allocations either.
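
A minimal userspace illustration of the zero-before-free idea; in kernel code the equivalent would be a bzero()/memset() before kmem_free(). explicit_bzero() here is the libc helper (glibc 2.25+, BSDs) that the compiler is not allowed to optimize away:

#include <stdlib.h>
#include <string.h>

/* Free a key buffer without leaking its contents to the next allocation. */
static void
key_free(unsigned char *key, size_t len)
{
    /* Unlike a plain memset(), explicit_bzero() cannot be elided. */
    explicit_bzero(key, len);
    free(key);
}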

tcaputi commented 8 years ago

@ryao

As for the rules, I think we need to clarify how inheriting descendants behave. Specifically, let's say that we have a dataset named tank/encrypted. If we do zfs create tank/encrypted/child, tank/encrypted/child presumably would also be encrypted. If we do zfs rename tank/unencrypted tank/encrypted/unencrypted, we do not want tank/encrypted/unencrypted to inherit encryption, because then only new writes would be encrypted when people would expect total encryption. I think encryption should only be configurable/inheritable at creation, like volblocksize on zvols.

I agree that rename should not add encryption. Currently I have create implemented so that you can't create an unencrypted child of an encrypted parent. I think the best solution here is to simply do the same for renaming.

Since we are not going to have any illusions of implementing compatibility, let's disallow anything weaker than 256-bit AES to partially future-proof the encryption used.

That's fine by me; all I'd have to do is delete the 192- and 128-bit crypt types from my table of supported encryption algorithms.

Also, you might want to read the zfs_encrypt(1M) man page for some additional inspiration:

I read the man page and currently have (still very much untested) support for all of those commands except clone -K, which should be easy to add since I already have the logic for zfs key -K.

Regarding dedup, I found this blog post explaining what they did. I was going to implement it like that, but dedup support is still a ways down the road for me: https://blogs.oracle.com/darren/entry/compress_encrypt_checksum_deduplicate_with

ryao commented 8 years ago

@tcaputi That blog post appears to be exactly what I just described. :)

Also, you should disallow data from encrypted datasets in the L2ARC, which I believe was done in Solaris. You should also disable logbias=latency on encrypted datasets so that only logbias=throughput is used.

I could see someone trying to modify the ZIL to store the data block encrypted after the ZIL record so that we could implement logbias=latency on encrypted datasets. In that case, it would not make sense to compress the data block with the ZIL record, although we could compress it before encryption.

However, my gut feeling is that trying to encrypt a variable-sized block might weaken the encryption, based on my vague recollection of hearing that vulnerabilities can arise when you fail to pad things out to a power of 2, so it might be better left unencrypted.

You will also want to handle embedded_data and the ZoL xattr=sa extension. I suspect you could handle them by disabling embedded_data and disallowing xattr=sa on encrypted datasets. Whether they can be handled by encryption depends on whether encrypting unusual sizes weakens the encryption, much like the ZIL case. That is a question I would prefer someone better versed in encryption than me to answer.

tcaputi commented 8 years ago

@ryao I was actually planning on adding encryption to the L2ARC if possible. I believe it is possible and not incredibly difficult (given the quick look I took into it). For right now I am concentrating on getting regular data encryption done before the ZIL and L2ARC.

This blog post (https://blogs.oracle.com/darren/entry/zfs_encryption_what_is_on) actually talks a bit about how both ZIL encryption and SA attributes are handled in Solaris. They mention that encryption of SA attributes is necessary to encrypt file metadata:

ZPL metadata is normally contained in the bonusbuf area of a dnode_phys_t, but the dnode is in the clear on disk. For encrypted datasets the bonusbuf is always empty, and the content that would normally have been there is pushed out to an encrypted "spill" block, called a System Attribute block. Normally, for ZPL filesystems, spill blocks are only used for files with large ACLs.

To your other point regarding ZIL encryption:

That said, my gut feeling is that trying to encrypt a variable-sized block might weaken the encryption, based on my vague recollection of hearing that vulnerabilities can arise when you fail to pad things out to a power of 2, so it might be better left unencrypted.

I don't believe this will be a problem since, if I read the ZFS docs correctly, ZFS data blocks aren't even guaranteed to be powers of 2 in size; I believe any multiple of 512 up to 128K is acceptable. I can confirm this when I get there.

ryao commented 8 years ago

@tcaputi The encrypted data is not stored in the ARC, so you would need to encrypt on write-out and decrypt on read-in. That would need some testing to determine whether it is worthwhile.

As for the write block size, writes occur in multiples of 2^ashift. They are padded when smaller than that.
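
In other words (a sketch of the padding rule; illumos code would express this with its P2ROUNDUP macro):

#include <stdint.h>

/* Pad a logical write size up to the device sector size implied by ashift. */
static uint64_t
pad_to_ashift(uint64_t size, int ashift)
{
    uint64_t sector = 1ULL << ashift; /* e.g. ashift=12 -> 4096 bytes */
    return ((size + sector - 1) & ~(sector - 1));
}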

That said, you can disregard my earlier concerns. The attacks involving padding (or lack of padding) appear to involve network communication, so we should be okay:

https://en.wikipedia.org/wiki/Padding_oracle_attack

sempervictus commented 8 years ago

@ryao: It doesn't necessarily involve network communication; I think this can be achieved by stepping through the crypto routine in a debugger and essentially brute-forcing padding values until we are able to step through to the next operation of the decryption/validation procedure, presuming this implementation of the cipher uses the same semantics as the one described in the article.

@tcaputi: Hats off to you for this work; it's been long needed and will help ZFS adoption the world over. I'll start adding this to our build stacks for testbeds. If you have any specific stress tests you want run, please ask. As far as removing weaker ciphers goes, there are still plenty of chips without crypto accelerators out there, so it might be worthwhile to leave them in with a corresponding note in the man pages.

tcaputi commented 8 years ago

@sempervictus Thanks. When it's ready, the biggest test that will need to be run is a comparison of encrypted performance using each of the supported ciphers vs. a dataset without encryption. If we could do this for small random and sequential writes, I'll be happy. I should have access to some hardware myself that I will use when I start getting actual encryption up and running. I'm hoping to have basic data encryption done by the end of next week at the latest.

sempervictus commented 8 years ago

We have a bunch of disk layouts available at the datacenter, and I can hold a system in testing for a bit, since our production stack seems happy with what it's got today. As far as the benchmarks go, we can whip up a simple set of FIO tests and ZFS creation/teardown functions in bash, since we'll have quite a few permutations to go through. We also have facilities for logging the application and block tiers to ELK (as well as some basic charting), so we should be able to produce some pretty pictures when you're ready to start.

By the way, the DKMS build failed on the first-tier trusty VM, though I'm pretty sure it's because I just merged your branch into a fork of our latest test set:

No rule to make target `/var/lib/dkms/zfs/0.6.5/build/module/icp/api/kcf_cipher.o', needed by `/var/lib/dkms/zfs/0.6.5/build/module/icp/icp.o'.  Stop.

Thanks again; I'm sure we'll all be keeping a watchful eye on this thread.

tcaputi commented 8 years ago

@sempervictus I'm not sure why it didn't build, but it isn't anywhere near ready yet anyway. Give me a week or two and I (hopefully) will have something ready. I will post my progress here (as long as people are interested). As a side note, it looks like that DKMS error could be caused by a missing Makefile; I might have missed one that the gitignore excluded. I'll look into that.

sempervictus commented 8 years ago

Definitely interested, so please keep posting and pushing commits. As a side note, does anyone on this thread have experience writing PAM modules? We'll likely need something akin to the one ecryptfs has in order to allow for encrypted home directories and to farm out the key-acquisition process.

7Z0t99 commented 8 years ago

@tcaputi At https://github.com/zfsonlinux/zfs/pulls, there is a button "New pull request".

@ryao @sempervictus I'm not convinced that encryption should only be configurable at creation of a dataset. Turning on encryption later would be especially cool and would save users a lot of time, though turning it on should of course encrypt existing data. I would guess most of the infrastructure needed to do that already has to be in place to implement rekeying.

I'm not sure how dedup should be implemented w.r.t. encryption: even if the plaintext of two blocks matches, their IVs (and hence their ciphertexts and MACs) will differ, and you may end up storing both. That's one reason why I'm skeptical about stuffing the MAC and a truncated checksum into the checksum field. Solaris seems to do something different; the blog post says: "So I found a method of generating the IV such that within the 'clone family' we will get dedup hits for the same plaintext." That means that for one plaintext, the entropy in the IV needs to be small; otherwise they wouldn't get the hits. Does that make the implementation vulnerable to attacks?

Regarding padding, IIRC the reason we should be OK is that the CCM and GCM modes are strong against those attacks, while CBC mode is vulnerable.

7Z0t99 commented 8 years ago

As we don't intend to be compatible with Solaris, I would change the parameters for PBKDF2 from SHA1 & 1000 iterations (according to the GRUB sources) to SHA512 and 5000, which are the parameters current Linux distros use to store passwords.

tcaputi commented 8 years ago

@7Z0t99

At https://github.com/zfsonlinux/zfs/pulls, there is a button "New pull request"

OK, great. I thought there might be some kind of procedure it has to go through first.

I'm not convinced that encryption should only be configurable at creation of a dataset. Turning on encryption later would be especially cool and would save users a lot of time, though turning it on should of course encrypt existing data. I would guess most of the infrastructure needed to do that already has to be in place to implement rekeying.

My one thought here is that if we are already going to be rewriting everything in the dataset, this use case could be accomplished with a zfs send | zfs receive where the receiving dataset has encryption turned on. Unless my research is wrong, the same limitation exists with compression in ZFS. That said, if we do decide it's a feature that should be in the kernel itself, there should be an on-disk iterator that keeps track of where we are in the encrypted rewrite so that in case of power failure we can continue encrypting the old data. Perhaps a feature like that (one that works for compression, dedup, encryption, etc.) could be done in a later PR as a separate feature.

I'm not sure how dedup should be implemented w.r.t. encryption: even if the plaintext of two blocks matches, their IVs (and hence their ciphertexts and MACs) will differ, and you may end up storing both. That's one reason why I'm skeptical about stuffing the MAC and a truncated checksum into the checksum field. Solaris seems to do something different; the blog post says: "So I found a method of generating the IV such that within the 'clone family' we will get dedup hits for the same plaintext."

On this, Oracle's blog post has this to say:

If dedup=on for the dataset, the per-block IVs are generated differently. They are generated by taking an HMAC-SHA256 of the plaintext and using the leftmost 96 bits of that as the IV. The key used for the HMAC-SHA256 is different to the one used by AES for the data encryption, but is stored (wrapped) in the same keychain entry; just like the data encryption key, a new one is generated when doing a 'zfs key -K'. Obviously we couldn't calculate this IV when doing a read, so it has to be stored.

Doing that would take a bit of work on the keychain, but it should be very doable if we decide to go the same route.
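
Here is a userspace sketch of that scheme as the blog describes it, using OpenSSL's HMAC; the function and variable names are hypothetical, and as the quote notes, the IV key must be distinct from the data encryption key:

#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/*
 * Derive a deterministic 96-bit IV from the plaintext so that identical
 * plaintext records (under the same IV key) produce identical ciphertext
 * and can dedup. The IV key must NOT be the data encryption key.
 */
static void
dedup_iv(const unsigned char *iv_key, int iv_key_len,
    const unsigned char *plaintext, size_t len, unsigned char iv[12])
{
    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len;

    HMAC(EVP_sha256(), iv_key, iv_key_len, plaintext, len, mac, &mac_len);
    memcpy(iv, mac, 12); /* leftmost 96 bits become the IV */
}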

That means that for one plaintext, the entropy in the IV needs to be small; otherwise they wouldn't get the hits. Does that make the implementation vulnerable to attacks?

If my research is correct, the only requirement for the IV is that it be unique; it does not seem necessary for it to have high entropy.

As we don't intend to be compatible with Solaris, I would change the parameters for PBKDF2 from SHA1 & 1000 iterations (according to the GRUB sources) to SHA512 and 5000, which are the parameters current Linux distros use to store passwords.

A coworker of mine thought we could also pick a random number of iterations (between a nominal min and max value) and store it on disk to be used as a parameter, like the salt. His thought was that this could impede someone from making a rainbow table of common passwords with a set number of iterations. I'm not sure how much security this adds, however, since IIRC the salt is the main tool against rainbow tables.

ryao commented 8 years ago

I'm not convinced that encryption should only be configurable at creation of a dataset. Turning on encryption later would be especially cool and would save users a lot of time, though turning it on should of course encrypt existing data. I would guess most of the infrastructure needed to do that already has to be in place to implement rekeying.

@7Z0t99 That requires block pointer rewrite if there are snapshots. It is not feasible to do as part of an initial encryption implementation.

Also, the "clone family" is just the clones using the same keys as their parent snapshots with no rekeying. Additionally, the snapshots implicitly use the same keys as their parents (at the time of snapshot) and cannot rekey.

7Z0t99 commented 8 years ago

@tcaputi You are right, the IVs need to be unique. Using an HMAC when dedup=on is really clever.

I don't think the randomization of the number of iterations is that helpful. The two things that make the scheme secure are a high number of iterations and the salt.

@ryao Ugh, so turning on encryption later and full rekeying are infeasible for the moment.

ilovezfs commented 8 years ago

If you cannot rekey, it sounds like pool destruction becomes the only option. Wouldn't people be best served by sticking to disk-level encryption unless they never plan to change keys, or they like regularly rebuilding pools from scratch? I guess at minimum an offline block-pointer rewrite would be a prerequisite to making this approach to crypto viable.

If a key is compromised, would there be any way to securely destroy the subset of datasets using that key, or is that also not possible?

tcaputi commented 8 years ago

@ilovezfs

If you cannot rekey, it sounds like pool destruction becomes the only option. Wouldn't people be best served by sticking to disk-level encryption unless they never plan to change keys, or they like regularly rebuilding pools from scratch?

The wrapping keys are always changeable. The only keys that are permanent are the actual encryption keys, which only ever exist unwrapped in the kernel's memory. Finding them would require access to the machine's raw RAM while the keys were loaded.

If a key is compromised, would there be any way to securely destroy the subset of datasets using that key, or is that also not possible?

I can work on adding this after all the encryption stuff is done, if there isn't an existing way to do this now (which I don't believe there is).

numinit commented 8 years ago

@tcaputi thank you for your work on this!

/subscribe

grahamperrin commented 8 years ago

@ryao wrote:

@tcaputi The encrypted data is not stored in the ARC, so you would need to encrypt on write-out and decrypt on read-in. That would need some testing to determine whether it is worthwhile. …

Whilst I can't offer to test on Linux (sorry), I reckon that it will be a worthwhile default – at least where there's hardware acceleration with AES-NI;

@pyavdr wrote:

… processors with AES-NI should be supported to gain optimal performance.

– and from the Design Approach section of Apple's Best Practices for Deploying FileVault (2012-08-17):

… benefit from optimizations in architectural design, cryptographic algorithms, and use of hardware acceleration when available. The algorithm used for block encryption … has been optimized for 512-byte blocks. The conversion from plaintext to ciphertext and back is performed on the fly with negligible impact to the user experience since it relinquishes processing cycles to user activity. Software optimizations are further accelerated on systems with Intel Core i5 and i7 processors with hardware acceleration using the built-in AES-NI instruction set. …

More obliquely, that reckoning (of worthwhile by default) is based on long-term personal experience with USB flash drives as removable L2ARC; e.g. Aaron Toponce: ZFS Administration, Appendix B - Using USB Drives (2013-05-09), but without Linux.


From https://github.com/zfsonlinux/zfs/issues/494#issuecomment-180097439 above and other recent comments, it seems clear that full cross-platform compatibility (including ZFS in Solaris) is no longer an ideal. Clear, understandable.

Multi-platform compatibility remains highly desirable. I'll promote that thought and (without sharing details here) in the PC-BSD area I can think of at least one issue that may be a good vehicle for promotion.


Now is as good a time as any to throw in two additional resources –

– the former is new to me. Speeding through that, and the recent comments here in GitHub, I sense that things here are progressing in all the right directions. The discussion of wrapping keys is particularly welcome.

Thanks @tcaputi and all.

tcaputi commented 8 years ago

Quick update: while implementing the rules we decided on above, I found that my implementation was leading to the same bugs and inconsistencies that I outlined earlier. I realized that the existing keystore was not going to be able to reliably support the new rules, so I rewrote it. The new design is much better for our purposes and should be able to support the same operations as Solaris with more predictability (excluding any bugs that might exist).

The last two things that need to be done for the keystore to be functionally complete are kernel-side parameter checks and support for encrypted cloning. The keystore mechanisms to support cloning exist; they just need to be hooked into the existing ZFS code. I will post another update when these are implemented, but I figured I should post what I've been doing since it's been a few days. After that, I will work on the data encryption.

ouaibe commented 8 years ago

@tcaputi Regarding other encryption algorithms, Chacha20-Poly1305 could be a viable alternative for an authenticated cipher; it's present in SSH and is being pushed into TLS. Some sample code can be obtained from the OpenSSH implementation: http://bxr.su/OpenBSD/usr.bin/ssh/cipher-chachapoly.c Otherwise, there's NORX that could be considered: https://github.com/norx/norx

Anyhow, thanks for taking the time to look at this.

mcr-ksh commented 8 years ago

Since we are talking about crypto on storage, I strongly favor a cipher and checksum that are hardware-accelerated via AES-NI.


ouaibe commented 8 years ago

@mcr-ksh Yes, you're right, that's a valid concern; AES-NI-backed ciphers should definitely be prioritized.

tcaputi commented 8 years ago

@ouaibe Once I am finished with the keystore and the zio layer, I can implement any other algorithms we might be interested in, and we can do some performance benchmarking then. It will probably be a while before I am at the stage where working on that makes sense.

tcaputi commented 8 years ago

@sempervictus I just had a coworker install my branch of ZFS using ./autogen.sh; ./configure; make; make install and everything built. How were you doing the DKMS build, so I can look into making sure that works?

sempervictus commented 8 years ago

Pardon my lag; I'm on an engagement on the other side of the country. I believe I'm using the git-buildpackage wrapper. I'll refresh and rebuild this weekend; we can always address minutiae like this later, as core functionality is far more important to implement and push for review.

Regarding safe destruction of data whose keys have been compromised: that's a hard one, given that we have no guarantee our substrate hasn't been snapshotted, respects SCSI commands properly, etc. That aside, I'd suggest we force delete ops to write garbage to freed blocks immediately after free on datasets with crypto, before adding the sector to the free space map for reuse.

Rekey and late-stage crypto both really call for new writes, so we can't encrypt existing data, just rewrite it enciphered one way or another. Last I checked, block pointer rewrite is less probable than my riding a pegasus, so the send/recv/free+overwrite approach with placeholders sounds rational. Enabling crypto would basically create an initially hidden subset (DS or zvol), perform the new writes, take the original DS offline and rename the new one to it, send a final delta, then "clean up" the original.

Regarding ciphers, it looks like we will have large degrees of extensibility here, which is great, as orgs with certain requirements for non-portability could add custom ciphers pretty quickly. Well played.

As far as performance goes, I'd say the AES-NI stuff is a must, but it can't be our only approach, since ZFS is supposed to work on all sorts of crazy chips. We might even want to look back at the work from this summer on AVX and SSE acceleration (raidz calcs; I would link, but I'm writing on a phone) for anything we can't do with the crypto coprocessor.

Lastly, for the SLOG and L2ARC, dm-crypt should be just fine in an initial implementation, as we only need fast blocks for those components, not individually keyed, dataset-bound block sets (dataset/zvol). If your enemy can forcibly read blocks out of L2ARC devices, they're root, and will extract your keys from RAM anyway. Register-only cleartext keys à la TRESOR come to mind, but that may be very painful for performance. /rant


tcaputi commented 8 years ago

@sempervictus That all sounds reasonable to me. My one remaining concern for encrypted on-disk data is navigating all the different kinds of zio reads and writes and how they will relate to encryption (e.g. gang blocks, embedded blocks, etc.). I also need to learn the SA layer so that I can force all ZPL attributes into a spill block for encrypted datasets. I will (hopefully) figure that all out over the next week or so. Right now I am fixing up the PR I submitted for comments, with the Illumos crypto port and keystore implemented. It turns out I did not have debugging enabled, and I did not notice the make checkstyle command. With any luck I should have that all ready for people to look at by the end of the night.

tcaputi commented 8 years ago

I know there are a lot of changes in the PR, but most of them are the Illumos crypto port. I would greatly appreciate any comments on the files I changed and added in module/zfs/. Those changes are fairly independent of the ICP, except in the few places where they call the ICP's API. In the meantime, I plan on creating a branch of my fork and starting work on the data encryption aspect.

tcaputi commented 8 years ago

@all: I've been looking into the SA layer to figure out how to force objects to use spill blocks instead of bonus buffers, so that the data that would normally go into the bonus buffers (such as ZPL attributes) gets encrypted. I noticed a (seemingly too convenient) flag in sa_os_t named sa_force_spill. It seems like exactly what I'm looking for, and I see that various code in the SA layer checks it. The only problem is that I don't see anywhere the flag is actually set. Can anyone confirm that this flag is what I'm looking for and that it doesn't need any extra work? @behlendorf A quick git blame shows that the flag was added as part of one of your commits, labeled "Update core ZFS code from build 121 to build 141".