hockeymikey closed this issue 1 month ago.
Also I really miss being able to copy the password and directly save it in my password database.
I agree with everything @hockeymikey said, but I'd like to add one more thing: there's a lot of benefit in having a "no-encryption" backup method. It's useful for applications like EteSync, Tresorit, etc. that already take care of the encryption themselves. For example, EteSync has automatic version control with de-duplication. This will not work if the data is encrypted before EteSync gets to de-duplicate and then encrypt it.
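To illustrate why encrypting before upload defeats the provider's de-duplication, here is a minimal Python sketch. `toy_encrypt` is a hypothetical stand-in (SHA-256 in counter mode with a random nonce): it is not a secure cipher and not what Seedvault uses, but like any real cipher with a fresh random nonce per encryption, it turns identical inputs into different outputs. `unique_chunks` models a content-addressed store that keeps one copy per SHA-256 digest.

```python
import hashlib
import os


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Hypothetical toy cipher for illustration only; NOT secure.

    XORs the data with a SHA-256 counter-mode keystream derived from the key
    and a fresh random nonce, so encrypting the same data twice gives
    different ciphertexts, as with a real randomized cipher.
    """
    nonce = os.urandom(16)
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def unique_chunks(chunks: list) -> int:
    """Count how many blobs a content-addressed (de-duplicating) store keeps."""
    return len({hashlib.sha256(c).hexdigest() for c in chunks})


key = os.urandom(32)
backup = b"BEGIN:VCARD ... END:VCARD\n" * 100  # the same data backed up twice

plain_count = unique_chunks([backup, backup])
cipher_count = unique_chunks([toy_encrypt(key, backup), toy_encrypt(key, backup)])
print(plain_count)   # 1: the provider stores the repeated data once
print(cipher_count)  # 2: fresh nonces make the two ciphertexts differ
```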
A workaround is to use encryption provided by SAF (DocumentsProvider). For example, you can store a backup in a container encrypted with EDS Lite.
Hello everyone, I'm a pentester from Radically Open Security and have done a short security evaluation of this project as part of the NLnet funding grant.
From a security perspective, switching directly from the randomly chosen 128-bit master key that is transported in the 12-word BIP39 mnemonic to a user-chosen passphrase (+ key derivation function) would introduce a number of risks, primarily in terms of resistance to offline brute-force attacks. There are ways to counteract this at least partially; see for example the LUKS2 design and its use of Argon2. However, implementing it would likely require a redesign of the Seedvault file format and part of the cryptographic construction. It is arguably difficult to get users to consistently pick strong passphrases, particularly on mobile, yet strong passphrases are absolutely required for the practical strength of the resulting on-disk encryption.
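A rough Python sketch of the gap described here: a 12-word BIP39 mnemonic carries 128 bits of entropy (12 × 11 = 132 bits, of which 4 are checksum), whereas a typical user-chosen passphrase carries far less, so a slow, memory-hard KDF has to make up part of the difference. Python's standard library has no Argon2, so `hashlib.scrypt` stands in for it here; the cost parameters are illustrative assumptions, not a recommendation and not Seedvault's design. The passphrase estimate assumes characters chosen uniformly at random, which overestimates real human choices.

```python
import hashlib
import math
import os


def mnemonic_entropy_bits(words: int) -> int:
    """BIP39: each word encodes 11 bits; 1 in every 33 bits is checksum."""
    return words * 11 * 32 // 33


def random_passphrase_bits(length: int, alphabet_size: int) -> float:
    """Upper bound: assumes every character is picked uniformly at random."""
    return length * math.log2(alphabet_size)


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch a passphrase into a 256-bit key with scrypt (stand-in for Argon2).

    n=2**14, r=8 needs about 16 MiB of memory per guess; illustrative values only.
    """
    return hashlib.scrypt(passphrase.encode("utf-8"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)


print(mnemonic_entropy_bits(12))                # 128
print(round(random_passphrase_bits(8, 26), 1))  # 37.6 (8 random lowercase letters)

salt = os.urandom(16)  # stored next to the ciphertext; it need not be secret
key = derive_key("correct horse battery staple", salt)
print(len(key))  # 32
```

Each additional bit of passphrase entropy doubles an attacker's work, while the KDF only multiplies the cost of every guess by a constant factor, which is why the KDF can help only partially.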
As for an optional mode with a no-encryption method: I can understand the general appeal of this feature. Encrypted files get in the way of deduplication and compression technologies. However, I'm very hesitant to recommend going down this route. As it is now, Seedvault is a self-contained system that can be reasoned about and tested, but combining some core functionality with arbitrary external conditions and other tools' behavior is risky if their confidentiality/integrity/availability situation is worse, which I assume it is. For example, the Nextcloud file handling logic already leads to integrity issues with individual files, see https://github.com/seedvault-app/seedvault/issues/184 . Versioning and deduplication issues like this, for example when they are silent to the user, may result in backup corruption that is much harder for Seedvault to detect, because part of the responsibility has been handed to Nextcloud, or that cannot be checked without significant overhead. From discussions with @grote and public documentation on this subject, I think that Seedvault is already in a difficult position when it comes to, e.g., reliably determining the integrity of backups stored on some remote medium through SAF. This is on top of other concerns about trusting the transparent encryption of third-party systems, which may, e.g., change the threat model if there are malicious apps or other users that have access.
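The compression half of that trade-off is easy to demonstrate: well-encrypted data is indistinguishable from random bytes, and random bytes do not compress. In this minimal Python sketch `os.urandom` stands in for ciphertext (an assumption for illustration). It shows why a backup tool that encrypts has to compress before encrypting rather than leave compression to the storage backend.

```python
import os
import zlib

plain = b"com.example.app settings=default\n" * 128  # highly repetitive input
cipher_like = os.urandom(len(plain))                 # stand-in for encrypted output

plain_ratio = len(zlib.compress(plain)) / len(plain)
cipher_ratio = len(zlib.compress(cipher_like)) / len(cipher_like)
print(f"plaintext compresses to {plain_ratio:.1%} of its size")
print(f"ciphertext-like data compresses to {cipher_ratio:.1%}")  # about 100%, no gain
```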
Overall, implementing these two design changes may be more difficult than it appears.
Thanks @ros-cr for clearing this up!
For the integrity problem: since Android 12 backups consist of fewer and therefore larger files, how about signing the unencrypted files and saving the key alongside them on the destination storage, or just keeping a JSON file with hashes of the files? Of course this would only make sense if you can really trust your target storage, but I think that a lot of Calyx users have their own Nextcloud or NAS device which they (can?) trust.
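A minimal Python sketch of the "JSON file with hashes" idea, assuming a flat backup directory; the manifest name `manifest.json` and the helper names are hypothetical. A plain hash manifest stored on the same backend catches accidental corruption and missing files, but anyone who can rewrite the files can also rewrite the manifest, which is why the idea only helps if the storage itself is trusted.

```python
import hashlib
import json
import tempfile
from pathlib import Path

MANIFEST = "manifest.json"  # hypothetical file name


def build_manifest(backup_dir: Path) -> dict:
    """Map every backup file (except the manifest itself) to its SHA-256 digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(backup_dir.iterdir())
        if p.is_file() and p.name != MANIFEST
    }


def write_manifest(backup_dir: Path) -> None:
    manifest = build_manifest(backup_dir)
    (backup_dir / MANIFEST).write_text(json.dumps(manifest, indent=2))


def verify(backup_dir: Path) -> list:
    """Return the names of files that are missing, changed, or unexpected."""
    expected = json.loads((backup_dir / MANIFEST).read_text())
    actual = build_manifest(backup_dir)
    problems = {name for name, digest in expected.items() if actual.get(name) != digest}
    problems |= set(actual) - set(expected)
    return sorted(problems)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "apps.bin").write_bytes(b"app data")
    (root / "files.bin").write_bytes(b"file data")
    write_manifest(root)
    clean = verify(root)                       # []
    (root / "files.bin").write_bytes(b"corrupted")
    after_change = verify(root)                # ['files.bin']
    (root / "apps.bin").unlink()
    after_delete = verify(root)                # ['apps.bin', 'files.bin']
print(clean, after_change, after_delete)
```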
So I think this comes down to the question of whether you want to leave it to the user to decide whether a storage location can be trusted.
What @ros-cr said shouldn't count against asymmetric cryptography, though, I feel. I would love to deposit my ed25519 public key within Seedvault and encrypt my backups with that. :/
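Encrypting to a public key is normally done with a hybrid scheme: at backup time the app generates a throwaway key pair, runs Diffie-Hellman against the user's public key, and uses the shared secret to key a symmetric cipher, so only the public key ever needs to live on the phone. One caveat: Ed25519 is a signature key; tools such as age accept `ssh-ed25519` keys by converting them to their X25519 form. The Python sketch below is educational only (pure-Python X25519 following RFC 7748, not constant-time) and stops at deriving the symmetric key; the AEAD step (not shown) and the `INFO` label are assumptions, not anything Seedvault implements. Use a vetted library in practice.

```python
import hashlib
import hmac
import os

P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4 for Curve25519
BASE_U = (9).to_bytes(32, "little")


def x25519(scalar: bytes, u: bytes) -> bytes:
    """RFC 7748 Montgomery ladder. Educational only: not constant-time."""
    k = bytearray(scalar)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF (extract + expand) with the default all-zero salt."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


INFO = b"backup-key-v1"  # hypothetical context label

# The user's key pair: only the public half would be stored on the phone.
user_secret = os.urandom(32)
user_public = x25519(user_secret, BASE_U)

# Backup time: fresh ephemeral key pair; eph_public is saved with the backup.
eph_secret = os.urandom(32)
eph_public = x25519(eph_secret, BASE_U)
backup_key = hkdf_sha256(x25519(eph_secret, user_public), INFO)

# Restore time: the user's secret key plus the stored ephemeral public key.
restore_key = hkdf_sha256(x25519(user_secret, eph_public), INFO)
print(backup_key == restore_key)  # True
```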
If you have resources for "more encryption methods", please count this as user feedback in favor of moving toward a totally opaque repository.
I'm a new user of Seedvault and have not looked at how it works in any meaningful detail, but I'm very old in the ways of security and cryptography. What concerns me is not the unavailability of plaintext backups, but what look like imperfections in the encrypted backups. When I get a warning that people who can see my storage can know what apps I'm using, I get an instant, very strong feeling that the encryption isn't where it needs to be... which is strengthened by @ros-cr's comments about integrity. It can be surprising what you can get through even small chinks in things like that.
Ideally, a backup repository should look like a bunch of chunk files of uniform size, with all of their contents meaningless without the key, and their names disclosing nothing except maybe creation times or creation order (if that). Somebody who gets a copy of such a repository, or watches it over time, might get an idea of how much you're backing up and how much churn you have, but in a happy world that'd be all they'd get. And of course any modification to any of those files should be detected and by default rejected, as should any change in the file population other than rolling the whole repository back to an earlier valid version.
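A minimal Python sketch of the padding-and-chunking half of that description: the data gets an 8-byte length prefix, is zero-padded to a multiple of the chunk size, and is split into equal-sized chunks whose names reveal only creation order. The chunk size, the length prefix, and the helper names are hypothetical choices, not Seedvault's format, and each chunk would still have to be encrypted and authenticated before upload (not shown).

```python
import os

CHUNK_SIZE = 4096  # hypothetical; a real repository would likely use larger chunks


def to_uniform_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Length-prefix and zero-pad data, then split it into equal-sized chunks.

    Chunk names reveal only creation order. Each chunk would still need to be
    encrypted and authenticated before upload (not shown here).
    """
    framed = len(data).to_bytes(8, "big") + data
    framed += b"\x00" * (-len(framed) % chunk_size)
    return {
        f"{i:08d}": framed[off:off + chunk_size]
        for i, off in enumerate(range(0, len(framed), chunk_size))
    }


def from_uniform_chunks(chunks: dict) -> bytes:
    """Reassemble chunks in name order and strip the length prefix and padding."""
    framed = b"".join(chunks[name] for name in sorted(chunks))
    length = int.from_bytes(framed[:8], "big")
    return framed[8:8 + length]


payload = os.urandom(10_000)
chunks = to_uniform_chunks(payload)
print(sorted(chunks))                          # ['00000000', '00000001', '00000002']
print({len(c) for c in chunks.values()})       # {4096}
print(from_uniform_chunks(chunks) == payload)  # True
```

An observer still learns the total size rounded up to the chunk size, which matches the "how much you're backing up" leak the comment accepts.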
With good opacity and assured integrity, you don't have to worry about the properties of the storage backend. Among other things, that would mean you could get rid of the allowed backends list, and give users total flexibility about where they put their backups.
I also agree with @ros-cr that an unencrypted storage mode is a massive footgun. If you do provide for unencrypted storage, I'd suggest that it should be some kind of semi-hidden, advanced, use-at-your-own-risk option with dire warnings all over it, and you should treat choosing it as a choice to totally give up on confidentiality and integrity.
If you think most users will need deduplication, well, that's the sort of thing I'd expect a backup application to do internally.
Maybe you could get close-ish by embedding part or all of restic? Restic's repository isn't quite totally opaque, but it's close. It really only gives up snapshot metadata, not anything about the actual content being stored. I honestly have no idea whether using it that way is feasible, though.
> With good opacity and assured integrity, you don't have to worry about the properties of the storage backend. Among other things, that would mean you could get rid of the allowed backends list, and give users total flexibility about where they put their backups.
How would you get assured integrity when the storage backend doesn't tell you whether the file you sent it actually got written to storage?
As for the other points, the files backup uses chunking already and in the future we might be able to run app backups through that as well, but so far we are improving things step by step. Check out the docs on files backup: https://github.com/seedvault-app/seedvault/blob/android12/storage/doc/design.md#overview
> How would you get assured integrity when the storage backend doesn't tell you whether the file you sent it actually got written to storage?
You maintain a tip signature/MAC/whatever over the entire repository, which allows you to detect if anything is missing (or changed).
The backend can roll you back to an earlier tip, but can't arbitrarily delete or modify anything within a snapshot. That's the best you can do; if you can't be assured of knowing any state outside of the backend at restore time, then the backend can always roll you back, no matter what. And of course the backend can also just refuse to disgorge anything at all when you actually need to do a restore.
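A minimal Python sketch of such a "tip" MAC, assuming a symmetric key that never leaves the client; the helper names are hypothetical. The MAC covers every file name and content digest, so a missing, altered, or extra file breaks verification, while a complete older snapshot together with its older tag still verifies, which is exactly the rollback limitation described.

```python
import hashlib
import hmac
import os


def tip_tag(key: bytes, repo: dict) -> bytes:
    """HMAC-SHA256 over the sorted (name, SHA-256 of content) pairs of the repository."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for name in sorted(repo):
        mac.update(name.encode() + b"\x00" + hashlib.sha256(repo[name]).digest())
    return mac.digest()


def verify_tip(key: bytes, repo: dict, tag: bytes) -> bool:
    return hmac.compare_digest(tip_tag(key, repo), tag)


key = os.urandom(32)
snapshot_1 = {"chunk-a": b"old data"}
tag_1 = tip_tag(key, snapshot_1)
snapshot_2 = {"chunk-a": b"old data", "chunk-b": b"new data"}
tag_2 = tip_tag(key, snapshot_2)

missing = {"chunk-a": b"old data"}  # backend silently dropped chunk-b
print(verify_tip(key, snapshot_2, tag_2))  # True
print(verify_tip(key, missing, tag_2))     # False: deletion detected
print(verify_tip(key, snapshot_1, tag_1))  # True: a full rollback is not detected
```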
Basically assured integrity isn't the same as assured availability.
Integrity means only that if you do get something back from a backend, you know it's what you put in, in unaltered form. Assuring that is a really crypto-friendly problem.
Availability means that you always actually get back whatever you put in. In fact some people would say that availability only means that you get back something, not necessarily what you put in, but that's kind of a useless quibbly definition. You can't assure availability with cryptography, although you can use redundancy. You can also challenge a backend from time to time to make it a bit harder for it to cheat when the real need arises.
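One simple way to challenge a backend is a precomputed spot check, sketched here in Python under the assumption that the backend (or a helper running next to it) can hash stored data on request: before uploading, the client keeps a few random nonces and the expected `SHA-256(nonce || data)` answers, then spends one per challenge. A backend that has discarded the data cannot answer correctly, though it can still pass a challenge and lose the data afterwards, so this raises the bar rather than assuring availability. All names are hypothetical.

```python
import hashlib
import os


def make_challenges(data: bytes, count: int = 3) -> list:
    """Client side, before upload: random nonces plus the answers they should produce."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        challenges.append((nonce, hashlib.sha256(nonce + data).digest()))
    return challenges


def backend_answer(stored: bytes, nonce: bytes) -> bytes:
    """Backend side: it can only answer correctly if it still holds the data."""
    return hashlib.sha256(nonce + stored).digest()


blob = os.urandom(1024)
pending = make_challenges(blob)  # kept locally; the blob itself need not be

nonce, expected = pending.pop()  # each challenge is used only once
honest_ok = backend_answer(blob, nonce) == expected

nonce, expected = pending.pop()
cheating_ok = backend_answer(b"", nonce) == expected  # backend discarded the blob
print(honest_ok, cheating_ok)  # True False
```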
A rollback is a corner case where you think you're getting back what you put in, but are actually getting back some previous "thing that you put in". If you can keep even a little bit of state outside of the backend, you can detect that... but if the backend has all of the state other than a persistent, unchanging decryption/authentication key, then you really can't.
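A minimal Python sketch of what that little bit of outside state buys: the authenticated tip also covers a monotonically increasing version number, and the client remembers the highest version it has written. The tag format and names are hypothetical. An old but validly tagged snapshot now fails the freshness check; without the remembered counter (for example, when restoring onto a brand-new phone), it would not.

```python
import hashlib
import hmac
import os


def tag_for(key: bytes, version: int, repo_digest: bytes) -> bytes:
    """Authenticate the repository digest together with its version number."""
    return hmac.new(key, version.to_bytes(8, "big") + repo_digest, hashlib.sha256).digest()


def accept(key: bytes, version: int, repo_digest: bytes, tag: bytes,
           last_seen_version: int) -> bool:
    authentic = hmac.compare_digest(tag_for(key, version, repo_digest), tag)
    fresh = version >= last_seen_version  # needs state kept outside the backend
    return authentic and fresh


key = os.urandom(32)
old = (1, hashlib.sha256(b"snapshot 1").digest())
new = (2, hashlib.sha256(b"snapshot 2").digest())
old_tag, new_tag = tag_for(key, *old), tag_for(key, *new)
last_seen = 2  # stored on the device, where the backend cannot touch it

print(accept(key, *new, new_tag, last_seen))  # True
print(accept(key, *old, old_tag, last_seen))  # False: rollback detected
print(accept(key, *old, old_tag, 0))          # True: no local state, no detection
```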
I'll try to find time to read the link you posted. Thanks for responding.
@ros-cr A little late to the party with a reply to your message, but better late than never. The crux of your argument is flawed because it's based on your value system. You find these attributes valuable, but a person like me does not. As developers we must respect the wishes of the user, especially in FOSS. We aren't gods. Now, the criticism of the format is interesting and worth a discussion about rewriting it, as you said, but in the context of external tools it's entirely invalid. It's my system (in the context of Nextcloud); I can structure it how I want, and its reliability is greater than you believe, especially depending on how the system is configured. But I want a no-encryption setup. I already run my phone that way; it literally doesn't matter. My server is going to be more secure, and if someone gets physical access, I have bigger issues.
As I described in my previous comment, I am not a Seedvault developer and not permanently involved with this project. During the NLnet-funded quick security evaluation in 2021, @grote asked me to take a look at this ticket and comment on it internally and publicly. The evaluation has ended and there is no additional time for reviews available or planned.
@hockeymikey : I have the impression that you're looking for software that is significantly different from what Seedvault is aiming to do, in both the "seed" (mnemonic seed) and "vault" (confidentiality, integrity) directions. Even if we ignore the security implications, it is still a major development effort to implement this, from what I can tell.
@ros-cr No, I'm not. It's an open-source replacement for Google's backup application. A seed vault is a place where seeds are stored in case of catastrophe. Same here; there's nothing specific about the security of it.
@hockeymikey Well, for this project we're not going to support multiple methods; all other points aside, we simply do not have the resources to keep up with that.
There is already a lot of pending, higher-priority work for us to do here.
Is there at least a way to make the UX better? For example, being able to copy and paste the words as a single passphrase? The words will be stored somewhere either way, so why not make that a little easier and simplify storing them in a password manager, rather than on a paper note or in a screenshot (which may easily end up unencrypted in the cloud)?
Honestly, as a non-English-speaking user, I find mnemonics extremely difficult to use. I would rather use gpg; at least gpg has its own key management tool and is more generic.
Why not write these 12 words in Chinese characters, like a cipher? A potential attacker wouldn't be able to exploit the backup even if they had the 12-word paper.
Forcing users to use less secure methods is a bad thing. I assume that most people keep a file with those 12 words stored alongside their backup instead of using a password manager.
Allow the user to pick from a list of possible methods to store their data. Currently we only have the 12 words, and that is really meh. I don't like it. I think the possible choices should include (but not be limited to) the hardware discussed in https://github.com/stevesoltys/seedvault/issues/74, the 12 words, a plain password of any length or complexity, and no encryption. This gives the user the freedom to pick how their data is stored. The selection screen should be a simple activity, like the current storage provider activity, where they pick one.