isamert / scli

a simple terminal user interface for signal messenger (using signal-cli)
GNU General Public License v3.0
439 stars 40 forks source link

Using GPG to encrypt plaintext history and attachments #169

Closed mikeevmm closed 2 years ago

mikeevmm commented 2 years ago

Hello,

First of all, thank you for scli. The official app is unusable for me (can't start up most of the time, and will eat memory until the PC crashes), and scli filled that hole really elegantly. Really like the way the app is organized, too, in terms of the bindings and the on-screen navigation.

The one thing I was missing, however, was the history. I know that scli has support for persistent history, but it felt silly to use Signal, and then leak my contacts' messages onto a plain text file.

This PR attempts to fix that somewhat, by decrypting/encrypting the relevant history and attachment files at start-up/quit, using Gnu Privacy Guard.

This is done in a Bash script, currently (scli.sh), which leaves scli untouched, but (maybe?) reduces portability. I think it would be possible to integrate the behaviour into scli, but would like some feedback.

exquo commented 2 years ago

Nice! Many will find this script very useful! We can add it under a contrib dir.

I'm planning to add a pair of options to scli: --storage-encrypt-cmd and --storage-decrypt-cmd to make this more general.

A few general notes:

Not encrypting a program's data is not necessarily problematic - that really depends on your threat model. Using a full disk encryption is qualitatively more secure than encrypting individual files. That is not to say that there are no advantages from encrypting individual files in addition to that.

Signal-desktop does not encrypt its storage (or more like the encryption key is in the same dir that the data): [1], [2], [3]. Those threads also have some discussions of potential benefits of encrypting the stored data on the level of the app.

Signal-cli also does not encrypt its data. I see you have added it to the script, nice! There might be an issue though, if scli is not the only program that uses signal-cli. Also, the script only encrypts the data dir. There are also received attachments and contacts' avatars, but those can be large and slow up encryption (and subsequently program startup) by quite a bit.

Scli's log file (if enabled) can include sensitive data as well.

Notes on the script:

The decrypted history file seems to be stored on the disk while scli is running. This can probably be improved. Maybe by deleting the file when detecting that scli has started (perhaps pgrep signal-cli can be used as a proxy). It would be even better to make the history file into a named pipe. That way unencrypted data never touches the disk. But this might turn out to be difficult. These are just suggestions, I'm happy to merge the PR without this.

The history.bak file should be preserved (included into the encrypted archive), in case the main history gets corrupted (e.g. during a scli crash).

Related to the possibility of crashes / errors: good call on using trap rather than just appending encrypt() at the end of the script! More triggers can be added: SIGHUP, SIGINT, SIGTERM, (maybe others?). Also, I think the trap should be put above decrypt, like in the first commit. What if decrypt fails part-way? The encrypt function should be robust enough to handle the data on the disk in whatever state it is.

In tar commands, there should be no need to add compression with -z, since gpg adds another layer of compression by default. Piping tar .. | gpg .. is more efficient than creating temporary files. Had that approach been problematic (hence the last commit)?

It would make reading the script easier if the stuff in SOURCE=${BASH_SOURCE[0]} … was placed in a separate function (e.g. find_script_path()), and/or there was a link to the discussion of it (https://stackoverflow.com/a/246128). Also, since the script will be in contrib dir (one level deeper than the scli executable), the path will need to be adjusted accordingly.


Thanks again!

mikeevmm commented 2 years ago

Thanks for the attentive reply!

There might be an issue though, if scli is not the only program that uses signal-cli.

That is true, I hadn't considered it. However, I don't know what would be the best course of action in this case. We could encrypt only the number that scli uses, through, for example, a symlink (left hanging when the folder is encrypted), but that doesn't solve the general situation (and is kind of inelegant). In any case, I'm not sufficiently aware of what signal-cli is doing to propose something; I just noticed that when I linked my phone, a folder with the corresponding number appeared in data/, and assumed that all relevant data was in there. :)

Scli's log file (if enabled) can include sensitive data as well.

The history.bak file should be preserved (included into the encrypted archive), in case the main history gets corrupted (e.g. during a scli crash).

In tar commands, there should be no need to add compression with -z

It would make reading the script easier if the stuff in SOURCE=${BASH_SOURCE[0]} … was placed in a separate function (e.g. find_script_path()), and/or there was a link to the discussion of it (https://stackoverflow.com/a/246128).

Duly noted, I will amend the script when I have the chance. (Re. the last one: whoops, I really should have put some attribution there. It's just that I use that snippet a lot, and ended up copying it around from other scripts.)

It would be even better to make the history file into a named pipe.

This sounds reasonable, I'll try it and report back.

Also, I think the trap should be put above decrypt, like in the first commit.

Actually, I (think I) was doing that; in the course of the d89ce90 fix I swapped them around trying to debug things. This also relates to

Piping tar .. | gpg .. is more efficient than creating temporary files. Had that approach been problematic (hence the last commit)?

Yep! The tars ended up empty for some reason. I never figured out if it was some misuse on my part, or what, but I only really got it working once I fully packed the tars onto disk and then gpg'd them. I'll revisit this.

exquo commented 2 years ago

Regarding encrypting the signal-cli data, one approach would be to make it "opt-in" for the users (e.g. write the code, but execute it only if users set some flag like ENCRYPT_SIGNAL_CLI_DIR). A script like this should serve as a starting point for the users to modify as they wish, rather than a fits-all solution. At least the users should read / glance through it.

I use that snippet a lot, and ended up copying it around from other scripts

That's what I assumed :). I initially figured out what it's doing, searched for an easier way of accomplishing the same thing, and found the SO question. I bet it's in lots of people's scripts, with that amount of question activity. BTW, how complicated this simple task is emblematic of the perils of shell scripting..

I only really got it working once I fully packed the tars onto disk and then gpg'd them. I'll revisit this.

👍

exquo commented 2 years ago

Also, it would be instructive to provide some examples of what kind of threats this solution would (and would not) protect against (in the script's comments, readme or elsewhere).

For instance, for a user with an encrypted home directory, who keeps scli running most of the time, if the script keeps files decrypted while scli is running, then the additional encryption does not seem to further improve security.

mikeevmm commented 2 years ago

In my particular case, I have a shared tilde in the cloud, and wanted to use scli from there. I didn't want my signal account compromised in case the shared tilde got compromised (which is more likely when it's a public facing server).

Is it advisable to be doing this in the first place? Probably not.

exquo commented 2 years ago

Is it advisable to be doing this in the first place? Probably not.

:) Signal-cli's data must be on the disk decrypted while it's running. It includes private keys which is all that's needed to compromise a signal account. The cloud service's administrators either can or can not see your data (and they can, if they want to). But performing additional encryption would not change that. Storing the data on a public-facing server makes it that much more of a concern.

Why not just run it on your personal computer? You need one to access the VPS anyway.

mikeevmm commented 2 years ago

Point taken. Another, probably more reasonable, situation is using scli on a work computer: encryption is only a concern while you're away from the computer, and you may not have control over whether the drive is encrypted.

exquo commented 2 years ago

That's a good example of a common situation. In this case however, it would be preferable to use an established, well-tested solution like fscrypt.

mikeevmm commented 2 years ago

I feel like you're making a strong case for why this shouldn't exist, specifically due to a false sense of security. Wouldn't it be better to add a section to the README outlining the comments above (in particular how signal-desktop stores the key alongside the database)?

exquo commented 2 years ago

That seems to be the effect so far :). But I don't have a pre-formed opinion. In fact, I was planning on adding some support for history encryption to scli. Now that I think about it more, however, I don't yet see a usage scenario where some other alternative doesn't offer a better security. "False sense of security" is key, exactly - we certainly shouldn't promote it. I think we should try to identify the potential threats and recommend the appropriate solutions (in a note in the readme or somewhere similar, yes).

So far, I can think of:

There are other potential scenarios, of course. If scli is running only a small percentage of the time, it might make sense to have a script doing decryption - run - encryption (whatever tool is used for doing cryptography). It all depends on the threat model. So there might well be a place for a script like this, which we could include and advise on when to use.

Most users' first reaction is probably: "Adding encryption is always better!". But the details matter. People should be making informed decisions based on their particular situations. We can mention signal-desktop's absence of encryption; not to justify our own, but as a general FYI. More important is why they have made this choice (not exactly something they have at length explained). But their rationale very likely applies to scli as well.

I've also started a discussion on this topic in signal-cli's repo. I'm no security expert, so hopefully someone over there can weigh in.

mikeevmm commented 2 years ago

Closing this PR for now, as I'm convinced that doing encryption on the chat data would only lead to a false of security. (I trust Signal's team's threat modelling more than mine.) If it turns out to make sense, you are welcome to re-open this PR.

exquo commented 2 years ago

I have added the relevant info to the README and linked to this discussion there.

Thanks again for bringing this up! It has helped to focus the thinking on the threat model and the issues with adding storage encryption to scli itself.