monicahq / monica

Personal CRM. Remember everything about your friends, family and business relationships.
https://beta.monicahq.com
GNU Affero General Public License v3.0
21.37k stars 2.13k forks

End-to-End Encryption #543

Open ajvsol opened 7 years ago

ajvsol commented 7 years ago

I was planning on moving my contacts to an end-to-end encrypted CardDAV/CalDAV server called EteSync and then I found your app. Monica has great usability and a ton of features I hadn't even considered, but unfortunately it lacks the privacy and security provided by end-to-end encryption. It would be really great if Monica had this feature also, and I think more people would be comfortable with using the hosted version then too.

aejnsn commented 7 years ago

Can you contrast your ideal implementation with usage of HTTPS (i.e., TLS with a certificate)?

djaiss commented 7 years ago

@neomodern even if you'd host it yourself on your private server?

ajvsol commented 7 years ago

@aejnsn client-side encryption, so it's decrypted by the client. Encrypted at rest, not merely during transit.

@djaiss there's a reason why end-to-end encryption is very popular now even on self-hosted services (see Matrix/Riot, Seafile, CryptPad, PrivateBin, EteSync etc) and it's because people have come to realise their data is at risk by not being encrypted at rest. And this can't be fixed through simply using full-disk encryption, as it doesn't do anything if your system is always on (i.e. cloud servers).

jdambron commented 7 years ago

Considering the high privacy of the information stored in monica, end-to-end encryption would be indeed a huge improvement both for self-hosted version and monicahq.com

djaiss commented 6 years ago

I perfectly understand the need, and the importance, of end to end encryption. As a matter of fact the first version of Monica encrypted data with Bcrypt before saving it to the database. While it wasn’t true end to end encryption, it was a first step towards having data a bit more secure than they are now.

I had to remove this encryption because it limited what I could do with MySQL. Sorting for instance wasn’t possible. Searching wasn’t possible. The only data I could manipulate were dates, ids and Boolean values.

If someone could point us in the right direction on how we could make the database safe, it would really help the project.

Perhaps I should encrypt everything again and live with the limitation, or find creative ways to achieve it. The key would be the only way to decrypt data (or brute force but I think the Hash library of Laravel is somehow solid). If someone gets access to Linode’s servers, they would have access to the key.

Can I encrypt data with a password that only the end user would know? That would remove the stress of having the database stolen. The user would provide both a password to sign in, and a different (or the same I don’t know yet) key to encrypt/decrypt client side and Monica would never know the content of the data. Drawback: if the user forgets his key, data is lost forever.

AHemlocksLie commented 6 years ago

Let me preface this by saying I'm not an expert. I'm pretty technically knowledgeable, so I feel like I'm most likely right, but not expert level, so if I say anything wrong, I'm totally open to correction.

If you want end-to-end encryption, that would mean the server would never see the unencrypted form of the data. The server would essentially function purely as storage. Everything would have to be shipped out to the client side on demand where it would be decrypted, and it would need to be encrypted on the client side before it was ever shipped back to the server. It would likely necessitate a rewrite, as a quick glance at the repository suggests just about everything is done with PHP, a server side language.

If you wanted a compromise short of end-to-end encryption and just encrypt saved data, you could encrypt it with a key the user has, something like their password, then dump all their information into memory. I'm not fond of this because users lose passwords, so this could lead to catastrophic data loss from their perspective.

The problem with doing it end-to-end is the web interface. How do you process everything on the client end? You have to move almost the entirety of the application's logic to that end. You could do it in JavaScript, I suppose. This is one of my most lacking areas, so anyone please correct me if I'm wrong, but I think JavaScript isn't super fast compared to other languages, so depending on how intensive the logic required is, it may bog down the user's browsers and lead to warnings about scripts slowing things down. Some users may kill the scripts when prompted without realizing the problems this would cause. Some will just get impatient if it's slow and complain or stop using it.

If you're going for end-to-end, you may have to rewrite it as an offline application with syncing capabilities. On the plus side, this would double team issue #531. The downside is that you absolutely shatter mobile compatibility without an app, and this definitely seems to me like the kind of thing people will want mobile access to. It seems that iOS has opened up a lot to other programming languages, so with careful planning, you might be able to get away with only writing the core logic once for desktop and mobile. Any UI components would have to be custom tailored to their environment, however, so you likely won't be able to skip out on that work.

If you go for a desktop rewrite, I'd be willing to put in a significant amount of work to help if you use Haskell. It's an unusual and unpopular language, I know, so I don't expect you to make that change just for me, but I really like the idea behind the project, and I've been trying to find a good Haskell project to work on, so the offer is there. Another point against it is that using a more obscure language makes it harder to find people willing to contribute, so while I'm willing to do a lot in my free time, I don't know how much gets done by outside committers and if it'd be worth it. Some minimal research suggests it could likely be cross compiled for iOS and maybe even Android. Would require more research and definitely careful planning, though.

johnriley commented 6 years ago

I dealt with this in a startup I worked on that required much of the same security consciousness. I am by no means an expert, but will apply the principles I've learned, which is to analyze the key threats. The significant threat, in my opinion, is the cloud service/hosting provider being hacked, so the raw database being leaked, and application vulnerabilities. For this project, in my opinion, end to end encryption does not really make sense since it would prevent a mobile app, data analysis, etc. from occurring. Encryption in transit is a no brainer.

For the application, I would recommend splitting it up into micro services. A backend API, front end web, and a database. The database and API would be "opposing", with the database being encrypted via public key, with the API being the only actor with the private key. The key would be stored as an environment variable, meaning that critical security actions would include changing the keys regularly and ensuring that the kernel of the server is kept up to date so environment variables are protected. From there, import/export, data analysis, and other features could be built as plug ins/micro services interacting with the API.

Finally, I would always recommend keeping in mind that the most critical element of this project is a vibrant developer community and consistent updates.

AHemlocksLie commented 6 years ago

End-to-end encryption wouldn't prevent those things, it would just force them to be moved to the client side. The mobile app would be more work that way, but with careful planning, I think a lot of the work could be deduplicated. For the privacy conscious, inability of the server to perform data analysis is a selling point, not a problem.

Considering how sensitive and intimate this information could potentially be, I think end-to-end encryption is definitely a worthwhile endeavor. This is information that could be used to aid identity theft if other important information about the contact is available in the wild, potentially leak security question answers, bolster spear phishing, and just reveal regular personal and maybe embarrassing details. It can give a lot of information about someone's life and the lives of those around them, so it seems fully worthwhile to protect it as best as possible. I would consider it especially important to implement extra safeguards due to the fact that it's a self-hosted service that may be run by inexperienced users who don't know how to properly secure their machines.

johnriley commented 6 years ago

For the privacy conscious, inability of the server to perform data analysis is a selling point, not a problem.

Ha - that is a very philosophical comment. There needs to be a critical balance between commercialization, features, and privacy/security as the wrong decisions could cause the project to not do well. Often times usage is a great line of defense, since it means you're incentivized to pay attention to security.

djaiss commented 5 years ago

I think this issue is one of the most important issues that we need to tackle. Let's think about it again. The major drawbacks of encrypting data:

The benefits:

The only option that I see to implement this, is by adding a passphrase to the user object.

There are a lot of other questions, but this covers the basics.

djaiss commented 5 years ago

@asbiin let's think about this.

pqhf5kd commented 5 years ago

How about the option to encrypt while accepting the loss of functionality?

ajvsol commented 5 years ago

Like @pqhf5kd I'd be fine accepting those missing features, as keeping sensitive data in a SaaS product without E2EE is a non-starter for me. This is also the model of Standard Notes, and they've been able to do search and sort in their E2EE client. It might be worth looking through their codebase for ideas.

The aforementioned EteSync has also released a browser extension called Signed Pages, which helps ensure the SJCL JavaScript crypto used in their web app is as secure as in any desktop app.

djaiss commented 5 years ago

Thanks @ajvsol for your insights.

I've taken a look, it's impressive. I need to thoroughly analyze all this.

Another drawback of E2EE: impossibility to use CardDAV or CalDAV.

ajvsol commented 5 years ago

EteSync does end-to-end encrypted CardDAV/CalDAV so it should be possible.

st-sloth commented 5 years ago

MEGA cloud storage (while being commercial) also has end-to-end encryption, and they use a single master password which is used for client-side derivation of different authentication and encryption keys. And their sort and search is quite snappy on a large number of files. Here is their open-source webclient, might be interesting as well.
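The idea of deriving separate authentication and encryption keys from one master password can be sketched as follows. This is a generic illustration using only Python's standard library, not MEGA's actual scheme; the labels `b"auth"` and `b"encryption"`, the iteration count, and the function name `derive_keys` are assumptions made for the example.

```python
import hashlib
import hmac
import os
from typing import Tuple


def derive_keys(master_password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Derive separate authentication and encryption keys from one password."""
    # Slow, salted KDF so that guessing the password offline is expensive.
    # The iteration count is illustrative only.
    master_key = hashlib.pbkdf2_hmac(
        "sha256", master_password.encode("utf-8"), salt, 200_000
    )
    # Distinct labels (hypothetical) keep the two subkeys independent.
    auth_key = hmac.new(master_key, b"auth", hashlib.sha256).digest()
    enc_key = hmac.new(master_key, b"encryption", hashlib.sha256).digest()
    return auth_key, enc_key


salt = os.urandom(16)  # stored per account; it is not secret
auth_key, enc_key = derive_keys("correct horse battery staple", salt)
# auth_key would be sent to the server to log in; enc_key stays on the client.
```

Because the server only ever sees the authentication key, learning it does not reveal the encryption key, as long as the master password itself is never transmitted.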

And, like other people, I too would be perfectly fine with the lack of some features, however much time it would take to implement them on top of client-side E2EE. Temporary, or even in some cases permanent, unavailability of several advanced features should hardly be a showstopper for securing this kind of very private data :)

Though the biggest problem is perhaps the API, which would need to have SDKs with client-side encryption logic. That's sad.

nrktkt commented 5 years ago

A general +1 for this feature. I've been contemplating using Monica, but with the private nature of the content, I'd only trust it running in my living room without E2EE.

On the technical side I just have two small contributions. First, on the topic of passphrases. I agree a passphrase is probably going to be needed (a non knowledge based secret is probably not suitable here). It would be a simpler UX to re-use the login password as the passphrase; but it will mean you need to hash the login password client-side. So there is some trade-off there. I would not recommend using the passphrase to encrypt content directly. If instead you use the passphrase to wrap the content encryption key, then the user will be able to rotate their passphrase in the future without needing to re-encrypt all of their data.
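A minimal sketch of that wrapping idea, assuming a random 32-byte content key and Python's standard library only. The XOR mask plus HMAC tag is a stdlib stand-in for a real key-wrap or AEAD primitive (for example AES-KW or AES-GCM); the function names and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os


def _kek(passphrase: str, salt: bytes) -> bytes:
    # 64 bytes: the first half masks the content key, the second half
    # authenticates the masked value. Iteration count is illustrative.
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 200_000, dklen=64
    )


def wrap_key(content_key: bytes, passphrase: str) -> dict:
    """Wrap a 32-byte content key under a passphrase-derived key."""
    if len(content_key) != 32:
        raise ValueError("content key must be 32 bytes")
    salt = os.urandom(16)  # fresh salt per wrap, so a mask is never reused
    kek = _kek(passphrase, salt)
    masked = bytes(a ^ b for a, b in zip(content_key, kek[:32]))
    tag = hmac.new(kek[32:], masked, hashlib.sha256).digest()
    return {"salt": salt, "masked": masked, "tag": tag}


def unwrap_key(wrapped: dict, passphrase: str) -> bytes:
    """Recover the content key, or raise if the passphrase is wrong."""
    kek = _kek(passphrase, wrapped["salt"])
    expected = hmac.new(kek[32:], wrapped["masked"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, wrapped["tag"]):
        raise ValueError("wrong passphrase or corrupted key blob")
    return bytes(a ^ b for a, b in zip(wrapped["masked"], kek[:32]))


def rotate_passphrase(wrapped: dict, old: str, new: str) -> dict:
    """Re-wrap the same content key; the encrypted data is left untouched."""
    return wrap_key(unwrap_key(wrapped, old), new)


content_key = os.urandom(32)  # this key, not the passphrase, encrypts the data
wrapped = wrap_key(content_key, "old passphrase")
rotated = rotate_passphrase(wrapped, "old passphrase", "new passphrase")
```

Only the small wrapped blob changes on rotation, which is the benefit described above: none of the user's stored data has to be re-encrypted.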

Second, it's probably useful to consider the trade-offs between "field/column" level encryption and "object/row" level encryption. The former gives you the option to not encrypt certain things like IDs, timestamps, etc. while also having a bit of flexibility to change your schema; on the other hand, it's less private, since some metadata will be visible.
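The two layouts can be compared with a small sketch. Everything here is an assumption for illustration: the field names, the choice of which columns stay readable, and especially `toy_seal`, which is a placeholder cipher built from the standard library and not something to use for real data.

```python
import hashlib
import hmac
import json
import os

PLAINTEXT_FIELDS = {"id", "updated_at"}  # hypothetical schema choice


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode, used here only as a placeholder keystream.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_seal(key: bytes, plaintext: bytes) -> str:
    """Placeholder cipher, NOT for production: it has no integrity check."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return (nonce + bytes(p ^ s for p, s in zip(plaintext, stream))).hex()


def toy_open(key: bytes, sealed: str) -> bytes:
    raw = bytes.fromhex(sealed)
    nonce, body = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def encrypt_fields(key: bytes, row: dict) -> dict:
    """Field-level: chosen columns stay readable, the rest are sealed one by one."""
    return {
        name: value if name in PLAINTEXT_FIELDS
        else toy_seal(key, json.dumps(value).encode("utf-8"))
        for name, value in row.items()
    }


def encrypt_row(key: bytes, row: dict) -> dict:
    """Row-level: only the id needed for lookup is visible; the rest is one blob."""
    return {"id": row["id"], "blob": toy_seal(key, json.dumps(row).encode("utf-8"))}


key = os.urandom(32)
contact = {"id": 7, "updated_at": "2019-11-02", "first_name": "Roger", "note": "likes tea"}
by_field = encrypt_fields(key, contact)  # server can still sort by updated_at
by_row = encrypt_row(key, contact)       # server sees only an id and a blob
```

With the field-level layout the column names and the plaintext columns remain visible to the server, which is the metadata exposure mentioned above; the row-level layout hides them at the cost of server-side sorting and schema flexibility.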

djaiss commented 5 years ago

After a lot of thoughts and discussions with @asbiin, it’s time that we revisit encryption.

I want to start to work on Monica v2, the new major version which will be a complete overhaul of the tool. At the heart of this change would be the encryption of the data.

I don't want to move forward anymore without encrypting the database first.

I’m no expert in encryption at all. I just want to build something that is reasonably secure and that would not be a problem in case of a data leak. I think that all systems are open to vulnerabilities, although I think we can make it a bit harder for troublemakers to steal your data.

There are two ways of encrypting the data:

As you’ve seen in Monica, we are not great at client side work. Moreover, encrypting data in the browser is super complex and error-prone, and we are a very small team of enthusiasts who still work on Monica in our spare time without any resources to help us.

This is why I’d like to do server side encryption of data, following this procedure:

That way, data is still encrypted using a key that we don't know and don't store, and encryption is per account, so even if one user is corrupted, the damage would only be done in one account.

Drawbacks

This is not full end-to-end encryption. There is a risk if:

In this case, the data can be read. But as far as I can think of, it would only affect one account.

Unknowns

In Monica v2 we want to support webhooks. With this approach, we would only support webhooks triggered by an action that the user has done, not by a cron. For instance, say Roger Moore's birthday is on December 27th. The system will know that there is a birthday on this date, therefore it could send the webhook that the birthday is happening. But how would it know the name of the person to put in the JSON sent by the webhook? In a cron job we don't have the secret key that the client sends us when the user performs an action in the browser.

I don't know how to deal with this.

The database

With this method, most of the fields can be encrypted in the database. Not everything though: we need some data to remain readable by the system if we want certain operations to be performed easily.

Moreover, there are probably other specific fields that do not need to be encrypted. However, everything that is directly related to a contact should be encrypted.

Next steps

I have a POC already that does this workflow and it’s working fine. I even implemented search and it’s also working fine.

What do you think of this approach?

st-sloth commented 5 years ago

@djaiss thanks for working on this issue of getting user data more secure!

While still not zero-knowledge it is definitely a step in the right direction. Especially since it's always a compromise between security, functionality and the team's resources.

Though the fact that the server still knows the secret data, at least at the request time, is unsettling. The data would easily be compromised by some middleware logging the cookie or the data itself, either by malicious intent or by mistake. That is another risk. Though a malicious actor having write access to the server file system can just as well change the client side code (considering web-browser application) and eventually get access to the data even in the truly zero-knowledge encryption setting.

For the threat of a leaked database this approach seems reasonable (considering strong keys and hash function). Besides, this approach does not exclude and can be further improved by client side encryption if there would be resources for that.

As for webhooks with full data, it's just mutually exclusive, the server either knows the data at any moment or it doesn't. Perhaps a "dangerous" option to disable encryption for certain fields individually, in order to have more data in webhooks, with a graceful degradation like "it's birthday of one of your contacts"?
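The graceful degradation suggested here could look roughly like the sketch below. The payload shape, the function name `birthday_webhook_payload`, and the idea of passing the name only when the user has opted out of encrypting it are all assumptions for illustration, not Monica's actual webhook format.

```python
from typing import Optional


def birthday_webhook_payload(
    contact_id: int, date: str, plaintext_name: Optional[str]
) -> dict:
    """Build a webhook body from data the server can read without the user's key."""
    payload = {"event": "birthday", "date": date, "contact_id": contact_id}
    if plaintext_name is not None:
        # The user chose the "dangerous" option: the name is stored unencrypted.
        payload["message"] = f"It's the birthday of {plaintext_name}"
    else:
        # Name is encrypted: fall back to a generic message.
        payload["message"] = "It's the birthday of one of your contacts"
    return payload


generic = birthday_webhook_payload(7, "12-27", None)
detailed = birthday_webhook_payload(7, "12-27", "Roger Moore")
```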

(Just an opinion of someone somewhat interested in the topic, not a security expert as well).

nrktkt commented 5 years ago

@djaiss it's awesome that you've committed to adding encryption. I think focus on privacy will go a long way in Monica's progress.

However, I think the approach outlined is somewhat the worst of both worlds between database encryption and end to end encryption. Hopefully I can explain without giving offense.

| | No encryption | Database encryption | End to end encryption | Send key on every request |
|---|---|---|---|---|
| User has to enter an encryption password | No | No | Yes | Yes |
| User knows only they have access to their sensitive data | No | No | Yes | No |
| Database can filter/sort on fields | Yes | Possibly | No | No |
| Encryption at rest | No | Yes | Yes | Yes |

Essentially you'd be only gaining the advantage of having data encrypted at rest, but making the user enter a password. Simultaneously you'd spread the encryption key to the user's data across multiple browsers and through every system that touched the request (load balancers, middleware, etc).

djaiss commented 5 years ago

Simultaneously you'd spread the encryption key to the user's data across multiple browsers

@kag0 I agree. But this is how 1Password works for instance. They force you to enter the key in every browser/app they have, and they have millions of users.

However, they keep the secret key on the clients, and never send it to the backend, which is the big problem of my proposed approach.

@st-sloth we have something like 11 000 servers running Monica right now. I think, but I might be wrong of course, that a database leak is the biggest problem we face on monicahq.com, but it’s less of a problem for those running their own version of Monica, as instance owners control everything and they probably run it on a private server.

nrktkt commented 5 years ago

But this is how 1Password works for instance. They force you to enter the key in every browser/app they have, and they have millions of users.

That's true, and I'm not suggesting that having the user enter a key is a bad thing, simply that if you do it the user should get some benefit. ie. "User knows only they have access to their sensitive data"

I agree sending the key to the backend is the downside of your approach, but maybe I'm missing the upside?

ajvsol commented 5 years ago

I prefer client-side (E2EE) encryption over server-side, because I'd like to not have to host my own server and instead just get a subscription with Monica. I can't do that if the server can still easily decrypt my data because they know the secret key. Server-side encryption is basically security theatre in my eyes, and I'd rather wait longer for a proper E2EE implementation maybe "someday" than settle for server-side.

aNullValue commented 5 years ago

I agree with @ajvsol, and want to take it a step farther: I consider adding server-side encryption to be an anti-feature. Introducing it causes the possibility to exist that someone could host Monica for others, giving those others the reasonable belief that the host could not access their Monica data, while that is not at all true (it would be fairly easy for the host to store the keys for later use). It's also not difficult to imagine a scenario whereby an adversary could compromise the Monica host instance and capture the keys for multiple users. Implementing client-side encryption renders that impossible, because the host is never aware of the client keys (neither solution can necessarily defend against an attack at a specific client, against the client software distribution, etc).

While I'm OK with the ability for people (who are informed enough to understand the potential ramifications) to run their own private instance without E2EE, I'm strongly opposed to any multi-user host being able to technically access user data that's this sensitive, regardless of small hurdles in their way.

st-sloth commented 5 years ago

If an adversary has control over the host, they can just as well slightly tweak the javascript code they host so that it sends the keys back in some obfuscated way like in a cookie or an HTTP header under the guise of some token, possibly split and spread over several requests for harder detection. And it's enough to only send the key once and then stop. And the code might be vetted and trusted one moment and the other moment updated on the server with the added leak.

Even client-side encryption is possibly unsafe if the client-side code comes from an untrusted source (or a trusted one but not known to be compromised). An ideal solution seems to be a standalone client-side application without automatic updates, with each update going through thorough security audit and only then manually installed. Until that happens, it's always settling for trust in the service provider. Or building the client-side application manually, trusting the community or doing the audit by oneself, given there is a client-side application.

In case of Monica, client-side encryption has also at least a couple more issues:

Moreover, encrypting data in the browser is super complex and error-prone, and we are a very small team of enthusiasts who still work on Monica in our spare time without any resources to help us.

And, client-side encryption breaks API calls and webhooks on the sensitive data.

Still, at-some-point-possibly-backdoored client-side encryption might be somewhat better than sending sensitive data to the server at all (even with HTTPS for transit). If Monica team does go with that, given that the UI seems to be rendered with Vue and not PHP, perhaps it would make sense to quickly tweak @djaiss's approach to add a client-side encryption layer for the sensitive fields (but thus lose searching and sorting on them, until implemented client-side); and then iterate from that?

ajvsol commented 5 years ago

Even client-side encryption is possibly unsafe if the client-side code comes from an untrusted source (or a trusted one but not known to be compromised).

Actually there is a solution to this, use the Signed Pages browser extension to ensure that the code hasn't been compromised. That combined with the code being open-source solves those two potential issues and makes a web app have comparable security to a desktop app.

The other obstacles of it being difficult to do and it limiting functionality are kind of expected when trying to implement E2EE, and are not sufficient reasons to avoid securing user data in this day and age of countless leaks.

asbiin commented 4 years ago

To make Monica work, we need 3 things:

E2EE could work no problem for web server rendering, even with client-side encryption. The web server handles the API too, but a key could be sent with each request.

The problem is how to make the scheduler or the queue work. Data is encrypted, so the scheduler or the queue would need access to the decrypted data or to the decryption key, which makes E2EE less secure, or irrelevant.

For me that's the major problem, and if someone has an idea about it, I am very interested.

nrktkt commented 4 years ago

@asbiin could you elaborate on what the scheduler and queue need to do/what kind of jobs they need to handle?

asbiin commented 4 years ago

@kag0 the scheduler is a cron task running on a server. It must access the data to know which reminders to send. The queue handles jobs sent to it. These are commands like sending mail.

nrktkt commented 4 years ago

Reminders could be handled client-side, the task only needs to know the time and user, then the job could prompt the client to pull the encrypted list of things to be reminded about, decrypt it, and display the appropriate reminder.

Email is harder. The best we could do is letting the user know that something requires their attention and providing a link to the page where it would be displayed.
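A rough sketch of such a scheduler job, under the assumption that each reminder row keeps only `id`, `user_id` and `due_at` in plaintext next to an opaque ciphertext. The row layout, the host `monica.example`, and the function name are hypothetical.

```python
from datetime import datetime, timezone
from typing import List

BASE_URL = "https://monica.example"  # hypothetical host


def due_notifications(reminders: List[dict], now: datetime) -> List[dict]:
    """Pick due reminders using plaintext metadata only; never read the ciphertext."""
    notices = []
    for reminder in reminders:
        if reminder["due_at"] <= now:
            notices.append({
                "user_id": reminder["user_id"],
                # Generic text: the server cannot know what the reminder says.
                "subject": "Something in Monica needs your attention",
                # The client decrypts and displays the reminder at this page.
                "link": f"{BASE_URL}/reminders/{reminder['id']}",
            })
    return notices


reminders = [
    {"id": 1, "user_id": 42, "ciphertext": "9f3a...",
     "due_at": datetime(2020, 12, 27, tzinfo=timezone.utc)},
    {"id": 2, "user_id": 42, "ciphertext": "c01d...",
     "due_at": datetime(2021, 1, 15, tzinfo=timezone.utc)},
]
notices = due_notifications(reminders, datetime(2020, 12, 28, tzinfo=timezone.utc))
```

The trade-off is the one described above: the due time and the owning user stay visible to the server, while the content of the reminder is only readable once the client has pulled and decrypted it.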

Ryonez commented 3 years ago

Could we possibly look at etesync for assistance?

They have a similar system (contacts, calendar, notes) that's end-to-end encrypted. They've also released Etebase, which deals with the encryption of objects.

Here's a link: https://blog.etesync.com/introducing-etebase-an-end-to-end-encrypted-sdk-and-backend/

So while I might not be aware of all the nuances, are we able to learn from and implement things they have done?

Ian2020 commented 3 years ago

Check out the Etebase lightning talk video from this year's FOSDEM: https://fosdem.org/2021/schedule/event/etebase/. They walk through the whole thing and build an example app. They don't seem to support PHP yet tho.

Peter4487 commented 3 years ago

I love Monica but I really can't get myself to use it without this feature. I would be very happy to pay money to help make it happen.

tamaskan commented 1 year ago

Anything new about this issue now that Chandler is in development?

found this about it: https://gist.github.com/thiloplanz/e1136a04b26c138c8225