mailpile / Mailpile

A free & open modern, fast email client with user-friendly encryption and privacy features
https://mailpile.is

Profiles/accounts data consistency #2196

Closed timorl closed 1 year ago

timorl commented 5 years ago

I'm not sure if this should be sent here as an issue, but I don't know how else to share my comments on this, and they seem very relevant.

Mailpile seems to have a general problem with data consistency with regard to profiles/accounts. Any profile seems to be composed of five parts (correct me if I'm wrong):

All the parts except the vcard are completely independent of each other and unaware of the vcard. This leads to bugs like #1958, #1977, #1867, possibly #904 and is partially related to #899.

I think what is needed here is a better data model relating this kind of info -- this would also be relevant to general problems with vcards, like #2080, #2195, and #1913. Also related to #1228. I am not aware of any data model documentation for Mailpile -- if any exists I'd gladly read it, but it would still contain the above issues. If one does not exist, creating and implementing one would probably cut down immensely on these kinds of bugs.

As for the problems with profiles specifically I'd suggest the following:

  1. Relate the sources, routes and histories back to the vcard(s?) that relate to them. Without a vcard they are not accessible from the GUI and not related to each other. At the very least if a vcard is lost this should trigger some kind of warning somewhere.

  2. After doing the above we could possibly regenerate vcards if they are lost, as is the case with some of the bugs. Not sure what to do with keys, as they have less internal structure, so adding a reference to vcards from them is a much bigger challenge.

  3. Create other specific requirements for data consistency. I am not sure about the details, but one should think whether having a mail source without a history, or a route by itself, or any other combinations are reasonable states of the world (I expect a lot of them are) and what to do if the constraints are broken (at least warnings or reports from the health check would be nice).

  4. Perhaps create a separate object representing a profile that would reference all of its parts. Right now profiles only exist as vcards, which doesn't seem like what vcards are designed to do. This would make it easier to spot problems with profiles (and perhaps to work with them in general) and make them more explicitly different from non-profile vcards -- right now the only difference is the kind line in a vcard, which can be changed, creating a 'fake' profile with none of the properties of a profile.
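To make suggestions 1, 3 and 4 a bit more concrete, here is a minimal sketch of what such a profile object and a health check over it could look like. Everything in it is hypothetical: the `Profile` class, the `check_profile` and `find_orphans` helpers, and the idea of plain dictionaries keyed by ID are assumptions made for illustration, not Mailpile's actual data structures or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Profile:
    """Hypothetical profile object that references its parts by ID."""
    vcard_id: Optional[str]
    source_ids: List[str] = field(default_factory=list)
    route_ids: List[str] = field(default_factory=list)


def check_profile(profile, vcards, sources, routes):
    """Return a list of warnings for references that point at nothing."""
    warnings = []
    if profile.vcard_id is None or profile.vcard_id not in vcards:
        warnings.append(f"profile vcard missing: {profile.vcard_id!r}")
    for sid in profile.source_ids:
        if sid not in sources:
            warnings.append(f"source missing: {sid!r}")
    for rid in profile.route_ids:
        if rid not in routes:
            warnings.append(f"route missing: {rid!r}")
    return warnings


def find_orphans(profiles, sources, routes):
    """Return (source IDs, route IDs) that no profile references."""
    used_sources = {sid for p in profiles for sid in p.source_ids}
    used_routes = {rid for p in profiles for rid in p.route_ids}
    return (sorted(set(sources) - used_sources),
            sorted(set(routes) - used_routes))
```

A health check could run `check_profile` over every profile and `find_orphans` over the stored sources and routes, then report the results as warnings rather than trying to repair anything automatically.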

Those are just preliminary suggestions and I don't know the project well enough to understand how feasible they are. I mostly want to start a discussion that will eventually lead to a better data model and fewer bugs.

JackDca commented 5 years ago

It's always good to be able to regenerate lost data, including lost VCards. But no matter how many cross references are created, there is no way to absolutely guarantee data integrity in the face of random data loss. It might be simpler to first understand how the VCards get lost in the first place and try to prevent that. In three years of daily Mailpile use this particular problem has never happened to me. Is it a bug? Unclean shutdown?

Regarding linking the components of a profile - some of the links requested above already exist:

It would be helpful if more documentation of the data model could be added to the wiki. As is always the case with libre software, those who see the need are the best ones to do the work!

timorl commented 5 years ago

Thanks for the informative response, @JackDca!

> there is no way to absolutely guarantee data integrity in the face of random data loss

I am aware, but it would be nice if problems with data consistency would at least be reported, if they cannot be fixed.

> It might be simpler to first understand how the VCards get lost in the first place and try to prevent that.

My other point was that with a proper data model, bugs that make data inconsistent would be rarer and possibly easier to track. Every bit of code manipulating data would have a reference for what it should check for in order to avoid creating inconsistencies.

> some of the links requested above already exist

Great!

The first one could be useful for some checks (currently mailpile does not see a problem if the specified vcard does not exist) and I wonder what it currently claims in the case of my reconstructed profile. Is there a CLI way of displaying this value?

The second one I mentioned in my post, but I would be interested in a backwards link.

The third one is a good point. As I mentioned, I don't really know what x-mailpile-history does; I just know that if I didn't configure it in my vcard, I don't see anything in the New and Total fields of the GUI where profiles are displayed.

> It would be helpful if more documentation of the data model could be added to the wiki. As is always the case with libre software, those who see the need are the best ones to do the work!

With pleasure actually! But first I'd like to know if adding some more formal data model would be accepted by the developers here and, if yes, what it could require.

BjarniRunar commented 5 years ago

You are absolutely right that the data model around contacts and accounts sucks, and it is a source of many bugs. No argument there!

However, reading between the lines, you seem to assume that a formal data model would somehow make the bugs go away. That's true for some of the bugs, but for others it's not true at all. The nastiest bugs we have right now have to do with automated importing and merging of contact data from external data sources that are outside our control (primarily the GnuPG keychain). That stuff was probably a bad idea, and the bugs and flaws would likely have manifested no matter how formal our internal representations were.

Formality is not a panacea... and if I liked formality and well structured data, I probably wouldn't be writing an e-mail client. :stuck_out_tongue_winking_eye: But even if I were to accept that we want more formality in Mailpile's handling of data (and a bit more could certainly help), we really only reap benefits from formality if the formal constraints are enforced by tooling.

And on that subject, it's worth mentioning a few of the design constraints we've been working with from the start:

1) All important data should be encrypted (or encryptable) at rest
2) Standard, or at least text-based, file formats are preferred, so as not to lock-in user data
3) We want to be able to bundle and ship a complete working app on Windows, Mac and Linux

Note that these points are ALL in direct conflict with most RDBMSes, which are the industry-standard way to define and enforce formal data models. I'm not aware of much tooling out there that will let us adhere to these principles. The closest I'm aware of would be in-memory SQLite databases that we manually encrypt and dump to disk every time something changes - which is fine for some workloads, but a non-starter for others.
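For illustration, a rough sketch of that in-memory SQLite approach might look like the following. The schema, the table names and the `dump_encrypted` helper are made up for this example and are not Mailpile code. The `placeholder_encrypt` function is a stand-in that only reverses the bytes; it is NOT encryption, and a real implementation would plug in whatever cipher the application already uses.

```python
import sqlite3

# Hypothetical schema: every source must belong to an existing profile.
SCHEMA = """
CREATE TABLE profile (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE source (
    id TEXT PRIMARY KEY,
    profile_id TEXT NOT NULL REFERENCES profile(id)
);
"""


def open_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def dump_encrypted(conn, encrypt):
    """Serialize the whole database as SQL text, then pass it to encrypt()."""
    sql_text = "\n".join(conn.iterdump())
    return encrypt(sql_text.encode("utf-8"))


def placeholder_encrypt(data: bytes) -> bytes:
    # Stand-in only: reverses the bytes. This is NOT encryption.
    return data[::-1]
```

With this setup, inserting a `source` row whose `profile_id` does not exist raises `sqlite3.IntegrityError`, which is the kind of tooling-enforced constraint discussed above. The cost is that `dump_encrypted` rewrites the entire database on every change, which is why this only suits small, rarely modified data sets.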

But in any case, I feel that before we start talking about formal data models, we're going to have to talk tooling. There's no use having a formal spec without a clear path towards implementation, and I would hate to see you waste your time specifying something nobody is going to implement.