gramps-project / gramps-web-api

A RESTful web API for Gramps
GNU Affero General Public License v3.0

Gramps synchronization #163

Closed DavidMStraub closed 2 years ago

DavidMStraub commented 3 years ago

Eventually, especially once we implement write operations, it would be great to support two-way synchronization with Gramps desktop. I suppose it would not be too hard to implement a plugin that uses the API XML export that we already have and builds on top of the existing Import Merge Tool.

cdhorn commented 3 years ago

Not sure I follow your thinking and not sure how practical this is...

Nick-Hall commented 3 years ago

Why don't we just write directly to the database?

The only obstacle I can see at the moment is the metadata which is held in memory, but that should be easy to fix.

Then we could use JSON patch to modify existing objects. For example:

[{ "op": "replace", "path": "/gender", "value": "1" }]

This could be used with a PATCH request to change the gender of a person.
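A minimal sketch of how such a patch could be applied to a JSON-like object. The `apply_replace` helper and the sample person dict are hypothetical and only serve as illustration; they are not part of Gramps or the web API. Only the "replace" operation is handled, and JSON Pointer escape sequences are ignored.

```python
import copy


def apply_replace(document, patch):
    """Apply the "replace" operations of a JSON Patch (RFC 6902) to a dict.

    Hypothetical helper: only "replace" is supported, and paths are limited
    to nested dict keys and list indices.
    """
    result = copy.deepcopy(document)
    for operation in patch:
        if operation["op"] != "replace":
            raise ValueError(f"unsupported operation: {operation['op']}")
        # "/gender" -> ["gender"]; "/a/b" -> ["a", "b"]
        keys = operation["path"].lstrip("/").split("/")
        target = result
        for key in keys[:-1]:
            target = target[int(key)] if isinstance(target, list) else target[key]
        last = keys[-1]
        if isinstance(target, list):
            target[int(last)] = operation["value"]
        else:
            # "replace" requires the target location to exist already.
            if last not in target:
                raise KeyError(f"path does not exist: {operation['path']}")
            target[last] = operation["value"]
    return result


# Made-up person object; real Gramps objects have many more fields.
person = {"handle": "abc123", "gender": "2"}
patch = [{"op": "replace", "path": "/gender", "value": "1"}]
patched = apply_replace(person, patch)
```

The helper works on a copy, so the original object stays available for comparison if the change has to be rejected.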

We could implement person_patch and associated database methods in core Gramps. I have already prototyped this approach for the standard editors as a route to implement multi-user access.

cdhorn commented 3 years ago

Right, the GUI seeing changes shouldn't be an issue AFAIK. But the REST API is stateless so how can it push notifications to the clients not knowing who they are? Would clients poll for changes?

DavidMStraub commented 3 years ago

OK, let me elaborate what I meant. There are a couple of separate issues in my view.

1. Need for a Gramps → API synchronization

Let's say there is an official hosting service "GrampsCloud" and I host my tree there. I still want to keep using Gramps (as a desktop app) on my laptop and expect it to work as I'm used to when I'm sitting in an archive in a basement without wifi, or on a plane. As a user, I wouldn't accept being able to work only when I'm online. So there will be "offline" changes that need to be synced back to the "cloud". At the moment, I'm just rsync-ing my Gramps database directory to my web server, but once we allow write access to the API, there will be conflicts and there must be a way to resolve them. The logic for resolving these conflicts should be mostly contained in the "import merge tool", just the other way around: not merging changes from an XML file into the local tree but merging local changes into a remote tree (the API).

Note that this is quite similar (I presume) to how other online services connect with their desktop counterparts, e.g. MyHeritage and Family Tree Maker (although I've never used them).

2. Need for an API → Gramps synchronization

This might not be necessary at the moment, but once we allow (multi-user) write access, there must be a way for a researcher to sync those changes back to their local tree; potentially with conflicts. This is exactly what the "import merge tool" does, so I think it's not difficult to implement.

3. Gramps using the API as a database

When I run the API locally, it shares the SQLite database with my desktop Gramps, which works in principle, but I don't think direct database access makes sense in a network context. For SQLite this doesn't work anyway, but even if someone were to use a Postgres DB, I don't think it would be sensible to make the database server accessible directly. How would one impose access rights, etc.?

Still, having Gramps directly talk to the API without a manual synchronization step is doable in principle I believe, by having a GenericDb subclass that reads and writes from the API instead of talking to a database. We could have a separate endpoint for that, accepting SQL-like payloads.

But since I think the synchronization will be necessary anyway (see above), I am not sure if this is actually worth the effort of implementing it.

4. Notifying Gramps of remote database changes

I am not sure, but I think currently Gramps assumes that nobody else is messing with the DB, so I don't know if it's actually possible to remain consistent when the remote DB changes while Gramps has opened it. But this is anyway only necessary in the setup in 3. above, not when using manual syncs.

5. Notifying app users of database changes

That the database changes while a user of a web app (or other type of app) is using the tree is a scenario that we'll have anyway, regardless of whether it's due to syncing with a desktop Gramps or through direct edits with some other app. Notifying the app is indeed something we have to think about; one way would be to add a last changed timestamp to the metadata endpoint (which would require polling); another (more sophisticated) would be to use websockets.
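The polling variant could look roughly like the sketch below. The `last_changed` key is an assumption (no such field is confirmed for the metadata endpoint in this thread), and `fake_fetch` stands in for an HTTP GET so the example runs without a server.

```python
def has_remote_changed(fetch_metadata, last_seen):
    """Return (changed, newest_timestamp) by comparing timestamps.

    `fetch_metadata` is any callable returning the metadata as a dict.
    The "last_changed" key is an assumption of this sketch.
    """
    current = fetch_metadata()["last_changed"]
    return current != last_seen, current


# Stand-in for three successive GET requests to the metadata endpoint.
responses = iter(
    [{"last_changed": 100}, {"last_changed": 100}, {"last_changed": 175}]
)


def fake_fetch():
    return next(responses)


last_seen = 100
results = []
for _ in range(3):
    changed, last_seen = has_remote_changed(fake_fetch, last_seen)
    results.append(changed)
```

A client would run such a check on a timer and reload its data when the flag is set; websockets would avoid the repeated requests at the cost of a stateful connection.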

cdhorn commented 3 years ago

Oh okay, yes, I understand what you mean now; I had taken it in the wrong context. Yes, that would be a good feature, and I agree you would want Gramps to be able to push as well as pull changes.

DavidMStraub commented 3 years ago

As we're closer to having write support & collaborative editing, we have to think about a way to keep diverging local and remote databases in sync.

I'm thinking of the following procedure:

  1. User opens a "synchronize with remote server" plugin in their Gramps desktop.
  2. The plugin shows a username/password dialog and then fetches a full XML export from the existing /exporters/xml/file endpoint
  3. A modified version of the Import Merge Tool is used to display all the local vs remote (downloaded XML) differences and to choose which ones to keep
  4. After consolidating all the changes, the plugin pushes a full XML of the local DB (or a JSON with necessary changes) back to a new API endpoint, where those changes are applied, after which the remote DB is exactly the same as the local DB
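Step 3 boils down to comparing two sets of objects. The following is a rough sketch under the assumption that both trees can be represented as dicts mapping a handle to plain object data; `diff_trees` and the sample data are hypothetical and do not reflect the Import Merge Tool's actual logic.

```python
def diff_trees(local, remote):
    """Classify objects by handle into local-only, remote-only and different.

    Both arguments map handle -> object data (plain dicts here).
    """
    local_handles = set(local)
    remote_handles = set(remote)
    return {
        "only_local": sorted(local_handles - remote_handles),
        "only_remote": sorted(remote_handles - local_handles),
        "different": sorted(
            handle
            for handle in local_handles & remote_handles
            if local[handle] != remote[handle]
        ),
    }


# Made-up data: h2 was edited on one side, h3 and h4 exist on one side only.
local_tree = {"h1": {"name": "Anna"}, "h2": {"name": "Bert"}, "h3": {"name": "Carl"}}
remote_tree = {"h1": {"name": "Anna"}, "h2": {"name": "Berta"}, "h4": {"name": "Dora"}}
differences = diff_trees(local_tree, remote_tree)
```

Each of the three groups would then be shown to the user, who decides which version to keep before anything is pushed in step 4.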

This would require the following:

What do you think?

DavidMStraub commented 3 years ago

... forgot to mention: before applying the patch in 4., we should probably make an automated backup on the server side.

cdhorn commented 3 years ago

Agree on automated backup. I assume there would be config option to only keep x copies around. But then having an automated backup does there not also need to be a more convenient way to restore it?

> After consolidating all the changes, the plugin pushes a full XML of the local DB (or a JSON with necessary changes) back to a new API endpoint, where those changes are applied, after which the remote DB is exactly the same as the local DB

> on the API side: new endpoint accepting the consolidated XML or a JSON with changes (e.g. JSON patch)

So I'm a little unclear here how this would work. If after a sync process it is to be exactly the same as the local DB then you need to drop and recreate the database and import the XML otherwise the object change dates will continue to diverge. That may or may not be important. But then I have not looked at the code so am I wrong about that?

DavidMStraub commented 3 years ago

> Agree on automated backup. I assume there would be config option to only keep x copies around. But then having an automated backup does there not also need to be a more convenient way to restore it?

True. This needs more thought.

> So I'm a little unclear here how this would work. If after a sync process it is to be exactly the same as the local DB then you need to drop and recreate the database and import the XML otherwise the object change dates will continue to diverge. That may or may not be important. But then I have not looked at the code so am I wrong about that?

Yes, either that or we have to also set the change date explicitly (which is possible).

DavidMStraub commented 3 years ago

I played around with a modified import merge addon but ended up starting from scratch in a more modular way (and using more of gramps.gen.merge.diff). The code is here until it's ready for addons-source: https://github.com/DavidMStraub/gramps-addon-webapisync

DavidMStraub commented 2 years ago

I have an almost finished implementation of the sync plugin and was starting to try it on my own (real) tree, but ran into a really annoying and hard to solve issue. @Nick-Hall any insight would be appreciated.

The problem is that, when I'm done synchronizing changes and have written them to the local DB, the tool is writing the changes to be synced remotely to the in-memory SQLite it was using for the diff. The DbTxn that is used for this is then converted to a JSON object that can be posted to the remote /api/transactions/ endpoint. So far, so good.

Now, the problem: my local desktop Gramps has de as language, the remote API, based on the official docker image, is using en_GB. I don't want to change this, as it's a good way to find bugs in the i18n implementation and it's a scenario that we will have to expect. So, what's happening? For all GrampsType subclasses, the JSON representation my local Gramps generates has a German string: {"_class": "EventType", "string": "Geburt"} (Geburt = Birth).

The remote API is not happy with that: it thinks it's a new custom type. Instead of silently changing the type and cluttering the database with useless custom types, fortunately we included a powerful check in the API that submits not only the updated object, but also the original one. This is meant to prevent mid-air collisions, e.g. when the remote DB has changed in the meantime, but also catches this case: the API complains that the type has changed from Birth to the custom type Geburt in the original object.
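The idea behind that check can be illustrated with a simplified sketch. The `apply_update` function, the `ConflictError` class and the dict-based "database" are hypothetical; the actual implementation behind /api/transactions/ is not shown in this thread.

```python
class ConflictError(Exception):
    """Raised when the submitted original no longer matches the stored object."""


def apply_update(database, handle, old, new):
    """Replace the stored object only if it still equals the submitted `old`."""
    current = database.get(handle)
    if current != old:
        raise ConflictError(f"object {handle} differs from the submitted original")
    database[handle] = new


# Made-up, heavily reduced event object as stored on the en_GB server.
remote_db = {
    "e1": {"_class": "Event", "type": {"_class": "EventType", "string": "Birth"}}
}

# The German desktop serializes the unchanged type as "Geburt", so the
# submitted "old" object does not match what the server has stored.
old_from_client = {
    "_class": "Event",
    "type": {"_class": "EventType", "string": "Geburt"},
}
new_from_client = dict(old_from_client, description="updated")

try:
    apply_update(remote_db, "e1", old_from_client, new_from_client)
    rejected = False
except ConflictError:
    rejected = True
```

The same comparison that protects against concurrent edits therefore also rejects objects whose type strings were serialized in a different locale.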

Now, to solve this, one could patch gramps.gen.serialize.to_json to not use the type's _I2SMAP (converting integer type ID to localized string), but instead _I2EMAP (converting integer type ID to English "XML string"), and then use a custom GrampsLocale instance to translate the "XML string" to the remote locale (that can be obtained simply by getting /api/metadata).

However, this only almost works, because there are types where the XML string is different from the translation message! Why this is the case, I have no idea, but I have not found a way to work around it and am at a loss.

List of problematic type names:

Father's Age/Father Age
Mother's Age/Mother Age
Born In Covenant/BIC
Do not seal/DNS
Do not seal/Cancel/DNS/CAN
Bold/bold
Italic/italic
Underline/underline
Fontface/fontface
Fontsize/fontsize
Fontcolor/fontcolor
Highlight/highlight
Superscript/superscript
Link/link

If any of those is present, synchronization will fail - this will happen in particular for all notes with formatting.

I don't know how to "backwards translate" with gettext and the _DATAMAPs are populated on Gramps import and cannot be changed anymore. Any ideas, apart from hard coding the above list?
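For reference, hard-coding the list could look like the sketch below. It reads each entry above as "translation message/XML string", which is an interpretation of the list; `XML_TO_MESSAGE` and `message_for` are hypothetical names.

```python
# XML string -> gettext message, read off the list above
# (interpreting each entry as "translation message/XML string").
XML_TO_MESSAGE = {
    "Father Age": "Father's Age",
    "Mother Age": "Mother's Age",
    "BIC": "Born In Covenant",
    "DNS": "Do not seal",
    "DNS/CAN": "Do not seal/Cancel",
    "bold": "Bold",
    "italic": "Italic",
    "underline": "Underline",
    "fontface": "Fontface",
    "fontsize": "Fontsize",
    "fontcolor": "Fontcolor",
    "highlight": "Highlight",
    "superscript": "Superscript",
    "link": "Link",
}


def message_for(xml_string):
    """Return the gettext message to translate for a given XML string.

    For all types not listed above, the XML string is the message itself.
    """
    return XML_TO_MESSAGE.get(xml_string, xml_string)
```

The returned message could then be passed to the translation function of the target locale instead of the raw XML string.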

Nick-Hall commented 2 years ago

> For all GrampsType subclasses, the JSON representation my local Gramps generates has a German string: {"_class": "EventType", "string": "Geburt"} (Geburt = Birth).

I think that the problem is with the JSON representation in core Gramps. Internally a GrampsType actually stores pre-defined types as an integer. Custom types use the string.

The Gramps XML format exports types as untranslated strings. This is probably what I intended, but I used the translated version by mistake.

We could use either approach. The JSON format isn't used much at the moment. We should ask @prculley for his opinion as he uses it in the import and merge.

DavidMStraub commented 2 years ago

Thanks for the feedback! Yes, IMO it would make more sense for the Gramps serialization functions to use the XML string; especially if this is still planned as a means to store the serialized objects in the database to replace the current pickled format. Otherwise, the types in a database would not be recognized anymore if a user changes their locale. I would be happy to submit a PR to change it.

I checked which addons use the serialization functions:

In the last two cases (@prculley please confirm) they are only used to compare two objects, so whether the type name is localized or not shouldn't make a difference as long as both objects are serialized with the same convention (which is the case).

Even if we change this in Gramps, for the sync tool I might have to go with hard-coding the exceptions for the time being as the change won't land until Gramps 5.2 I guess.

prculley commented 2 years ago

The object.get_schema (JSON) is used in the database differences report (and import/merge tool) to provide user readable names to the various objects. Doug Blank created the database differences report, and I leveraged his thinking in making the import/merge.

So I think there is a use for translated object titles. But I can see why for general JSON export (and eventual JSON db use) we should use untranslated XML style strings.

If the JSON is not supposed to be user readable in all languages, is there a way to extend the get_schema with translated strings as well? Perhaps add an xmltitle tag? Or mark these as late translate so they are not translated in the get_schema return, but can be translated for reports, and left untranslated for other purposes?

Just thoughts...

jralls commented 2 years ago

> Or mark these as late translate so they are not translated in the get_schema return, but can be translated for reports, and left untranslated for other purposes?

That's the correct approach.

DavidMStraub commented 2 years ago

@prculley I think you are referring to the to_struct method, right? get_schema doesn't contain the values, only the types.

> Or mark these as late translate so they are not translated in the get_schema return, but can be translated for reports, and left untranslated for other purposes?

> That's the correct approach.

Yes, that makes sense, however it will be complicated by the differences between the XML string and the translation message for the exceptions I listed above. Due to these exceptions, we can't just do

s = gramps_type_instance.xml_str()
s_trans = _(s)  # will not work in general

but would have to do something like

s = gramps_type_instance.xml_str()
s_trans = gramps_type_instance._I2SMAP[gramps_type_instance._E2IMAP[s]]

and this will only work for translating to the default locale. Translating to another locale is still not possible without hard-coding the exceptions.

The cleanest solution would be to get rid of the exceptions. Changing the XML strings is not an option as it would break Gramps XML backward compatibility. Changing the translation messages would require changing all language files, but I don't see any other issue.

jralls commented 2 years ago

All of the DATAMAPs should be changed from _(foo) to N_(foo) and functions that use _I2SMAP should take a GrampsLocale argument so that they can be translated into an arbitrary locale, as is necessary for correct use in reports.

I think that would allow keeping the more human-friendly presentation strings without any new special casing or messing up translators.
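A rough sketch of what this proposal could look like, with several assumptions: `N_` is taken to be a no-op marker that leaves the message untranslated until lookup, `ToyLocale` stands in for a GrampsLocale instance, and the German catalog is made up. None of this is the actual Gramps code.

```python
def N_(message):
    """No-op marker: return the message unchanged (assumption of this sketch)."""
    return message


class ToyLocale:
    """Stand-in for a locale object with a translation catalog."""

    def __init__(self, catalog):
        self._catalog = catalog

    def translate(self, message):
        return self._catalog.get(message, message)


# (integer ID, untranslated message, XML string)
DATAMAP = [
    (1, N_("Birth"), "Birth"),
    (2, N_("Father's Age"), "Father Age"),
]


def display_string(type_id, locale):
    """Translate the message for `type_id` at call time, in the given locale."""
    for candidate_id, message, _xml in DATAMAP:
        if candidate_id == type_id:
            return locale.translate(message)
    raise KeyError(type_id)


german = ToyLocale({"Birth": "Geburt", "Father's Age": "Alter des Vaters"})
english = ToyLocale({})
```

Because translation happens at lookup time, `display_string(2, german)` and `display_string(2, english)` can return different strings within the same process, while the XML string in the third column is untouched.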

DavidMStraub commented 2 years ago

Interesting! Is this N_ used anywhere in the current Gramps code base?

jralls commented 2 years ago

Yes though it's always spelled out as ngettext. N_ is an alias usually associated with it as _ is with gettext. Up to now it's been used sparingly in Gramps. When I redid the localization several years ago I didn't bother creating the alias because it occurs only once or twice in each file where it's used. It will be used a lot in the DATAMAP definitions making defining the alias worthwhile.

DavidMStraub commented 2 years ago

I will close this as the sync addon is in development and there is nothing to do in web API at this time.