qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS

QEP 80: Project backup and revisions #80

Open NathanW2 opened 7 years ago

NathanW2 commented 7 years ago

QGIS Enhancement 80: Project backup and versioning

Date: 2016/11/02
Author: Nathan Woodrow @NathanW2
Contact: woodrow.nathan@gmail.com
Maintainer: @NathanW2
Version: QGIS 3.0

Summary

Have you ever used Office 365 or Google Docs and found the revision history function handy? Being able to roll back to an older version is super handy if you mess something up. Well, why not have the same thing in QGIS? Think how powerful it would be to be able to restore older versions of the currently open project, and never worry about busting your project again.

This QEP outlines the ability to store project revisions/backups and to roll back to older versions if needed. Rolling back to older copies of a document is a handy feature of many web tools today, and bringing it into QGIS would increase the user-friendliness of the application. Automatic revisions also allow the user to experiment without worrying about being unable to roll back to an older version.

Current solution

Currently, each time the project is saved, a .qgs~ file is created. This works, but it gives you only a single point of rollback and requires users to rename the file manually in order to restore the backup. There have been cases, although rare, of a project file getting corrupted; a built-in system would guard against cases like that.

If you need to keep revisions of your project files, you are currently required to take file backups, and we all know where that leads: project.copy.qgs, project.copy2.qgs, project.IDontKnowWhatVersion.qgs. You get the point. There should be no need for the user to manage copies of the project file on disk. Of course, there is nothing stopping the user from doing that if they still wish.

Proposed Solution

The proposed solution is to store a compressed copy of the project XML blob in a single managed SQLite database (let's say in ~/.qgis2 for now), along with a timestamp, file name, project name, and some other metadata. This allows QGIS to manage all backups and file revisions itself, with a nice UI for rolling back to older/newer versions.

An example of the SQLite table:

      CREATE TABLE IF NOT EXISTS projects (
          name TEXT,
          "save_date" datetime,
          filepath TEXT,
          xml BLOB,
          tag TEXT);

Each time the user saves, a compressed copy of the XML blob is stored in the database along with the name, filename, date stamp, etc. The user can also save a tag against that save point. Tags are simply named points in time.

When a point in time is selected the XML is loaded from the database, written to disk, and reloaded in the session.
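The save and restore steps described above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the table follows the example schema, zlib is assumed as the compression method (the proposal does not name one), and the function names `open_store`, `save_snapshot` and `load_snapshot` are hypothetical, not part of QGIS or the prototype plugin.

```python
import sqlite3
import zlib
from datetime import datetime, timezone

# Same columns as the example table in the proposal.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    name TEXT,
    save_date datetime,
    filepath TEXT,
    xml BLOB,
    tag TEXT)
"""


def open_store(path=":memory:"):
    """Open (or create) the revision database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_snapshot(conn, name, filepath, xml_text, tag=None):
    """Store a zlib-compressed copy of the project XML and return its save date."""
    save_date = datetime.now(timezone.utc).isoformat()
    blob = zlib.compress(xml_text.encode("utf-8"))  # assumption: zlib compression
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO projects (name, save_date, filepath, xml, tag) "
            "VALUES (?, ?, ?, ?, ?)",
            (name, save_date, filepath, blob, tag),
        )
    return save_date


def load_snapshot(conn, name, save_date):
    """Return the project XML saved at save_date, or None if no such row exists."""
    row = conn.execute(
        "SELECT xml FROM projects WHERE name = ? AND save_date = ?",
        (name, save_date),
    ).fetchone()
    return None if row is None else zlib.decompress(row[0]).decode("utf-8")


# Usage: save one version, then restore it by its save date.
conn = open_store()
stamp = save_snapshot(conn, "demo", "/tmp/demo.qgs", "<qgis/>", tag="before edits")
restored = load_snapshot(conn, "demo", stamp)
```

In a real implementation the restored XML would then be written back to the project file and reloaded in the session, as described above.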

The feature will also have the ability to define other storage types using a simple API. These could include MS SQL, Postgres, or a web service. This can allow for enterprise-style setups where the data might be stored both on local PCs and in a database. Storage writes can be done in a thread to avoid locking the UI.

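A Python example of a storage API:

class Storage:
    """Interface that each storage provider implements."""

    def save_project(self, xml, **metadata):
        """Save one version of the project XML with its metadata."""

    def get_version_xml(self, name, date):
        """Return the XML of project `name` as saved at `date`."""

    def get_versions(self, name):
        """Return the saved versions of project `name`."""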

Each storage provider would implement this interface so that QGIS can use that storage for projects. By default, QGIS will use the SQLite storage unless told otherwise.
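To illustrate how a provider might plug in, here is a minimal sketch of a backend that keeps versions in memory. The class name `InMemoryStorage`, the `name`, `date` and `tag` metadata keys, and the return values are all assumptions made for the example; the proposal only fixes the three method names.

```python
from collections import defaultdict
from datetime import datetime, timezone


class InMemoryStorage:
    """Hypothetical provider: keeps every saved version in a dict keyed by project name."""

    def __init__(self):
        # name -> list of (date, xml, metadata) tuples, in save order
        self._versions = defaultdict(list)

    def save_project(self, xml, **metadata):
        # Assumed metadata keys: "name", "date" and "tag".
        name = metadata.get("name", "untitled")
        date = metadata.get("date") or datetime.now(timezone.utc).isoformat()
        self._versions[name].append((date, xml, metadata))
        return date

    def get_version_xml(self, name, date):
        for saved_date, xml, _ in self._versions.get(name, []):
            if saved_date == date:
                return xml
        return None

    def get_versions(self, name):
        # Return (date, tag) pairs so a UI could list the save points.
        return [(date, meta.get("tag")) for date, _, meta in self._versions.get(name, [])]


# Usage: two saves of the same project, the first one tagged.
storage = InMemoryStorage()
first = storage.save_project("<qgis v='1'/>", name="demo", date="2016-11-02T10:00:00", tag="initial")
second = storage.save_project("<qgis v='2'/>", name="demo", date="2016-11-02T11:00:00")
```

Because callers only rely on the three methods, a SQLite, Postgres or web-service provider could be swapped in without changing the code that saves and restores projects.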

Performance Implications

Storage calls must be threaded in order not to block the UI. Saving to or loading from a server or web service could be slow, although this generally shouldn't be an issue as only the project XML is transferred to and from the storage.
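As a rough sketch of what a non-blocking storage call could look like, the example below pushes a slow save onto a worker thread using Python's `concurrent.futures`. This is an assumption for illustration only: inside QGIS a Qt-based threading mechanism would more likely be used, and `SlowStorage` is a hypothetical stand-in for a slow backend.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowStorage:
    """Stand-in for a slow backend such as a web service (hypothetical)."""

    def __init__(self):
        self.saved = []

    def save_project(self, xml, **metadata):
        time.sleep(0.05)  # simulate network latency
        self.saved.append((metadata.get("name"), xml))
        return len(self.saved)


# A single worker keeps writes in the order they were submitted.
executor = ThreadPoolExecutor(max_workers=1)
storage = SlowStorage()

future = executor.submit(storage.save_project, "<qgis/>", name="demo")
# The calling thread is free to continue here while the write runs.
version_count = future.result()  # block only when the result is actually needed
executor.shutdown(wait=True)
```

Using one worker is a deliberate choice in this sketch: it avoids two saves of the same project being written out of order.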

Storage Implications

Storing a copy of the XML on each save could lead to a lot of data being kept. A couple of methods to help combat this:

Further Considerations

As per #27, a new type of project file could make sense and allow storing the history alongside the project. This might be an option, however, some things to consider:

If #27 is implemented, using a git type tool to store the diffs next to the project might also work but would need testing. Something to consider at least.

One option might be to export the history into the project format as proposed in #27 if you wish to share the project history with others.

Using Git doesn't really work well if you need a central storage for all projects as it would get quite messy.

Prototype

I have implemented this basic project versioning idea in a plugin which can be found here: https://github.com/NathanW2/project-versions/blob/master/plugin_versions/__init__.py

Prototype UI options

[Two screenshots of the prototype UI]

This prototype creates an SQLite DB and will save a copy of the XML blob on each project save. Each version can be selected to reload in your current session.

Further Improvements

Backwards Compatibility

The new feature will have no effect on older installs.

Issue Tracking ID(s)

Votes

(required)

NathanW2 commented 7 years ago

Note: This draft is still a work in progress and I will fill in more details over the next couple of days.

pcav commented 7 years ago

Performance implications can be severe for server: parsing the project apparently already has a major performance hit on it.

NathanW2 commented 7 years ago

Currently, the idea is that the project is still loaded from disk; only the backup XML is saved in the DB.

mhugo commented 7 years ago

Hi @NathanW2, nice idea. I will start working on https://github.com/qgis/QGIS-Enhancement-Proposals/issues/27, which includes a new project format (zip based), quite soon, and your proposal could be a nice candidate to test.

NathanW2 commented 7 years ago

@mhugo ok cool. I'm still open to the idea of maybe having the option to include history in the project format; however, there are a couple of main points that it needs to address first:

And there are a few others, but I can't think of them right now.

Something I did think of was maybe having the option to include that history if you are shipping the project to someone else; otherwise it just stays local.

Open to ideas and I will flesh this QEP out more over the next couple of days.

NathanW2 commented 7 years ago

@wonder-sk @anitagraser @m-kuhn keen to get your thoughts on this.

nyalldawson commented 7 years ago

I like this idea a lot!

My ideal wishlist of features would be:

anitagraser commented 7 years ago

I think this is a great idea and a very nice pragmatic solution.

I'll go test the plugin :-)

In general - concerning new designs for project files - I hope that there will still be a way to edit the project in a text editor, because if dozens of file paths need to be changed, find&replace is so much nicer than the GUI. This does not seem an issue with this proposal, but might be with #27 (don't know).

NathanW2 commented 7 years ago

So I did some testing using git. Git still stores a copy of the file on every commit where there is a change, which will be every time for that single project. In the end, it will still grow in file size with each save, like the proposed solution.

An alternative is just storing diffs against a snapshot of the project file; however, that can also raise other issues in terms of restoring etc.

Generally, space is pretty cheap these days, and we could have an option to auto-compress older changes, so that older versions are compressed once they pass a given count and timestamp.

Any other thoughts?

m-kuhn commented 7 years ago

So I did some testing using git. Git still stores a copy of the file on every commit where there is a change, which will be every time for that single project. In the end, it will still grow in file size with each save, like the proposed solution.

Really? That sounds quite surprising and e.g. the following suggests otherwise:

While that's true and important on the conceptual level, it is NOT true at the storage level. Git does use deltas for storage. Not only that, but it's more efficient in it than any other system. Because it does not keep per-file history, when it wants to do delta compression, it takes each blob, selects some blobs that are likely to be similar (using heuristics that includes the closest approximation of previous version and some others), tries to generate the deltas and picks the smallest one. This way it can (often, depends on the heuristics) take advantage of other similar files or older versions that are more similar than the previous. The "pack window" parameter allows trading performance for delta compression quality. The default (10) generally gives decent results, but when space is limited or to speed up network transfers, git gc --aggressive uses value 250, which makes it run very slow, but provide extra compression for history data.

Source: http://stackoverflow.com/questions/8198105/how-does-git-store-files/8198276#8198276

If there are good reasons for not using git, I don't mind at all, and yes, space is pretty cheap, so other approaches might be better.

NathanW2 commented 7 years ago

Hmm guess I was wrong then, I didn't find that source when I was looking around. Will do some more testing.

Only reason I wasn't planning on using git was because I wanted to have a single store for all project file history and not keep it next to the file. Maybe it could be kept with the file if we go with #27

m-kuhn commented 7 years ago

I don't know the git internals and API. I assume it should be possible to dump it into a blob or to tar/zip it up. In any case, coordination with #27 sounds very good.

NathanW2 commented 7 years ago

Yeah, if we went with #27 we could just have a git repo inside the zip file really; it would require more testing of course.

I also want to consider how to handle project files that might be non-file based, e.g. from a database.