locationtech / geogig

GeoGig project

Putting a PostGIS DB under version control with GeoGig #362

Open FxKu opened 7 years ago

FxKu commented 7 years ago

Hey there,

I couldn't get access to the geogig-dev list, so I'm opening an issue here, although I just have some general questions.

My database consists of many tables which are connected by foreign keys. The geometry for single features is centralized in one table, so it's a little different from a typical GIS database. For example, I cannot use QGIS or GeoServer for editing without quickly running into constraint violations.

So, I imported the database into GeoGig. Then I made a change in the database and imported it again. GeoGig was able to figure out what had changed and only committed the diff, which I found pretty cool, by the way. Now, my question is: is this a proper workflow? Continue to work with my database as usual, and then import the changes into GeoGig whenever I feel like it? I've also tried to export commits back to PostGIS, and the data was restored properly (without foreign keys, but that's OK). I've read that imports are loaded into a cache. So, if my database gets very big, will there be a limit for imports?
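For reference, that import/commit cycle in CLI terms is roughly the following (a minimal sketch; host, credentials and table names are placeholders, and the exact flags may differ between GeoGig versions):

```sh
# Import the current state of a PostGIS table into the working tree,
# stage whatever GeoGig detects as changed, and commit only that diff.
geogig pg import --host localhost --port 5432 --database mydb \
    --user postgres --password secret --table buildings
geogig add
geogig commit -m "Sync 'buildings' from PostGIS"

# Export a snapshot back to PostGIS (foreign keys are not preserved):
geogig pg export --host localhost --port 5432 --database mydb \
    --user postgres --password secret buildings buildings_restored
```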

I will give a talk about database versioning at FOSS4G Boston next week and just want to be sure I got all the concepts right. In the talk I will compare different solutions, and GeoGig will be one of them.

Thanks in advance for any help, and see you in Boston! Felix

groldan commented 7 years ago

Hello, sorry I hadn't noticed this until now.

Yes, that is a valid workflow. It's not optimal because of the time it takes every time you re-import, but you're correct that, beyond having no choice other than performing a "full table scan" per table, in the end only what has changed since the last import will be committed to the repository.

I'm not sure where you've read that imports are loaded into a cache, but as stated it is not accurate in this context. Your imports will be safely saved to the repository's revision graph; there is no limit on the number of imports.

Hope that helps. Cheers, Gabriel

FxKu commented 7 years ago

Thanks for this answer ... just in time before my talk ;) With the cache question I did not mean the number of imports. I was just wondering whether there is a limit if I import HUGE tables.

groldan commented 7 years ago

Cool! Just saw your slides, looks awesome! Yes, the PostgreSQL storage backend makes use of a heap cache, but no, objects are not added to the cache on inserts, only when performing queries. The cache evicts entries automatically, so there is no limit, just performance degradation if the cache is too small.

zaxxorq commented 7 years ago

Hello,

I have a similar situation: I want to use a PostgreSQL-backed GeoGig repository, but I want to use it through GeoServer.

So far, I've managed to do two separate things:

  1. Save data to a GeoGig repository through GeoServer (with the GeoGig plugin)
  2. Save data to a GeoGig repository directly by importing tables with the pg tool (no need for the GeoGig plugin then)

From what I understand, as stated on the official website, a GeoGig repository cannot version "raw data" (user database tables) directly; the data must be imported into GeoGig's standard repository structure. That structure may not be manageable in terms of queries, foreign keys and other data manipulation and geospatial computations.

So, what I managed with approach number 2 is about the same as what FxKu mentioned in the first post. If I want to work with the data in the form of PostgreSQL tables, I need to export the data from the GeoGig repository, then import-add-commit it back into the GeoGig repository when the user is done.
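In case a concrete example helps, a rough sketch of that round trip (connection details and table names are made up; not an official recipe):

```sh
# 1. Export the current version of a feature tree to a PostGIS table for editing
geogig pg export --database workdb --user geogig --password secret roads roads

# 2. ...users edit the "roads" table with their usual PostGIS tools...

# 3. Re-import, then stage and commit only what actually changed
geogig pg import --database workdb --user geogig --password secret --table roads
geogig add
geogig commit -m "Edits made in the PostGIS working copy"
```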

You said that such a workflow is not optimal because of having to re-import, so could you recommend an alternative? Is there some way to automatically and efficiently synchronize Postgres tables with the latest (or a chosen) version from the GeoGig repository? How would it support branching?

Thank you very much in advance! Žarko

martinbustos commented 6 years ago

Hi guys,

I am facing a similar scenario. I have to implement multi-user versioned editing on a PostGIS database, and I would like to maintain the database structure. Given an existing database, could you explain in more detail how to implement the "import-add-commit" workflow against a PostgreSQL-backed GeoGig repository?
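To make the question concrete, I imagine something along these lines (the repository URI format and credentials are just my guess from the documentation, so please correct me):

```sh
# Create a repository whose own storage backend lives in PostgreSQL
geogig init "postgresql://dbhost:5432/geogig_store/public/myrepo?user=geogig&password=secret"

# From inside that repository, import the existing user tables,
# then stage and commit the resulting diff
geogig pg import --host dbhost --database userdata --user geogig --password secret --all
geogig add
geogig commit -m "Initial import of existing tables"
```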

In this scenario, would I have to maintain a DB instance per user?

Thanks in advance for any help

Martin

FxKu commented 6 years ago

Hey @martinbustos, you got it right. The work is decentralized: every user would work with their own exported DB instance. There are tools available that allow multi-user editing in a single PostGIS DB, but they lack branching/merging capabilities.
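A rough sketch of what that could look like for one user (repository URIs, database names and branch names below are made up; conflicts still have to be resolved on merge, as in Git):

```sh
# Each user clones the shared repository and works on their own branch
geogig clone "postgresql://dbhost:5432/geogig_store/public/central?user=geogig&password=secret" alice_repo
cd alice_repo
geogig branch alice_edits
geogig checkout alice_edits

# Export to the user's own PostGIS instance, edit there, then re-import
geogig pg export --database alice_db --user alice --password secret parcels parcels
geogig pg import --database alice_db --user alice --password secret --table parcels
geogig add
geogig commit -m "Alice's edits"

# Later: merge back and push to the shared repository
geogig checkout master
geogig merge alice_edits
geogig push origin master
```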

ali-dashti commented 5 years ago

You can use a trigger to kick off an import on every change!
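For what it's worth, a trigger can't call the GeoGig CLI directly, so one purely illustrative way to read this is a statement-level trigger that flags changed tables, with an external job running geogig pg import for whatever is flagged (table and trigger names below are made up):

```sh
psql -d mydb <<'SQL'
-- Bookkeeping table that an external sync job polls
CREATE TABLE IF NOT EXISTS geogig_dirty (tbl text PRIMARY KEY, changed_at timestamptz);

-- Mark the table as dirty whenever a statement touches it
CREATE OR REPLACE FUNCTION mark_dirty() RETURNS trigger AS $$
BEGIN
  INSERT INTO geogig_dirty (tbl, changed_at) VALUES (TG_TABLE_NAME, now())
  ON CONFLICT (tbl) DO UPDATE SET changed_at = now();
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER buildings_changed
  AFTER INSERT OR UPDATE OR DELETE ON buildings
  FOR EACH STATEMENT EXECUTE PROCEDURE mark_dirty();
SQL
```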