omniscale / imposm3

Imposm imports OpenStreetMap data into PostGIS
http://imposm.org/docs/imposm3/latest/
Apache License 2.0
721 stars 158 forks source link

Support import into Spatialite database #20

Closed skrach closed 2 years ago

skrach commented 10 years ago

Introduction

This update adds the ability to import OSM data into a SpatiaLite database.

All current features of the PostGIS import (e.g. generalized tables, creation of indexes) are supported by the Spatialite import as well. The only thing missing is support for schema operations like Deployment and rotation between different schemas. This is because SpatiaLite has no concept of schemas.

As the changes I made are rather big, I'll explain them in detail as well as the reasoning behind them.

Changes

Refactoring of database/postgis package

The current package structure inside database has been changed. A new parent package sql has been introduced.

The new sql package contains a lot of the previous postgis logic. Only code specific to postgis has been moved to the new sub package postgis. Analogously the new spatialite package contains code specific for the handling of a Spatialite database.

The reson for this refactoring is that a lot of database code is independent of the underlying SQL database. Other SQL based RDBMS would be placed as childs of this sql package as well.

For NoSQL databases a new package directly below the database package would need to be created.

The sql package implements the database interfaces defined in database.go.

database/sql/sql.go (was postgis.go)

Inside the file sql.go (which previously was postgis.go) is the place of shared SQL logic. Most of the code will seem familiar. However to make this logic database agnostic the following changes have been made:

The QueryBuilder interface provides a single place to define databse specific SQL statements. imposm3 already had this concept partially by using Specifications. The QueryBuilder interfaces are additions to this paradigma and put all SQL statements behind an interface.

Currently the following QueryBuilder interfaces exist (all defined in sql.go):

QueryBuilder interfaces

QueryBuilder

The QueryBuilder interface provides methods not bound to a specific (generalized) table. This means that upon calling this methods, schema and table must be provided as arguments.

Examples include TableExistsSQL(schema, table), DropTableSQL(schema, table) or CreateGeometryIndexSQL(schema, table, column). The full list of methods can be found in sql.go

NormalTableQueryBuilder

This interface defines methods bound to normal (=non generalized) table instances. All methods have no arguments, as the required data is provided using the SQLDB struct.

This interface defines methods bound to generalized table instances:

Spatialite doesn't allow concurrent writing to a single database. Therefore writes can't be parallelized as during a PostGIS import. Therefore, the amount of workers is now a new property named Worker of the SQLDB struct which is set during initialization inside the postgis.go and spatialite.go files. For the SpatiaLite database import this value is set to 1.

Bulk Import

SpatiaLite doesn't allow bulk imports using it's API. A new property controlling this behaviour has been added as well (SQLDB.BulkSupported)

Deployment Support

As SpatiaLite doesn't support schemas, deployment is not possible. Another property (SQLDB.DeploymentSupported) allows to enable or disable this feature. I know that the idea behind the current design of imposm3 checks if the Deployer interface is implemented to represent this behaviour, but since sql.go implements the database interface on it's own, this mechanism can't be used.

Optimize()

Database optimization is specific to a database, therefore the logic for it has bene moved to postgis.go and spatialite.go as well. A function pointer (SQLDB.Optimizer) allows a database impolementation to run database specific code for this scenario. A QueryBuilder is not sufficient in this case as the statements will often vary. For example SpatiaLite has no concept of clustered geometric indexes like PostGIS.

Final words

I know that this update contains a lot of changes. It was my intention to make them in a useful way so that no duplicate code is created and so that the current design of imposm3 is respected. Nonetheless it's entirely possible that you don't agree to some of the concepts I've introduced or have ideas how things could be built in a better way. I'm open if you've ideas or suggestions!

If you like I'm also happy to discuss this pull request via email, phone or Skype if you like.

I would be really thankful if you could take the time to see through these changes!

olt commented 10 years ago

This is indeed a huge changeset. Cool that you did figure out how to get it to work, since there isn't much documentation and the mapping/field/spec-stuff is not really intuitive. Unfortunately your patch collides with my plans to refactor these bits, so I won't be able to merge it.

Regardless of that, a few comments to your patch:

I don't like the new abstraction with the added SQL package (this is my personal opinion about the API of Imposm). I think the generic DB interface is good enough. Yes, PostGIS and Spatialite can share some code, but I would implement that by creating reusable functions and components and not by wrapping everything another time and adding methods like IsDeploymentSupported.

Ideally, the first working 'patch' would be a copy of the postgis folder that includes all changes to get it working for Spatiallite, including tests. Then I would work on that branch and refactor it to reuse common parts from the postgis package.

Also:

I know you put in a lot work for that, but patches that do change the architecture need to be discussed before (here or on the mailing list).

skrach commented 10 years ago

First of all thank you for taking the time to go through my proposed changes! Your detailed answer shows me that you took the time to seriously think about my idea. I really appreciate that! Your tips (e.g. git mv usage) area a great feedback as well!

I can understand your criticism about my implementation. As you've pointed out, some of those things are a matter of personal preference, while others are simply flawed in their current form (like the mentioned IsDeploymentSupported method).

In hindsight it would've been wise to talk to you about my changes before implementing them. I don't know why I hadn't come to that conclusion before! :-D Anyway I think I learnt a lot about imposm3 and developing in Go in general, so I don't think the time was wasted.

Still I think SpatiaLite support would be a great addition to imposm3. You already roughly explained a way to get there. I would be really glad to help enable this feature in a way you're happy and agree with.

The idea of duplicating the postgis folder and adapting the code to get Spatialite support is exactly what I did when making a prototype version to see if I would hit any walls. Therefore I could provide you with such a version if you like.

My only concern is that you mentioned refactorings in the affected code parts. Ideally a fork of the postgis folder should be created after those refactorings happened.

Therefore I would like to ask you the following questions:

Assuming you want to move forward in adding Spatialite support, other decisions would need to be made as well. E.g. how code sharing between SQL databases could be implemented and how I can help you with that (if you like). But maybe it's best, to discuss the questions at hand first?