tilemill-project / tilemill

TileMill is a modern map design studio
https://tilemill-project.github.io/tilemill/
BSD 3-Clause "New" or "Revised" License

Handling non-geodata in TileMill #1083

Open yhahn opened 12 years ago

yhahn commented 12 years ago

Brain dump/sketch of upcoming non-geodata work. Past tickets #820, #941.

Goals

springmeyer commented 12 years ago

mapnik scope

Yes, loading arbitrary non-spatial data is outside the scope of Mapnik. If, in the future, we see a major need to support non-spatial data in all the common formats that Mapnik renders (or will render) for TileMill, such as shapefile DBFs or PostGIS tables in addition to CSVs or SQLite, then I could change my position. But if 90% of use cases are CSVs, then we should simply specialize TileMill for them and handle them purely in JavaScript.

joining

As far as joining to other layers, I see these scenarios:

1) tabular data is CSV, spatial data is SQLite: this is the ideal world if the tabular data is large and joins need to be tested and frequently refined. This is the advanced, TileMill power-user scenario. It can be solved by TileMill offering to import the CSV into an SQLite db, attaching that db to the Mapnik SQLite layer (on the fly?), and then, as normal, letting the user author custom joins in the layer's subselect UI. The advantage of this, again, is that it scales to large data and is flexible enough to let users get potentially complex joins right without re-import churn.

2) tabular data is a small CSV, spatial data is a small file like a shapefile/GeoJSON/KML (or whatever format the file was distributed in). Imagine a world borders shapefile and a simple list of countries, like the examples @incanus has brought up frequently. The novice user just wants to join them without messing with SQL and see a map in seconds; even a very slow join would suffice, and hopefully all IDs match one-to-one. This could be done by TileMill importing the CSV and then querying all features from the Mapnik shapefile layer. Then, in pure JavaScript, the two sets of rows could be joined on the fly and a new SQLite db written out with the combined results. The latest Mapnik can now read/write WKB geometries, so once this is exposed in node-mapnik we can test this approach. My assumption is that we would want to enforce a strict row-count limit for this method, so that simple, small joins are doable but larger joins (> 5000 rows?) are blocked; that avoids having to write a high-performance join implementation (this is what SQL databases are for!).

3) massive CSV, massive spatial file: require conversion to SQLite outside of TileMill for now.
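For scenario 1 above, the mechanics can be sketched with Python's built-in `sqlite3` module: import the CSV into a second database, ATTACH it to the spatial one, and let a plain SQL join do the work. This is a minimal sketch under assumptions: the table and column names (`world`, `stats`, `data`, `iso3`, `population`) are hypothetical, the geometries are placeholder bytes, and both databases are in memory rather than files. Mapnik's SQLite datasource appears to expose the attach step as an `attachdb` option, with the SELECT going in the layer's subselect; check the plugin documentation before relying on that.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier; CSV headers may contain spaces or punctuation."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn: sqlite3.Connection, schema: str, table: str, csv_file) -> int:
    """Copy a CSV into schema.table, storing every column as TEXT. Returns the row count."""
    reader = csv.reader(csv_file)
    header = next(reader)
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    col_defs = ", ".join(f"{quote_ident(h)} TEXT" for h in header)
    conn.execute(f"CREATE TABLE {target} ({col_defs})")
    rows = list(reader)
    marks = ", ".join(["?"] * len(header))
    conn.executemany(f"INSERT INTO {target} VALUES ({marks})", rows)
    return len(rows)


# Stand-in for the spatial SQLite db the map layer reads (a file in practice).
conn = sqlite3.connect(":memory:")
# Attach a second db to hold the imported CSV. SQLite refuses ATTACH inside a
# transaction, so do it before any INSERT implicitly opens one.
conn.execute("ATTACH DATABASE ':memory:' AS stats")
conn.execute("CREATE TABLE world (iso3 TEXT, geometry BLOB)")
# Placeholder bytes, not real WKB geometries.
conn.executemany(
    "INSERT INTO world VALUES (?, ?)",
    [("FRA", b"\x00"), ("DEU", b"\x00"), ("ESP", b"\x00")],
)

# Toy tabular data; ESP deliberately has no row.
imported = import_csv(conn, "stats", "data", io.StringIO("iso3,population\nFRA,67\nDEU,83\n"))

# The sort of join a user could author and keep refining without re-importing.
attached_rows = conn.execute(
    "SELECT w.iso3, s.population FROM world AS w "
    "JOIN stats.data AS s ON w.iso3 = s.iso3 ORDER BY w.iso3"
).fetchall()
print(imported, attached_rows)  # 2 [('DEU', '83'), ('FRA', '67')]
```

Because the join lives in SQL rather than in the imported data, changing it means editing a query, which is the "no re-import churn" advantage described in scenario 1.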
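Scenario 2 (join in application code, then write a new SQLite db) can be sketched as follows. The comment proposes doing this in JavaScript via node-mapnik; this sketch uses Python instead, and the feature shape (`{"properties": ..., "wkb": ...}`), the function names, and the `MAX_JOIN_ROWS` constant are all hypothetical. The 5000-row cap is only the tentative figure floated above. It uses a dictionary lookup keyed on the shared id, assumes ids are one-to-one, and reports features with no tabular match instead of failing.

```python
import csv
import io
import sqlite3

# Hypothetical cap based on the "> 5000 rows?" suggestion; not a decided value.
MAX_JOIN_ROWS = 5000


class JoinTooLarge(Exception):
    """Raised when a join is big enough that it belongs in a real database."""


def join_small(features, csv_file, feature_key, csv_key):
    """Join feature dicts to CSV rows on a shared id, entirely in memory.

    Assumes ids are unique (one-to-one); a duplicate CSV id keeps its last row.
    """
    rows = list(csv.DictReader(csv_file))
    if len(features) > MAX_JOIN_ROWS or len(rows) > MAX_JOIN_ROWS:
        raise JoinTooLarge(f"{len(features)} features / {len(rows)} rows exceeds {MAX_JOIN_ROWS}")
    by_id = {row[csv_key]: row for row in rows}
    joined, unmatched = [], []
    for feat in features:
        key = str(feat["properties"][feature_key])
        row = by_id.get(key)
        if row is None:
            unmatched.append(key)  # surface ids that found no tabular match
            continue
        # Drop the CSV's own id column so it does not duplicate the feature's.
        extra = {k: v for k, v in row.items() if k != csv_key}
        joined.append(({**feat["properties"], **extra}, feat["wkb"]))
    return joined, unmatched


def quote_ident(name):
    return '"' + name.replace('"', '""') + '"'


def write_sqlite(conn, table, joined):
    """Write joined rows to a new table: one TEXT column per attribute plus a WKB blob."""
    columns = sorted({name for props, _ in joined for name in props})
    col_defs = [f"{quote_ident(c)} TEXT" for c in columns] + ["geometry BLOB"]
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({', '.join(col_defs)})")
    marks = ", ".join(["?"] * (len(columns) + 1))
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({marks})",
        [[props.get(c) for c in columns] + [wkb] for props, wkb in joined],
    )
    conn.commit()


# Toy input: placeholder geometry bytes, and a CSV that lacks Antarctica.
features = [
    {"properties": {"ISO3": "FRA", "NAME": "France"}, "wkb": b"\x00"},
    {"properties": {"ISO3": "DEU", "NAME": "Germany"}, "wkb": b"\x00"},
    {"properties": {"ISO3": "ATA", "NAME": "Antarctica"}, "wkb": b"\x00"},
]
joined, unmatched = join_small(features, io.StringIO("iso3,gdp\nFRA,2.9\nDEU,4.2\n"), "ISO3", "iso3")
conn = sqlite3.connect(":memory:")
write_sqlite(conn, "joined", joined)
table_rows = conn.execute("SELECT NAME, gdp FROM joined ORDER BY NAME").fetchall()
print(unmatched, table_rows)  # ['ATA'] [('France', '2.9'), ('Germany', '4.2')]
```

Reporting unmatched ids matters for the novice case: when the ids do not line up one-to-one, the user sees which features were dropped instead of silently getting a sparser map.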
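Read together, the three scenarios amount to a routing decision based on the spatial source format and the data size. The helper below is a hypothetical sketch: its name, the format string, and the 5000-row cap are assumptions drawn from the discussion. One case is not spelled out above, a small CSV against a large non-SQLite file; it is routed to external conversion here, since scenario 2 would block it anyway.

```python
from enum import Enum

# Hypothetical cap from the "> 5000 rows?" suggestion in scenario 2.
MAX_JOIN_ROWS = 5000


class JoinPlan(Enum):
    ATTACH_SQLITE = 1    # scenario 1: import CSV to SQLite, attach, join in the subselect
    IN_MEMORY = 2        # scenario 2: join in application code, write a new SQLite db
    CONVERT_OUTSIDE = 3  # scenario 3: user converts to SQLite outside TileMill first


def plan_join(spatial_format: str, spatial_rows: int, csv_rows: int) -> JoinPlan:
    """Pick a join strategy from the spatial source format and the row counts."""
    if spatial_format == "sqlite":
        # SQLite does the joining, so size is not the limiting factor.
        return JoinPlan.ATTACH_SQLITE
    if spatial_rows <= MAX_JOIN_ROWS and csv_rows <= MAX_JOIN_ROWS:
        return JoinPlan.IN_MEMORY
    return JoinPlan.CONVERT_OUTSIDE


print(plan_join("shapefile", 250, 250))  # JoinPlan.IN_MEMORY
```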