Open darribas opened 4 years ago
I actually am a complete beginner with geopackages. Any contributions are welcome!
Yeah I have no clue... Maybe @jorisvandenbossche or @ljwolf???
Though not Python related, .gpkg
files can easily be queried with Firefox's SQLite Manager. Just drag and drop in the .gpkg
and query. With the sample-geopackage.gpkg
:
PRAGMA table_info(points)
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
1 | 0 | fid | INTEGER | 1 | 1 | |
2 | 1 | geom | POINT | 0 | 0 | |
3 | 2 | ID | TEXT | 0 | 0 |
SELECT * FROM points
fid | geom | ID | |
---|---|---|---|
1 | 1 | 71,80,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 | one |
2 | 2 | 71,80,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,240,63,0,0,0,0,0,0,0,64 | two |
3 | 3 | 71,80,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,64,0,0,0,0,0,0,0,0 | three |
SELECT * FROM points WHERE ID == "one"
fid | geom | ID | |
---|---|---|---|
1 | 1 | 71,80,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 | one |
... though this does seem to be giving strange results in the geopackage seems to have gotten projected.geometry
...
In theory, it should be possible to query gpkg directly from Geopandas. In gpd.read_file
, Fiona passes kwargs as driver-specific parameters that will be interpreted by the OGR library as layer creation or opening options (docs). And GDAL GPKG driver mentions SQL. But I was not able to do so, but I am not sure what should be the exact syntax for that. This below seems to ignore the query.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
world.to_file('package.gpkg', layer='world')
cities.to_file('package.gpkg', layer='cities')
gpd.read_file('package.gpkg', layer='world', sql='SELECT * WHERE pop_est > 10000000')
Maybe @sgillies will know.
cool! The GDAL page also mentions a link to Spatialite. It'd be super nice if that allowed you to query from read_file
ton only read from disk the rows you really wanted.
@darribas spatially you can now filter using bbox
to load only part of the geometry within set bounding box (or extent of another geoSeries). That should work with any filetypes. Since geopandas 0.7.0 you'll be able to filer by mask or filter rows (https://github.com/geopandas/geopandas/pull/1160). But figuring out how to query gkpg would be way cooler :).
Yeah, I think you should be able to use that query in theory, but I'm not getting any results using that from geopandas or from fiona directly.
I though maybe you needed to force the sql to query against the layer, sql = 'select * from world where pop_est > 10000000'
, but that still fails.
the fiona
documentation suggests that you should use the builtin filter()
rather than an attribute filter from OGR, so we could try to add that to geopandas here using similar logic to the bbox filter: The user defines a lambda, say
geopandas.read_file('package.gpkg', ..., filter = lambda feature: feature[
'properties']['pop_est'] > 1000000`)
and we evaluate that lambda against the feature in the input to from_features:
from builtins import filter as filter_items
gdf = GeoDataFrame.from_features(filter_items(f_filt, filter), crs=crs, columns=columns)
We'd still read each feature, but not all at once, since that filter_items
call would still be lazy.
@martinfleis Fiona has no support for OGR's SQL. There are only a few formats that support the attribute indexing that would make those attribute queries much faster than any filter you'd write in Python or do with Pandas. GeoPackage is one, but I'm not inclined to add to change Fiona for this case. Supporting Fiona's existing features is a lot of work already and I'm reluctant to add more.
@ljwolf That looks cool. Do you want to move this discussion to geopandas repo? It could work nicely with the PR providing support of row filtering and masking as a bigger update of selective reading capabilities.
Just for my memory, I'll leave here a couple of links I've come across related:
geopandas
issue on interfacing with spatialite, which shows how to setup spatialite to work in python (I take that works on Python 3 but have not tried)Just a though: given GeoPackages are a special file format in that they can be seen as a file or a db (they're both), would it make sense to make it easy to exploit the db side of it through read_postgis
? The workflow being here if you just want to read a whole table you'd use read_file
, if you want a more complicated query that potentially involves spatialite, you'd do it through read_postgis
?
@martinfleis definitely!
Note that the raw bytes stored in the Geopackage for geometries are a header plus WKB, per the spec: http://www.geopackage.org/spec/#gpb_format
This makes them really awkward to use in simple raw SQL queries like the above.
You either need to query them with conversion back to spatial types using a spatial engine that can handle that conversion (looks like both ogr
and spatialite
do that for you), or you have to handle that conversion yourself outside the database.
In case it is useful, I have a very rudimentary pass at writing geopackages from pygeos
geometries here. This might be useful for future versions of geopandas
that use pygeos
geometries internally, if it seemed appropriate to write custom I/O for geopackages. While it established it is possible to get faster I/O using pygeos
and numpy
to handle the conversion to the geometry encoding format above, it has a bunch of rough edges (some of which likely need to move down to a C extension / Cython level).
Fantastic resource! Thanks very much for putting it together. This prompted me a question I've had for a while about GeoPackage as a format. Since it's essentially a SQLite file with some delimited structure, do you know if it's possible/how to query a GeoPackage as a SQL file? At the very least, it should be possible to pull out records matching some attribute condition (e.g.
SELECT * WHERE Column_1 == "x"
). In an ideal world, you might also be able to do spatial queries (e.g. points within a 500m radious of point x, etc.). Do you know if any of this is possible and if so how? I think it'd make an awesome addition to the resource.