intake / intake_geopandas

An intake plugin for loading datasets with geopandas
BSD 2-Clause "Simplified" License

add geopackage driver #36

Status: Open. amsnyder opened this issue 10 months ago

amsnyder commented 10 months ago

I would like to store some geopackage files in an intake catalog, but I don't currently see any drivers for this file type. A pangeo colleague suggested I open an issue here. Do you all have any plans to add this driver? Thanks!

martindurant commented 10 months ago

What is geopackage, please? I am looking at https://www.geopackage.org/spec/#_sqlite_container (an sqlite3 file with specific conventions and file extension).
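To make the "sqlite3 file with specific conventions" point concrete: a minimal sketch, using only the standard library, that lists the layers of a GeoPackage by querying its `gpkg_contents` table, which the GeoPackage spec requires. The helper name `list_gpkg_layers` is made up for illustration, and the path must be a local file, since `sqlite3` cannot open a remote or fsspec file object directly.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs from a GeoPackage's gpkg_contents table.

    Hypothetical helper for illustration; `path` must be a local file.
    """
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows
```

Called on a local copy of a .gpkg file, this should return one row per layer, with `data_type` set to values such as `features` or `tiles`.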

Yes, a driver would be fine, but I would rather do it for "v2", currently in development. How do you currently read these data?

amsnyder commented 10 months ago

I'm not sure how to answer the question about what a geopackage is - I don't know the details of the file format. I can try to help dig up information if I know what you're looking for.

Here is an example of how I would open one:

import geopandas as gpd
import fsspec

fs_read = fsspec.filesystem(
    's3',
    anon=True,
    client_kwargs={'endpoint_url': 'https://usgs.osn.mghpcc.org'}
)

with fs_read.open('hytest/wbd/huc12/huc12.gpkg', mode='rb') as f:
    huc12_basins_geopackage = gpd.read_file(f, layer='huc12', driver="GPKG")

martindurant commented 10 months ago

I added the following to the Intake Take2 (v2) branch:

class Geopackage(SQLite):
    filepattern = "gpkg$"

and this allows

In [2]: import intake

In [3]: intake.datatypes.recommend(u, storage_options={'endpoint_url': 'https://usgs.osn.mghpcc.org', 'anon': True}, head=None)
Out[3]: [intake.readers.datatypes.Geopackage]

In [4]: data = intake.readers.datatypes.Geopackage(u, storage_options={'endpoint_url': 'https://usgs.osn.mghpcc.org', 'anon': True})

In [5]: reader = data.to_reader()

In [6]: reader.read()
Out[6]:
                                         TNMID                            METASOURCEID  ... SHAPE_Area                                           geometry
0       {B1EF0C55-72ED-4FF6-A3BA-97A87C6A6C47}                                     NaN  ...   0.004859  MULTIPOLYGON (((-86.15784 31.42164, -86.15783 ...
1       {F0D9874D-52BA-4FDC-A5E6-E259B627764D}                                     NaN  ...   0.014214  MULTIPOLYGON (((-86.18406 31.53503, -86.18406 ...
2       {2E0CB201-5672-45B5-8CA7-A60070122697}                                     NaN  ...   0.009979  MULTIPOLYGON (((-86.29029 31.27059, -86.29089 ...
3       {9D39E120-C6DF-401F-AA8F-1748E9423AA0}                                     NaN  ...   0.009897  MULTIPOLYGON (((-86.30253 31.45077, -86.30251 ...

Making readers is much simpler in V2! This reader object can then be put into a catalog and saved as YAML.

Note on "anon": we trialed having s3fs "fall back" to trying anon in the case that credentials were missing or invalid, but this caused problems for everyone, so it's better to explicitly label datasets that need no creds.

martindurant commented 10 months ago

Note on head= in recommend(): if this is True (the default), the start of the file gets scanned, and the possible datatypes then include SQLite.
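To illustrate why scanning the head adds SQLite to the candidates, here is a toy sketch of matching by filename pattern and then by leading bytes. It is not Intake's implementation: `TOY_TYPES`, `toy_recommend`, and the SQLite filename pattern are all hypothetical, and `head` here is raw bytes supplied by the caller rather than a flag. The one hard fact it relies on is that every SQLite 3 file starts with the 16-byte string `SQLite format 3\0`.

```python
import re

# First 16 bytes of every SQLite 3 database file (and so of every GeoPackage).
SQLITE_MAGIC = b"SQLite format 3\x00"

# Hypothetical registry: name -> (filename regex or None, magic bytes or None).
# Only the "gpkg$" pattern comes from the thread; the rest is assumed.
TOY_TYPES = {
    "Geopackage": ("gpkg$", None),
    "SQLite": (r"\.sqlite$", SQLITE_MAGIC),
}


def toy_recommend(url, head=None):
    """Return toy datatype names matching `url` by name, then by leading bytes."""
    found = [
        name
        for name, (pattern, _) in TOY_TYPES.items()
        if pattern and re.search(pattern, url)
    ]
    if head:
        # Content-based pass: compare the start of the file with known magic bytes.
        for name, (_, magic) in TOY_TYPES.items():
            if magic and head.startswith(magic) and name not in found:
                found.append(name)
    return found
```

With no head bytes, a URL ending in `.gpkg` matches only the toy Geopackage entry; passing the first bytes of the file also brings in the SQLite entry, which mirrors the behaviour described above.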

amsnyder commented 10 months ago

Awesome, thanks @martindurant. Is there a timeline for when intake v2 will be released?

martindurant commented 10 months ago

Very alpha is available now as 2.0.0a2 (or .aX, as I have time). I was planning for beta/RC release at the new year, and then full release depending on feedback. I might call the package "intake2" or "take2" for a transitional time (but not until release).

ian-r-rose commented 10 months ago

Hi! This is somewhat off the cuff as I’m traveling, but I think it should be possible to use geopackage files with this plugin as-is by specifying the driver in the geopandas_kwargs.

There is even an example in the test suite here: https://github.com/intake/intake_geopandas/blob/34b30175ba86f4ce754f4e84d35fa07c91e6db88/tests/test_file_source.py#L78

On Thu, Dec 21, 2023 at 7:54 PM Martin Durant @.***> wrote:

Very alpha is available now as 2.0.0a2 (or .aX, as I have time). I was planning for beta/RC release at the new year, and then full release depending on feedback. I might call the package "intake2" or "take2" for a transitional time (but not until release).