isamplesorg / export_client

Client for the iSamples export service
Apache License 2.0
1 stars 1 forks source link

Add examples for how to use DuckDB to query geoparquet files #16

Open rdhyee opened 3 months ago

rdhyee commented 3 months ago

I managed to export the Smithonsian records with

isample export -j $ISAMPLES_JWT -f geoparquet -d /tmp -q 'source:SMITHSONIAN'

Next challenge is how to do simple spatial queries. I installed duckdb on my Mac:

brew install duckdb

I installed duckdb extensions as suggested by cholmes/duckdb-geoparquet-tutorials:

INSTALL spatial;
LOAD spatial;
INSTALL httpfs;
LOAD httpfs;

I then

CREATE TABLE geoparquet_data AS SELECT * FROM read_parquet('isamples_export_2024_06_07_07_40_00_geo.parquet');
DESCRIBE geoparquet_data
  ;

to get

┌──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐ │ column_name │ column_type │ null │ key │ default │ extra │ │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ ├──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤ │ sample_identifier │ VARCHAR │ YES │ │ │ │ │ label │ VARCHAR │ YES │ │ │ │ │ description │ VARCHAR │ YES │ │ │ │ │ source_collection │ VARCHAR │ YES │ │ │ │ │ has_specimen_categ… │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │ │ has_material_categ… │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │ │ has_context_category │ STRUCT("label" VARCHAR)[] │ YES │ │ │ │ │ keywords │ STRUCT(keyword VARCHAR)[] │ YES │ │ │ │ │ produced_by │ STRUCT(description VARCHAR, has_feature_of_interest VARCHAR, identifier VARCHAR, "label" VARCHAR, responsibility STRUCT("name" VARCHAR, "role" VARCHAR)[], result_time VARCHAR, sampling_site STRUCT(desc… │ YES │ │ │ │ │ registrant │ STRUCT("name" VARCHAR) │ YES │ │ │ │ │ sample_location_lo… │ DOUBLE │ YES │ │ │ │ │ sample_location_la… │ DOUBLE │ YES │ │ │ │ │ geometry │ BLOB │ YES │ │ │ │ ├──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┤ │ 13 rows 6 columns │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

but I'm trying to a spatial extent query:

D SELECT ST_Extent(geometry) AS extent FROM geoparquet_data;
Binder Error: No function matches the given name and argument types 'ST_Extent(BLOB)'. You might need to add explicit type casts.
    Candidate functions:
    ST_Extent(GEOMETRY) -> BOX_2D

LINE 1: SELECT ST_Extent(geometry) AS extent FROM geop...
               ^
D SELECT ST_Extent(CAST(geometry AS BOX_2D)) AS extent FROM geoparquet_data;
Conversion Error: Unimplemented type for cast (BLOB -> BOX_2D)
LINE 1: SELECT ST_Extent(CAST(geometry AS BOX_2D)) AS extent FRO...
                         ^
D SELECT ST_Extent(CAST(geometry AS GEOMETRY)) AS extent FROM geoparquet_data;
Conversion Error: Unimplemented type for cast (BLOB -> GEOMETRY)
LINE 1: SELECT ST_Extent(CAST(geometry AS GEOMETRY)) AS extent F...

How to fix?