Open FlorisCalkoen opened 3 months ago
Following up, my guess is that dask_geopandas is also struggling to read GeoParquet files that have been converted with gpq, due to a similar issue/decision in the gpq crs conversion/specification. See the example below:
storage_options = {"account_name": <account_name>, "credential": <token>}
href = "<protocol>/<container>/<prefix>/valid-geo.parquet"
gdf = dask_geopandas.read_parquet(href, storage_options=storage_options)
It looks like you are running into an issue with dask-geopandas. The crs is optional in a GeoParquet geometry column, but dask-geopandas appears to assume it will be present here: https://github.com/geopandas/dask-geopandas/blob/3489a1cbafbeda3c0d4493133112969268e58d66/dask_geopandas/io/arrow.py#L36
I think this is the same issue: https://github.com/geopandas/dask-geopandas/issues/270
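To illustrate what tolerating an optional crs could look like, here is a minimal sketch. It is not how dask-geopandas is implemented. It works on the already-parsed "geo" file metadata as a plain dict. The function name geometry_crs and the sample metadata are made up for illustration. It also assumes my reading of the GeoParquet specification: a missing crs key implies OGC:CRS84, while an explicit null means the CRS is unknown.

```python
import json

# Assumption: the default CRS when the "crs" key is absent, per my reading
# of the GeoParquet specification.
DEFAULT_CRS = "OGC:CRS84"


def geometry_crs(geo_metadata, column=None):
    """Return the CRS entry for a geometry column (hypothetical helper).

    geo_metadata is the parsed JSON stored under the "geo" key of the
    Parquet file metadata.
    """
    column = column or geo_metadata["primary_column"]
    column_meta = geo_metadata["columns"][column]
    # A plain .get("crs") would conflate "key missing" with "explicit null",
    # so the two cases are checked separately.
    if "crs" not in column_meta:
        return DEFAULT_CRS
    return column_meta["crs"]  # PROJJSON dict, or None for an unknown CRS


# Made-up metadata for a file whose geometry column has no "crs" key.
raw = (
    '{"version": "1.0.0", "primary_column": "geometry", '
    '"columns": {"geometry": {"encoding": "WKB", "geometry_types": []}}}'
)
print(geometry_crs(json.loads(raw)))  # prints OGC:CRS84
```

A reader that indexes column_meta["crs"] directly would raise a KeyError on such a file, which matches the kind of failure described above.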
I don't think gpq currently contains a method to specify the target crs. Also I see that by default you use "OGC:CRS84", what is your rationale for that? Why not, for example, use "EPSG:4326"?
I'll add a little bit of context on my use case. I used gpq to convert a 'big' collection of parquet files to geoparquet by simply doing gpq convert non-geo.parquet valid-geo.parquet in a for loop. Further in my processing chain I load these geoparquet files using GeoPandas, but I ran into an issue because when the crs == "OGC:CRS84" it cannot be converted to an EPSG code. Although it's expected behaviour, I'm mostly just curious why you use "OGC:CRS84" instead of "EPSG:4326". I'll probably change my routines from gdf.crs.to_epsg() to gdf.crs.to_string(), but I guess that several others rely on to_epsg() as well when using GeoPandas, so I thought it's worth opening a discussion point here.
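For anyone in the same situation, the change I have in mind could be sketched roughly as below. The helper name crs_label is hypothetical. It only assumes an object that exposes to_epsg() and to_string(), as gdf.crs (a pyproj CRS) does, and it relies on to_epsg() returning None when no EPSG code can be identified.

```python
def crs_label(crs):
    """Return a string label for a CRS without requiring an EPSG code.

    Hypothetical helper: crs is expected to behave like gdf.crs,
    i.e. to provide to_epsg() and to_string().
    """
    if crs is None:
        # GeoPandas uses None when no CRS is set on the GeoDataFrame.
        return None
    epsg = crs.to_epsg()
    if epsg is not None:
        return f"EPSG:{epsg}"
    # No EPSG match was found, so fall back to the string form
    # instead of propagating None to downstream code.
    return crs.to_string()
```

Usage would be something like label = crs_label(gdf.crs), so downstream code keeps getting a usable identifier whether the files declare an EPSG code or not.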