Closed: darribas closed this issue 2 years ago.
This refers to the default Parquet compression used by (geo)pandas, which is 'snappy'. The compression argument accepts {'snappy', 'gzip', 'brotli', None}, so you can try another codec, or None for no compression.
OK, I can confirm that writing from geopandas with compression=None works with sfarrow.
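The same workaround can be reproduced entirely on the R side. The sketch below writes an sf object without compression and reads it back. It assumes that sfarrow's st_write_parquet() forwards extra arguments through its ... parameter to arrow::write_parquet(), which accepts compression = "uncompressed". The output file name is hypothetical, and the sample data is the nc.shp shapefile bundled with sf.

```r
library(sf)
library(sfarrow)

# Sample data bundled with the sf package (100 North Carolina counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical output path; 'compression' is assumed to be forwarded
# by st_write_parquet() to arrow::write_parquet()
path <- file.path(tempdir(), "nc_uncompressed.parquet")
st_write_parquet(nc, path, compression = "uncompressed")

# Reading an uncompressed Parquet file does not need any compression codec
nc2 <- st_read_parquet(path)
print(nrow(nc2))
```

Because no codec is involved, this round trip should work even on an arrow build that lacks snappy support.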
Thanks for raising this. Just to add that compression does work within sfarrow, but it depends on your arrow installation and on whether the codecs are properly installed. I've had some issues with arrow on my Ubuntu system in the past. This is also discussed here: https://arrow.apache.org/docs/r/articles/install.html
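To see which codecs a given arrow installation was built with, the arrow R package provides codec_is_available(). A minimal check, assuming arrow is installed, might look like this. If "snappy" comes back FALSE, reading Parquet files written with geopandas' default 'snappy' compression will likely fail with a "Support for codec 'snappy' not built" error.

```r
library(arrow)

# Compression codecs relevant to Parquet files written by (geo)pandas
codecs <- c("snappy", "gzip", "brotli", "zstd", "lz4")

# Named TRUE/FALSE vector, depending on how the Arrow C++ library was built
available <- vapply(codecs, codec_is_available, logical(1))
print(available)

if (!available[["snappy"]]) {
  message("snappy is not available in this arrow build; ",
          "snappy-compressed Parquet files cannot be read")
}
```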
Super! I installed sfarrow on top of an already installed R stack (version 6.1 of the gds_env). For the next release of the container, though, I'd love to get this packaged and properly installed. I suspect installing it at the same time as sf, rgeos, etc. will take care of conflicts, but I might reach out if I run into issues...
A question in the meantime: do you know if the arrow binaries get installed when you install sfarrow, or is arrow a dependency you have to deal with on your own before installing it?
The arrow package should be installed automatically when sfarrow is installed, if it isn't found. The binaries for the Arrow C++ library should be installed by the arrow R package installation, according to https://arrow.apache.org/docs/r/#installation. You (hopefully) shouldn't have to deal with any of it on your own before installing sfarrow.
However, in my experience, full Arrow support isn't always included in the default installation of the arrow package, specifically support for the 'snappy' codec you mentioned earlier (and other compression codecs).
On Ubuntu 20.04, I have to run the following in R to get the Arrow library and the arrow package to support snappy:
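# Enable S3 support in the Arrow C++ build
Sys.setenv(ARROW_S3="ON")
# Request the full-featured (non-minimal) build, which includes compression codecs such as snappy
Sys.setenv(NOT_CRAN="true")
# Install arrow from the Arrow nightly package repository
install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")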
From: https://stackoverflow.com/questions/64937524/r-arrow-error-support-for-codec-snappy-not-built
I hope that helps.
Just a quick update: I can confirm the strategy above seems to work fine, and it will ship in gds_env:7.0.
I'm trying the following to interface geopandas and sf:
Then in R:
And I get the following error:
I suspect this relates to the codec geopandas uses when writing Arrow objects to a Parquet file, but I'm not sure what the differences would be.
cc'ing @jorisvandenbossche and @martinfleis in case they have any clues