wcjochem / sfarrow

R package for reading/writing `sf` objects from/to parquet files with `arrow`.
https://wcjochem.github.io/sfarrow/
Other

Document difference to current status of arrow #5

Open petrbouchal opened 3 years ago

petrbouchal commented 3 years ago

Hi, great work - I came across this in R Weekly, I think.

I was trying to do similar things with arrow and ended up filing an issue which resulted in a fix.

I wonder to what extent the PR in arrow covers the use case handled by your package? If there are differences, would it make sense to document them here? Especially around compatibility with geopandas, which I have not tested the arrow pull request against. In any case, I thought you might be interested to know that, and how, this is being handled in arrow.

wcjochem commented 3 years ago

Hi! Thank you for raising this. I had not found your issue or the fix when I started building this workaround package. I'm glad to see that this kind of support will be coming in arrow.

I just tested the version of arrow in the PR, and I agree it does cover the round trip with sf objects, which is great. But it can't currently read parquet files written from geopandas (and it doesn't write to a format that geopandas understands).

The way arrow is handling the sfc geometry columns also leads to a really large metadata list. sfarrow has a step to convert geometry to WKB first (as does geopandas), which lets it be more compact and maybe a bit faster.
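
To illustrate why a WKB column is compact, here is a minimal sketch of the well-known binary layout for a single 2D point, written in Python with only the standard library. It is not sfarrow's or geopandas's actual code; the helper names `point_to_wkb` and `wkb_to_point` are hypothetical and only little-endian points are handled.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte for byte order (1 = little-endian),
    # 4 bytes for geometry type (1 = Point),
    # then x and y as 8-byte IEEE 754 doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob: bytes) -> tuple[float, float]:
    """Decode a little-endian WKB 2D point back to (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return x, y


wkb = point_to_wkb(14.42, 50.09)
print(len(wkb))           # size of the encoded point in bytes
print(wkb_to_point(wkb))  # coordinates recovered from the binary blob
```

Each geometry becomes a single opaque binary value, so the geometry column can be stored as an ordinary binary column rather than as a nested structure with per-object attributes.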

I agree I can spell out these differences a bit more here. For some users, the arrow solution may be sufficient for their needs.

petrbouchal commented 3 years ago

Thanks for the response. Indeed, before the improvement in arrow, I was also using conversion to WKB, so I agree there is value in this, esp. given the situation re geopandas as you write.

Good luck with this going forward!