natcap / invest

InVEST®: models that map and value the goods and services from nature that sustain and fulfill human life.
Apache License 2.0

Proposal: transition to using fiona for vector operations #1619

Open emlys opened 1 month ago

emlys commented 1 month ago

Fiona is now a well-established package for working with vector data. It wraps the GDAL C API, providing a Pythonic interface for reading and writing vector files. I propose we transition existing code from using the GDAL Python API for vectors to using fiona where possible.

We've discussed the need for better vector support in pygeoprocessing. Fiona will take care of a lot of the vector boilerplate that we might want to abstract out. Once that's handled, it should be clearer which vector operations, if any, are specific to us and belong in pygeoprocessing.

Vector boilerplate such as starting/committing transactions, copying, and syncing to disk is complicated and easy to get wrong in GDAL. The fiona developers have clearly put more thought than we have into doing these tasks correctly from Python.
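To make the boilerplate point concrete, here is a minimal sketch of writing a vector with fiona. The schema, field names, file name, and temp directory are all made up for illustration and are not taken from invest. The idea is that the context manager handles closing and flushing, so our code doesn't need to manage those steps explicitly.

```python
import os
import tempfile

import fiona

# Hypothetical schema and records, chosen only for illustration.
schema = {"geometry": "Point", "properties": {"name": "str", "value": "float"}}
records = [
    {
        "geometry": {"type": "Point", "coordinates": (float(i), float(i))},
        "properties": {"name": f"site_{i}", "value": i * 1.5},
    }
    for i in range(3)
]

workspace = tempfile.mkdtemp()
path = os.path.join(workspace, "sites.gpkg")

# The context manager closes the collection on exit, which flushes the
# written features to disk; there are no explicit sync or commit calls here.
with fiona.open(path, "w", driver="GPKG", crs="EPSG:4326", schema=schema) as dst:
    dst.writerecords(records)

# Reopen read-only to confirm the features were written.
with fiona.open(path) as src:
    feature_count = len(src)
    names = sorted(feature["properties"]["name"] for feature in src)
```

The equivalent with the GDAL Python API would typically involve getting a driver, creating a datasource and layer, defining fields, and creating features one at a time, with cleanup left to us.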

This task is fairly low priority since it doesn't add any new functionality, but I think it's worth doing in the near future as a maintenance item.

Note: why not geopandas? While geopandas is another popular vector library, I think it's less suited to our use case. It's designed around reading data in all at once, so it won't be memory efficient on large datasets. It also brings in the whole pandas framework, which I don't think we'll benefit from in most cases in InVEST. Fiona is a more minimal abstraction on top of GDAL.
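As a rough illustration of the memory point, the sketch below iterates over a fiona collection and processes features one at a time, without building a list or table of all features in Python. The `total_field` helper, the `area_ha` field, and the sample file are hypothetical and exist only for this example.

```python
import json
import os
import tempfile

import fiona

# Build a small GeoJSON file so the sketch is self-contained (made-up data).
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [i, i]},
        "properties": {"area_ha": 10.0 * i},
    }
    for i in range(1, 5)
]
path = os.path.join(tempfile.mkdtemp(), "parcels.geojson")
with open(path, "w") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)


def total_field(vector_path, field):
    """Sum a numeric field while iterating features one at a time."""
    total = 0.0
    with fiona.open(vector_path) as src:
        # Iterating the collection yields one feature per step, so Python
        # never holds the whole layer as a list or dataframe.
        for feature in src:
            total += feature["properties"][field]
    return total


total_area = total_field(path, "area_ha")
```

How much memory the underlying GDAL driver uses still depends on the file format, so this only addresses the Python side of the comparison.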