njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/
Other
49 stars 4 forks source link

Require minimum version of GDAL? #71

Open Aariq opened 1 month ago

Aariq commented 1 month ago

Certain features (e.g. SOZip shapefile use by tar_terra_vect() with filetype = "ESRI Shapefile") require GDAL >= 3.7. However, that release was only about a year ago. Running tests on GDAL 3.0.4 gives the following:

Error (test-tar-terra.R:62:5): tar_terra_vect() works
<tar_condition_run/tar_condition_targets/rlang_error/error/condition>
Error: Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    the write() function in tar_format() must not create a directory. Found directories inside the data store where there should only be files: _targets/objects/test_terra_vect_shz

I think there are a few options:

  1. Require GDAL >= 3.7
  2. Check for GDAL >= 3.7 and error if older
  3. Check GDAL version and fallback to a different system when version requirement isn't met (e.g. using utils::zip() instead of the .shz extension)
brownag commented 1 month ago

I think it is reasonable to target minimum support for ~GDAL 3.0.4 for baseline geotargets/GDAL functionality, but upon reflection, in the case of writing a .shz I was a bit misleading in my other comment.

Writing a .SHZ has actually been supported since GDAL 3.1. The current test seems to work fine on GDAL 3.4.1 which ships default with Ubuntu 22.04 LTS (latest on GHA). e.g. https://github.com/njtierney/geotargets/actions/runs/9089652980/job/25013438422?pr=70#step:5:2863.

The ZIP file written is just not SOZip enabled. Which is fine, because we don't do anything special, we just get read performance boost for free via /vsizip/ with GDAL >3.7


That said I think there is a good discussion to be had here. There are many relevant changes and upgrades users may want to make use of via additional options, but we can't assume that much of that more modern functionality is there by default, at least not yet. Work-arounds for particular cases that will become less relevant as time goes on may not be the best approach, but may be worth it if it is going to be a primary pattern in geotargets use cases.

As for ESRI Shapefile, I figure there are other single file spatial formats that can be used on older GDAL versions. So, I would be in favor of skip the test when GDAL <3.1 or expect error on older GDAL version. Considering we already have special handling for it, it isn't too much overhead IMO to just check the version and throw an explanatory error.

If we look at the builds of Ubuntu packages (https://packages.ubuntu.com/search?keywords=gdal) we see 20.04 LTS, the oldest LTS still supported, has GDAL 3.0.4 unless additional repositories are added. So this issue will pop up from time to time.

For Ubuntu, it is simple to add additional repositories to get a somewhat more modern version. However, getting >=3.7 currently requires Ubuntu 22.04 if using ubuntugis-unstable (https://launchpad.net/~ubuntugis/+archive/ubuntu/ubuntugis-unstable). Granted you do occasionally see folks running 18.04 (not supported for a year now), or CentOS or Redhat or similar with some ancient versions of GDAL... In that case folks are left to building from source, which is definitely doable but with a bit of a learning curve.

Aariq commented 1 month ago

Why 3.0.4 and not 3.1 as a minimum version? It sounds like writing .shz is introduced in GDAL 3.1 so couldn't we just require that and then not worry about doing conditional things in geotargets? Like, will everything we currently have implemented be likely to work in GDAL ≥ 3.1?

brownag commented 1 month ago

SystemRequirements isn't a hard limit in our case, so we could put more or less whatever we want in there. It wouldn't preclude users from running an older version since we don't rely on linking to GDAL libs directly in building geotargets or anything like that

sf and terra have minimum GDAL reqs of 2.0.4 and 2.2.3 respectively. I suggested 3.0.4 because it sortof escapes the burden of having to do any specific maintenance or workarounds for GDAL <3, while recognizing there still may be a significant amount of users with devices running Ubuntu 20.04 or similar vintage libs. I don't really have any problem with going to 3.1, other than that writing shapefiles doesn't seem like a significant enough reason to actually set minimum version

Aariq commented 1 month ago

On the other hand... I just remembered that all the packages that require GDAL are in Suggests, so maybe it's not appropriate for geotargets to list GDAL as a SystemRequirement. Might be good to just add a note in documentation whenever we rely on a GDAL feature that is only available in more recent versions.

njtierney commented 1 month ago

Talking to @mdsumner I don't think we can specify a very recent version of GDAL - {terra} has "GDAL (>= 2.2.3)", as you noted @Aariq

So perhaps the way to do this is identify the places where we use modern features from GDAL and fire a warning/trigger some other process?