rsbivand / rgrass

Interpreted interface between the GRASS geographical information system and R
https://rsbivand.github.io/rgrass/
26 stars 8 forks source link

Speed up and enhance reading from (and writing to) a GRASS GIS database #93

Closed florisvdh closed 3 months ago

florisvdh commented 3 months ago

This PR attempts to address the following:

Further testing, especially with Windows and macOS are most welcome!

A small benchmark is shown below.

Code to create the benchmark ``` r library(dplyr, warn.conflicts = FALSE) library(rgrass) #> GRASS GIS interface loaded with GRASS version: (GRASS not running) grass_conf <- initGRASS( gisBase = "/usr/lib/grass83", gisDbase = "/home/floris/grassdata", location = "nc_basic_spm_grass7", mapset = "PERMANENT", override = TRUE ) vec_without_gdalgrass <- bench::mark( vec_without_gdalgrass = read_VECT( "streets", use_gdal_grass_driver = FALSE, ignore.stderr = TRUE ), iterations = 4 ) suppressWarnings({ vec_with_gdalgrass <- bench::mark( vec_with_gdalgrass = read_VECT( "streets", use_gdal_grass_driver = TRUE ), iterations = 4 ) }) rast_without_gdalgrass <- bench::mark( rast_without_gdalgrass = read_RAST( "elevation_shade", use_gdal_grass_driver = FALSE, ignore.stderr = TRUE ), iterations = 10 ) rast_with_gdalgrass <- bench::mark( rast_with_gdalgrass = read_RAST( "elevation_shade", use_gdal_grass_driver = TRUE ), iterations = 10 ) bind_rows( vec_without_gdalgrass, vec_with_gdalgrass, rast_without_gdalgrass, rast_with_gdalgrass ) |> select(expression, min, median, iterations = n_itr) |> knitr::kable() ```
expression min median iterations
vec_without_gdalgrass 12.92s 13.09s 3
vec_with_gdalgrass 4.13s 4.18s 4
rast_without_gdalgrass 476.39ms 484.12ms 9
rast_with_gdalgrass 143.08ms 146.35ms 9

Created on 2024-06-16 with reprex v2.1.0

Some system specs:

$ ogrinfo --version
GDAL 3.8.4, released 2024/02/08
$ 
$ apt policy libgdal-grass
libgdal-grass:
  Installed: 1:1.0.2-7+jammy3
  Candidate: 1:1.0.2-7+jammy3
  Version table:
 *** 1:1.0.2-7+jammy3 500
        500 http://ppa.launchpad.net/ubuntugis/ubuntugis-unstable/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status
     3.4.1-3 500
        500 http://ftp.belnet.be/ubuntu jammy/universe amd64 Packages
$ 
$ inxi -Sxxx
System:
  Host: xxx Kernel: 5.15.0-112-generic x86_64 bits: 64 compiler: gcc
    v: 11.4.0 Desktop: Cinnamon 6.0.4 tk: GTK 3.24.33 wm: muffin vt: 7
    dm: LightDM 1.30.0 Distro: Linux Mint 21.3 Virginia
    base: Ubuntu 22.04 jammy
hellik commented 3 months ago

Point 1 by @rsbivand is an important one regarding winGRASS. Different GDAL versions built by different compilers may interfere. That's a packaging issue.

Though I find it an interesting approach worth to be followed.

The fallbacks are the methods currently used?

florisvdh commented 3 months ago

The fallbacks are the methods currently used?

@hellik yes, if GDAL is not aware of the GDAL-GRASS driver (so indeed it needs to be set up correctly), the current implementation is used. In that case GRASS GIS is called with r.out.gdal or v.out.ogr to first write a temporary file either with the RRASTER (raster) or the GPKG (vector) GDAL driver, respectively.

For a binary driver installation that works out of the box on Ubuntu-based systems, one can use the ubuntugis-unstable PPA, which is where the official GRASS GIS release lives, and install libgdal-grass. It is on par with its GRASS GIS and GDAL dependencies in the same repo. Still, to be user-friendly and work out of the box on all systems, it would indeed be great to have it natively included in GDAL.

These are the driver issues encountered a few days ago and posted at the driver repo, with help of @neteler: https://github.com/OSGeo/gdal-grass/issues/46, https://github.com/OSGeo/gdal-grass/issues/47, https://github.com/OSGeo/gdal-grass/issues/48.

rsbivand commented 3 months ago

So also this is bringing back code for example at: https://r-forge.r-project.org/scm/viewvc.php/pkg/rgrass7/R/bin_link.R?root=spgrass#l330, dropped on transfer to github. It was dropped because effectively nobody used the plugin, and because it was read-only. Please try some comparisons writing objects from R to GRASS using the plugin, as I think you suggest that it can write. I understand that this will only work for users with terra built from source against a GDAL built with the GRASS and OGR_GRASS drivers, so is effectively a reversion to how spgrass6 and rgrass7 used to work. It will not be available on platforms where terra is not built with driver-enabled GDAL.

rsbivand commented 3 months ago

Running rgrass built from the gdal_grass branch fails CMD check with this log: 00check.log

florisvdh commented 3 months ago

suggest that it can write

No, it's only relevant to read_RAST() and read_VECT(), since indeed it's a read-only driver. So in this PR the driver functionality is only added in read_RAST() and read_VECT().

In relation to the driver, write_*() functions were only updated to refuse objects that already refer to a GRASS data source.

Apart from that, this PR also adds some read_*() and write_*() functionality that is agnostic to which approach is being used (see PR description).

It will not be available on platforms where terra is not built with driver-enabled GDAL.

GDAL needs no specific configuration to use the driver; it uses its autoload mechanism to detect the driver and make it available. See https://github.com/OSGeo/gdal-grass/blob/main/README.md:

With this driver package you configure and install GDAL normally, then build and install GRASS normally and finally build and install this driver in GDAL's "autoload" directory.

On my system it was sufficient for {terra} to have been compiled against the GDAL version that was in place before the additional driver was installed. Also no additional GDAL_DRIVER_PATH environment variable (as documented at https://github.com/OSGeo/gdal-grass/blob/main/README.md) was needed for the driver to be detected, since the binary driver from the ubuntugis-unstable PPA is installed in an expected path:

$ locate gdalplugins | grep GRASS
/usr/lib/x86_64-linux-gnu/gdalplugins/gdal_GRASS.so
/usr/lib/x86_64-linux-gnu/gdalplugins/ogr_GRASS.so
florisvdh commented 3 months ago

Running rgrass built from the gdal_grass branch fails CMD check with this log: 00check.log

Thanks. It's introduced by 682c7fc, 'needed' in order to not get the same error on my system, where there's no rsb mapset.

So since this error now pops up on your side I guess on your machine the rsb mapset will be the active mapset, so that write_RAST() writes the layers there, invoking the error.

The solution will be to be explicit about the mapset where new layers will be written. I suggest that we create a custom mapset explicitly in the examples, so that written layers end up there, not in PERMANENT. Do you agree @rsbivand?

rsbivand commented 3 months ago

Yes, @florisvdh creating a mapset to write into is a good idea in the examples; will you implement it?

florisvdh commented 3 months ago

Thanks for testing @veroandreo !

florisvdh commented 3 months ago

@rsbivand Setting a mapset in the examples of this PR has now been taken care of (in readRAST.Rd and readVECT.Rd).

I discovered that the vect2neigh() example in readVECT.Rd still somehow modifies the PERMANENT mapset by writing 3 extra layers into it (based on the census dataset). This happens even though the custom mapset is active. Since I won't soon have time to dig into this function (already returning from the GRASS GIS meeting), I have outcommented its example so that PERMANENT remains untouched.

florisvdh commented 3 months ago

While fiddling with mapsets and regions yesterday, it was discovered that the GDAL-GRASS driver does not respect a mapset's region. It just reads out a raster layer as it is stored in the GRASS GIS database. This is an important difference with r.out.gdal, which applies the mapset's region settings (extent & resolution).

As this causes inconsistency between the r.out.gdal approach (i.e. writing a temporary file for use by {terra}) and the GDAL-GRASS driver approach (direct reading), we cannot consider uptake of the GDAL-GRASS raster driver already.

Consequently I have reverted the use of the GDAL-GRASS raster driver in this PR (a828acf).

A corresponding driver feature request has been posted at https://github.com/OSGeo/gdal-grass/issues/49, so at least awaiting a solution for that one before reconsideration. For the driver it makes sense to take into account the region, since this GDAL driver is tailored for GRASS GIS.

Thanks @neteler and @veroandreo for the discussion about this yesterday.

Note that for vector datasets the region is not considered in GRASS GIS, so only the raster driver approach has been reverted.

rsbivand commented 3 months ago

@florisvdh looks OK now, would you like to merge?

florisvdh commented 3 months ago

@florisvdh looks OK now, would you like to merge?

OK, let's do that. I'll add updates to NEWS.md and the pkgdown website before merging.

florisvdh commented 3 months ago

@rsbivand you can of course rephrase NEWS.md if needed. Thanks for reviewing this PR.