landam / grass-gis-git-migration-test

r.external to accept gdal config options #135

Open landam opened 5 years ago

landam commented 5 years ago

Reported by perrygeo on 20 Sep 2014 18:14 UTC When linking to an external GDAL raster source, it would be useful to pass GDAL configuration options (http://trac.osgeo.org/gdal/wiki/ConfigOptions).

Consider this scenario: I need to specify a GDAL config option to read a NetCDF file correctly. I can specify GDAL_NETCDF_BOTTOMUP=NO as an environment variable, which works in most cases. But when using a multiprocessing approach to parallelization (as e.g. t.aggregate does), the newly spawned processes don't inherit the same environment and will fail.

The solution might be for r.external to accept GDAL config options that can be applied regardless of the environment variables and can allow externally linked rasters to function properly across processes.
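A minimal sketch of how a per-link option such as GDAL_NETCDF_BOTTOMUP=NO could be split into a key and a value before being applied. The function name `split_config_option` is hypothetical and is not part of GRASS or GDAL; the sketch does not call GDAL itself.

```c
#include <string.h>

/* Split "KEY=VALUE" at the first '='.
 * Returns 0 on success, -1 if there is no '=', the key is empty,
 * or either part does not fit in its buffer.
 * (Hypothetical helper, for illustration only.) */
int split_config_option(const char *opt, char *key, size_t key_size,
                        char *value, size_t value_size)
{
    const char *eq = strchr(opt, '=');
    size_t key_len, value_len;

    if (eq == NULL || eq == opt)
        return -1;

    key_len = (size_t)(eq - opt);
    value_len = strlen(eq + 1);
    if (key_len >= key_size || value_len >= value_size)
        return -1;

    memcpy(key, opt, key_len);
    key[key_len] = '\0';
    memcpy(value, eq + 1, value_len + 1); /* copies the terminating NUL too */
    return 0;
}
```

The resulting pair is the kind of input that GDAL's CPLSetConfigOption(key, value) takes, so a module could apply it inside each process instead of relying on the inherited environment.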

GRASS GIS version and provenance

svn-trunk

Migrated-From: https://trac.osgeo.org/grass/ticket/2428

landam commented 5 years ago

Comment by neteler on 25 Sep 2014 09:25 UTC The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:

# r.out.gdal/main.c

    hDstDS =
        GDALCreate(hDriver, output->answer, cellhead.cols, cellhead.rows,
               ref.nfiles, datatype, papszOptions);

but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.

landam commented 5 years ago

Comment by glynn on 25 Sep 2014 23:59 UTC Replying to [comment:1 neteler]:

> The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:
>
> but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.

Note that r.external is the analogue of r.in.gdal, which doesn't accept any configuration options.

The analogue of r.out.gdal is r.external.out, which has an options= option.

If options are needed for reading, a similar option should be added to both r.in.gdal and r.external, presumably using GDALOpenEx() instead of GDALOpen(). The latter will require extending the GDAL "link" format (lib/raster/gdal.c).
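As a rough illustration of the list format involved: GDAL option arguments such as papszOptions (and the open-options argument of GDALOpenEx()) are NULL-terminated arrays of "NAME=VALUE" strings. The sketch below builds such an array from a comma-separated string in plain C. The helper names `build_option_list` and `free_option_list` and the comma-separated input format are assumptions for illustration, not an existing GRASS or GDAL API.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated string array created by build_option_list(). */
void free_option_list(char **list)
{
    size_t i;

    if (list == NULL)
        return;
    for (i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/* Split "A=1,B=2" into a NULL-terminated array of newly allocated
 * strings. Empty items are skipped. Returns NULL on allocation failure.
 * (Hypothetical helper, for illustration only.) */
char **build_option_list(const char *spec)
{
    size_t count = 0, capacity = 4;
    char **list = malloc(capacity * sizeof *list);
    const char *start = spec;

    if (list == NULL)
        return NULL;

    for (;;) {
        const char *end = strchr(start, ',');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len > 0) {
            char *item;

            if (count + 1 >= capacity) { /* keep room for the terminator */
                char **grown = realloc(list, capacity * 2 * sizeof *list);

                if (grown == NULL) {
                    list[count] = NULL;
                    free_option_list(list);
                    return NULL;
                }
                list = grown;
                capacity *= 2;
            }
            item = malloc(len + 1);
            if (item == NULL) {
                list[count] = NULL;
                free_option_list(list);
                return NULL;
            }
            memcpy(item, start, len);
            item[len] = '\0';
            list[count++] = item;
        }
        if (end == NULL)
            break;
        start = end + 1;
    }
    list[count] = NULL;
    return list;
}
```

An array built this way has the shape that an options= parameter on r.in.gdal or r.external would need to hand to GDAL. Storing the same strings in the "link" file would still require the format extension mentioned above.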

landam commented 5 years ago

Comment by neteler on 26 Sep 2014 09:27 UTC Replying to [comment:2 glynn]:

> Replying to [comment:1 neteler]:
>
>> The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:
>>
>> but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.
>
> Note that r.external is the analogue of r.in.gdal, which doesn't accept any configuration options.

While it doesn't directly, I had added a larger cache some time ago by setting GDALSetCacheMax() to 300MB rather than the tiny 40MB default GDAL cache size. This speeds up import tremendously:

r.in.gdal ...
memory=integer
    Maximum memory to be used (in MB)
    Cache size for raster rows
    Options: 0-2047
    Default: 300

I wonder how to get that into r.external (I suppose that it would benefit as well).

> If options are needed for reading, a similar option should be added to both r.in.gdal and r.external, presumably using GDALOpenEx() instead of GDALOpen(). The latter will require extending the GDAL "link" format (lib/raster/gdal.c).

OK (no idea how to implement that).

landam commented 5 years ago

Comment by dylan on 15 Oct 2015 18:54 UTC Replying to [comment:3 neteler]:

> Replying to [comment:2 glynn]:
>
>> Replying to [comment:1 neteler]:
>>
>>> The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:
>>>
>>> but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.
>>
>> Note that r.external is the analogue of r.in.gdal, which doesn't accept any configuration options.
>
> While it doesn't directly, I had added a larger cache some time ago by setting GDALSetCacheMax() to 300MB rather than the tiny 40MB default GDAL cache size. This speeds up import tremendously:
>
> r.in.gdal ...
> memory=integer
>     Maximum memory to be used (in MB)
>     Cache size for raster rows
>     Options: 0-2047
>     Default: 300
>
> I wonder how to get that into r.external (I suppose that it would benefit as well).
>
>> If options are needed for reading, a similar option should be added to both r.in.gdal and r.external, presumably using GDALOpenEx() instead of GDALOpen(). The latter will require extending the GDAL "link" format (lib/raster/gdal.c).
>
> OK (no idea how to implement that).

I found this thread while searching for ways to speed up file access to maps linked via r.external.

The adjustable cache solution in r.in.gdal appears to be:

    if (parm.memory->answer && *parm.memory->answer) {
        /* TODO: GDALGetCacheMax() overflows at 2GiB, implement use of GDALSetCacheMax64() */
        GDALSetCacheMax(atol(parm.memory->answer) * 1024 * 1024);
        G_verbose_message(_("Using memory cache size: %.1f MiB"),
                          GDALGetCacheMax() / 1024.0 / 1024.0);
    }

Could this same block of code be used within r.external? I don't fully understand how r.external works, so I suppose that it is more complicated than this.

Alternatively, is there an environment variable that could be used to control the GDAL cache size?
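On the TODO in the r.in.gdal snippet above: the MiB-to-bytes multiplication can exceed a 32-bit integer once the value reaches 2048 MiB, which is presumably why memory= is capped at 2047. The following sketch does the conversion in 64-bit arithmetic with input validation. The function name `mib_to_bytes` is hypothetical and the sketch does not call GDAL.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert a decimal MiB count given as text into a byte count.
 * Returns -1 for empty, non-numeric, negative or overflowing input.
 * (Hypothetical helper, for illustration only.) */
int64_t mib_to_bytes(const char *text)
{
    char *end;
    long long mib;

    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    mib = strtoll(text, &end, 10);
    if (errno != 0 || *end != '\0' || mib < 0)
        return -1;
    if (mib > INT64_MAX / (1024 * 1024))
        return -1;

    return (int64_t)mib * 1024 * 1024;
}
```

A value produced this way could be passed to GDALSetCacheMax64(), the function the TODO names, rather than to GDALSetCacheMax().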

landam commented 5 years ago

Comment by glynn on 19 Oct 2015 19:33 UTC Replying to [comment:4 dylan]:

> Could this same block of code be used within r.external? I don't fully understand how r.external works, so I suppose that it is more complicated than this.

r.external itself just sets up the "link" between GRASS and the data file. The actual I/O occurs in lib/raster (gdal.c, open.c, get_row.c, close.c) when a GRASS module reads the map.

But the data which r.external controls is per-map, while this appears to be a global setting. What happens when a module reads multiple GDAL-linked maps with different settings? It might make more sense to set this in Rast_init_gdal() from an environment variable or $GISRC variable.
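A minimal sketch of the environment-variable approach, assuming a hypothetical variable name such as GRASS_GDAL_CACHE_MIB (no such variable is defined in this thread). It only shows the lookup and fallback logic that an initialization routine like Rast_init_gdal() might perform once per process; it does not call GDAL. Note that GDAL also documents its own GDAL_CACHEMAX configuration option, which may already cover this case.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a positive MiB count; return fallback if the text is missing,
 * non-numeric, out of range, or not positive.
 * (Hypothetical helper, for illustration only.) */
long parse_cache_mib(const char *text, long fallback)
{
    char *end;
    long mib;

    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    mib = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || mib <= 0)
        return fallback;
    return mib;
}

/* Look up the cache size in the environment variable var_name,
 * using fallback when it is unset or invalid. */
long cache_mib_from_env(const char *var_name, long fallback)
{
    return parse_cache_mib(getenv(var_name), fallback);
}
```

Because the setting is read once per process and not per map, this avoids the question of what to do when several GDAL-linked maps carry different cache settings.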

landam commented 5 years ago

Comment by dylan on 19 Oct 2015 20:03 UTC Replying to [comment:5 glynn]:

> Replying to [comment:4 dylan]:
>
>> Could this same block of code be used within r.external? I don't fully understand how r.external works, so I suppose that it is more complicated than this.
>
> r.external itself just sets up the "link" between GRASS and the data file. The actual I/O occurs in lib/raster (gdal.c, open.c, get_row.c, close.c) when a GRASS module reads the map.
>
> But the data which r.external controls is per-map, while this appears to be a global setting. What happens when a module reads multiple GDAL-linked maps with different settings? It might make more sense to set this in Rast_init_gdal() from an environment variable or $GISRC variable.

Thank you for the clarification, Glynn. I think that a suitable environment or GRASS variable would be ideal: something that isn't widely used but is very important when working with massive files, numerous files, or both. I am unable to implement this but am happy to test and document it.

landam commented 5 years ago

Comment by dylan on 31 Mar 2016 19:45 UTC Checking in: any progress?

landam commented 5 years ago

Comment by neteler on 5 May 2016 14:08 UTC Milestone renamed

landam commented 5 years ago

Comment by neteler on 28 Dec 2016 15:04 UTC Ticket retargeted after milestone closed

landam commented 5 years ago

Modified by @landam on 5 May 2017 20:40 UTC

landam commented 5 years ago

Comment by @landam on 1 Sep 2017 20:28 UTC All enhancement tickets should be assigned to 7.4 milestone.

landam commented 5 years ago

Comment by neteler on 26 Jan 2018 11:40 UTC Ticket retargeted after milestone closed

landam commented 5 years ago

Modified by neteler on 12 Jun 2018 20:48 UTC

landam commented 5 years ago

Comment by @landam on 25 Sep 2018 16:51 UTC All enhancement tickets should be assigned to 7.6 milestone.

landam commented 5 years ago

Comment by @landam on 25 Jan 2019 21:07 UTC Ticket retargeted after milestone closed