rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
534 stars 88 forks source link

Error: [mosaic] insufficient disk space. Estimated need: 635GB. Available: 46 GB #981

Closed JimShady closed 1 year ago

JimShady commented 1 year ago

This does not work:

my_files <- list("a_rast.tif", "b_rast.tif" ...<200 other files>)
a <- lapply(my_files, rast)
b <- sprc(a)
m <- mosaic(b, 
            fun="max", 
            filename="global.tif", 
            overwrite=TRUE, 
            wopt=list(datatype = "INT1U",
                     ))

Error: [mosaic] insufficient disk space. Estimated need: 635GB. Available: 46 GB

I am unsure how to take a different approach. The rasters overlap, so I can't use merge - I need the fun="max" bit.

If I make a VRT and then use gdaltranslate to write-out the VRT (with LZW compression, BIGTIFF=YES and PREDICTOR=2), then the final file size is around12GB. however gdaltranslate does not have a 'max' function for overlaps so I can't use that.

Any ideas? Thanks.

JimShady commented 1 year ago

Apologies. My mistake. Was writing to the wrong drive. Closing.

rhijmans commented 1 year ago

I am still intrigued about the large difference between the estimated and the actual file size. What is dim(x) for the output raster? And are these real numbers of something smaller such as bytes?

JimShady commented 1 year ago

I don't have dim(x) available right now. I just deleted the output to re-create it again. But it's a 0.0002777778 by 0.0002777778 resolution raster covering most of the globe.

The values in the output will all be within the range 1-6, so I set the datatype to INT1U .

JimShady commented 1 year ago

I have one raster for each country in my project. About 211. Each file/country is 30m resolution ( about 0.0002777778 degrees) and I want to combine them to one global file. I have done it successfully using gdaltranslate.exe . However, as I said, I am not happy how it treats overlapping areas. It just takes the last number, rather than the max. Which is why I am looking into alternative solutions.

rhijmans commented 1 year ago

That suggests that you get a 50x compression. I suppose that is possible because you have few unique values.

library(terra)
r = rast(res=0.0002777778, ymin=-60)
need = ncell(r) / 1073741824
need
#[1] 651.7768
need / 12
#[1] 54.31473

I cannot know beforehand how big the file will actually be (if you would use an ascii format the file could be much larger). I will change the code to give a warning instead of an error.

JimShady commented 1 year ago

Ok. Thanks.

Aside, what does mosaic(fun = "max") do with NA values? If NA and 5, I'm after it returning 5.

JimShady commented 1 year ago

Still struggling to do this work, but now getting:

The terminal process "C:\Users\u1191972\AppData\Local\Programs\Python\Python311\Scripts\radian.exe '--no-save', '--no-restore'" terminated with exit code: 3221225620.

Will try with vanilla R.exe rather than radian. Unsure what is causing this issue. Google talks about dividing by 0 error. Unsure if related to terra.

JimShady commented 1 year ago

Crashed again using mosaic.

The terminal process "C:\Program Files\R\R-4.2.2\bin\R.exe '--no-save', '--no-restore'" terminated with exit code: 3221225620.

rhijmans commented 1 year ago

NAs are ignored in mosaic and merge . The zero-division error (exit code: 3221225620) is puzzling

I would try merging in steps, E.g. four quadrants (NW, NE, SW, SE). You can get the extent from each raster to determine in which quadrant (s) they fall.

JimShady commented 1 year ago

You don't know how I would go about doing the same operation but using native gdal do you? Thinking to see if it crashes gdal also.

JimShady commented 1 year ago

I think I might need to try and understand this stuff I guess:

https://gdal.org/drivers/raster/vrt.html#using-derived-bands-with-pixel-functions-in-c-c

rhijmans commented 1 year ago

I think it is possible but I have no experience with that.

Rapsodia86 commented 1 year ago

I think I might need to try and understand this stuff I guess:

https://gdal.org/drivers/raster/vrt.html#using-derived-bands-with-pixel-functions-in-c-c

See if this one helps: https://gis.stackexchange.com/questions/324602/averaging-overlapping-rasters-with-gdal

JimShady commented 1 year ago

Thanks, I'll give that a go.

JimShady commented 1 year ago

The top of my vrt file, when opened in notepad, now looks like this.

<VRTDataset rasterXSize="1295999" rasterYSize="526500">
  <SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
  <GeoTransform> -1.7999986259635691e+02,  2.7777778075829427e-04,  0.0000000000000000e+00,  9.0000007199999999e+01,  0.0000000000000000e+00, -2.7777778075829362e-04</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1">
     <PixelFunctionType>average</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
import numpy as np

def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,raster_ysize, buf_radius, gt, **kwargs):
    x = np.ma.masked_greater(in_ar, 6)
    out_ar[:] = np.ma.max(x, axis = 0, fill_value=0)
    mask = np.all(x.mask,axis = 0)
    out_ar[mask]=255
]]>
    </PixelFunctionCode>
    <NoDataValue>255</NoDataValue>
    <ColorInterp>Gray</ColorInterp>
    <ComplexSource>

I'm now running gdaltranslate.exe on it like so:

system(glue('{double_quote("C://Program Files//QGIS 3.28.1//bin//gdal_translate.exe")} --config GDAL_VRT_ENABLE_PYTHON YES -q -co {double_quote("PREDICTOR=2")} -co {double_quote("COMPRESS=LZW")} -of GTiff -co BIGTIFF=YES -co TILED=YES -co NUM_THREADS=ALL_CPUS {double_quote(vrt_file)} {double_quote(gsub(".vrt", ".tif", vrt_file))}'))

It seems to have started making the tif file. I'll have to wait a while now to see what the result is. I wish I'd tested on 3-4 small files rather than just running it on my whole world dataset .... doh.

JimShady commented 1 year ago

If this works I guess I'll write a function to insert the numpy code into my vrt file, so that I don't have to manually edit the vrt each time. It seems a bit weird I need to do this. You'd think someone would have done it already.

Rapsodia86 commented 1 year ago

Here is something you may find helpful;: https://gis.stackexchange.com/questions/350233/how-can-i-modify-a-vrtrasterband-sub-class-etc-from-python

JimShady commented 1 year ago

Woohoo! Modifying the vrt manually actually worked. Though I had originally forgot to put subClass="VRTDerivedRasterBand" in.

Will write out a full toy-example in the morning for future reference.

JimShady commented 1 year ago

This didn't work actually. Something to do with my Python installation that I'm now trying to fix.

Python path configuration: PYTHONHOME = (not set) PYTHONPATH = (not set) program name = 'python' isolated = 0 environment = 1 user site = 1 import site = 1 sys._base_executable = 'C:\PROGRA~1\QGIS32~1.1\bin\GDAL_T~1.EXE' sys.base_prefix = '' sys.base_exec_prefix = '' sys.platlibdir = 'lib' sys.executable = 'C:\PROGRA~1\QGIS32~1.1\bin\GDAL_T~1.EXE' sys.prefix = '' sys.exec_prefix = '' sys.path = [ 'C:\PROGRA~1\QGIS32~1.1\bin\python39.zip', '.\DLLs', '.\lib', 'C:\PROGRA~1\QGIS32~1.1\bin', ] Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding Python runtime state: core initialized ModuleNotFoundError: No module named 'encodings'

Current thread 0x00001a24 (most recent call first):

[1] 1033
JimShady commented 1 year ago

Maybe I will go back to trying with {terra} once you've changed that error into a warning Robert (or did you already in the dev version?)

JimShady commented 1 year ago

So the top of the VRT now looks like this:

<VRTDataset rasterXSize="1295999" rasterYSize="526500">
  <SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
  <GeoTransform> -1.7999986259635691e+02,  2.7777778075829427e-04,  0.0000000000000000e+00,  9.0000007199999999e+01,  0.0000000000000000e+00, -2.7777778075829362e-04</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>add</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
import numpy as np

def add(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                raster_ysize, buf_radius, gt, **kwargs):
    np.round_(np.clip(np.amax(in_ar, axis = 0),0,255),
            out = out_ar)
]]>
    </PixelFunctionCode>
    <NoDataValue>255</NoDataValue>

And I run it at the command line like so:

setx GDAL_DATA "C:\Program Files\GDAL\gdal-data"
setx GDAL_DRIVER_PATH "C:\Program Files\GDAL\gdalplugins"
setx PROJ_LIB "C:\Program Files\GDAL\projlib"
setx PYTHONPATH "C:\Program Files\GDAL\C:\Users\u1191972\AppData\Local\Programs\Python\Python311\"
"C://Program Files//GDAL//gdal_translate.exe" --config GDAL_VRT_ENABLE_PYTHON YES --config GDAL_CACHEMAX 64 -q -co "PREDICTOR=2" -co "COMPRESS=LZW" -of GTiff -co BIGTIFF=YES -co TILED=YES -co NUM_THREADS=ALL_CPUS "F://v1.0.1/industrial_base.vrt" "F://v1.0.1/industrial_base.tif

But I get the error:

ERROR 1: <class 'numpy.core._exceptions._ArrayMemoryError'>, ((211, 256, 65536), dtype('uint8'))

Which I am fully aware is nothing to do with terra but I thought someone might find it useful at some point. Somehow.

Rapsodia86 commented 1 year ago

See this: https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type

What python version are you using? 3GB is not that crazy for the allocation

You could try to split it into two or four smaller VRTs, and then merge them all into one as Robert suggested.

JimShady commented 1 year ago

Yes I found that SO answer earlier. I'm on Windows so I tried to mess about with the virtual memory settings and reboot, but it hasn't helped.

JimShady commented 1 year ago

Yes I guess I will try doing them by region first.

JimShady commented 1 year ago

I'm trying to do this region by region now. Like so:

for (each_region in regions){
 my_files <- as.list(my_files[grepl(each_region , my_files)])
        a <- lapply(my_files, rast)
        b <- sprc(a)
        m <- mosaic(b, 
                    fun = "max", 
                    filename=glue("{each_region}.tif"), 
                    overwrite = TRUE, 
                    wopt = list(datatype = "INT1U"))
}

It starts and looks fine, but after a minute or so, it keeps crashing with the Windows divide by zero I reported earlier. Some of the regions write 1KB ... wait a few minutes then crash. Some regions write some content, and then crash. For example these both crashes with the same error.

image

The terminal process "C:\Users\u1191972\AppData\Local\Programs\Python\Python311\Scripts\radian.exe '--no-save', '--no-restore'" terminated with exit code: 3221225620.

rhijmans commented 1 year ago

I would love to look at this, but I cannot do anything without being able to recreate the error. So it would be great if you can share an example that fails.