Closed JimShady closed 1 year ago
Apologies. My mistake. Was writing to the wrong drive. Closing.
I am still intrigued about the large difference between the estimated and the actual file size. What is dim(x)
for the output raster? And are these real numbers of something smaller such as bytes?
I don't have dim(x) available right now. I just deleted the output to re-create it again. But it's a 0.0002777778 by 0.0002777778 resolution raster covering most of the globe.
The values in the output will all be within the range 1-6, so I set the datatype to INT1U .
I have one raster for each country in my project. About 211. Each file/country is 30m resolution ( about 0.0002777778 degrees) and I want to combine them to one global file. I have done it successfully using gdaltranslate.exe . However, as I said, I am not happy how it treats overlapping areas. It just takes the last number, rather than the max. Which is why I am looking into alternative solutions.
That suggests that you get a 50x compression. I suppose that is possible because you have few unique values.
library(terra)
r = rast(res=0.0002777778, ymin=-60)
need = ncell(r) / 1073741824
need
#[1] 651.7768
need / 12
#[1] 54.31473
I cannot know beforehand how big the file will actually be (if you would use an ascii format the file could be much larger). I will change the code to give a warning instead of an error.
Ok. Thanks.
Aside, what does mosaic(fun = "max") do with NA values? If NA and 5, I'm after it returning 5.
Still struggling to do this work, but now getting:
The terminal process "C:\Users\u1191972\AppData\Local\Programs\Python\Python311\Scripts\radian.exe '--no-save', '--no-restore'" terminated with exit code: 3221225620.
Will try with vanilla R.exe rather than radian. Unsure what is causing this issue. Google talks about dividing by 0 error. Unsure if related to terra.
Crashed again using mosaic.
The terminal process "C:\Program Files\R\R-4.2.2\bin\R.exe '--no-save', '--no-restore'" terminated with exit code: 3221225620.
NA
s are ignored in mosaic and merge . The zero-division error (exit code: 3221225620) is puzzling
I would try merging in steps, E.g. four quadrants (NW, NE, SW, SE). You can get the extent from each raster to determine in which quadrant (s) they fall.
You don't know how I would go about doing the same operation but using native gdal do you? Thinking to see if it crashes gdal also.
I think I might need to try and understand this stuff I guess:
https://gdal.org/drivers/raster/vrt.html#using-derived-bands-with-pixel-functions-in-c-c
I think it is possible but I have no experience with that.
I think I might need to try and understand this stuff I guess:
https://gdal.org/drivers/raster/vrt.html#using-derived-bands-with-pixel-functions-in-c-c
See if this one helps: https://gis.stackexchange.com/questions/324602/averaging-overlapping-rasters-with-gdal
Thanks, I'll give that a go.
The top of my vrt file, when opened in notepad, now looks like this.
<VRTDataset rasterXSize="1295999" rasterYSize="526500">
<SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
<GeoTransform> -1.7999986259635691e+02, 2.7777778075829427e-04, 0.0000000000000000e+00, 9.0000007199999999e+01, 0.0000000000000000e+00, -2.7777778075829362e-04</GeoTransform>
<VRTRasterBand dataType="Byte" band="1">
<PixelFunctionType>average</PixelFunctionType>
<PixelFunctionLanguage>Python</PixelFunctionLanguage>
<PixelFunctionCode><![CDATA[
import numpy as np
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,raster_ysize, buf_radius, gt, **kwargs):
x = np.ma.masked_greater(in_ar, 6)
out_ar[:] = np.ma.max(x, axis = 0, fill_value=0)
mask = np.all(x.mask,axis = 0)
out_ar[mask]=255
]]>
</PixelFunctionCode>
<NoDataValue>255</NoDataValue>
<ColorInterp>Gray</ColorInterp>
<ComplexSource>
I'm now running gdaltranslate.exe on it like so:
system(glue('{double_quote("C://Program Files//QGIS 3.28.1//bin//gdal_translate.exe")} --config GDAL_VRT_ENABLE_PYTHON YES -q -co {double_quote("PREDICTOR=2")} -co {double_quote("COMPRESS=LZW")} -of GTiff -co BIGTIFF=YES -co TILED=YES -co NUM_THREADS=ALL_CPUS {double_quote(vrt_file)} {double_quote(gsub(".vrt", ".tif", vrt_file))}'))
It seems to have started making the tif file. I'll have to wait a while now to see what the result is. I wish I'd tested on 3-4 small files rather than just running it on my whole world dataset .... doh.
If this works I guess I'll write a function to insert the numpy code into my vrt file, so that I don't have to manually edit the vrt each time. It seems a bit weird I need to do this. You'd think someone would have done it already.
Here is something you may find helpful;: https://gis.stackexchange.com/questions/350233/how-can-i-modify-a-vrtrasterband-sub-class-etc-from-python
Woohoo! Modifying the vrt manually actually worked. Though I had originally forgot to put subClass="VRTDerivedRasterBand"
in.
Will write out a full toy-example in the morning for future reference.
This didn't work actually. Something to do with my Python installation that I'm now trying to fix.
Python path configuration: PYTHONHOME = (not set) PYTHONPATH = (not set) program name = 'python' isolated = 0 environment = 1 user site = 1 import site = 1 sys._base_executable = 'C:\PROGRA~1\QGIS32~1.1\bin\GDAL_T~1.EXE' sys.base_prefix = '' sys.base_exec_prefix = '' sys.platlibdir = 'lib' sys.executable = 'C:\PROGRA~1\QGIS32~1.1\bin\GDAL_T~1.EXE' sys.prefix = '' sys.exec_prefix = '' sys.path = [ 'C:\PROGRA~1\QGIS32~1.1\bin\python39.zip', '.\DLLs', '.\lib', 'C:\PROGRA~1\QGIS32~1.1\bin', ] Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding Python runtime state: core initialized ModuleNotFoundError: No module named 'encodings'
Current thread 0x00001a24 (most recent call first):
Maybe I will go back to trying with {terra} once you've changed that error into a warning Robert (or did you already in the dev version?)
So the top of the VRT now looks like this:
<VRTDataset rasterXSize="1295999" rasterYSize="526500">
<SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
<GeoTransform> -1.7999986259635691e+02, 2.7777778075829427e-04, 0.0000000000000000e+00, 9.0000007199999999e+01, 0.0000000000000000e+00, -2.7777778075829362e-04</GeoTransform>
<VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
<PixelFunctionType>add</PixelFunctionType>
<PixelFunctionLanguage>Python</PixelFunctionLanguage>
<PixelFunctionCode><![CDATA[
import numpy as np
def add(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
raster_ysize, buf_radius, gt, **kwargs):
np.round_(np.clip(np.amax(in_ar, axis = 0),0,255),
out = out_ar)
]]>
</PixelFunctionCode>
<NoDataValue>255</NoDataValue>
And I run it at the command line like so:
setx GDAL_DATA "C:\Program Files\GDAL\gdal-data"
setx GDAL_DRIVER_PATH "C:\Program Files\GDAL\gdalplugins"
setx PROJ_LIB "C:\Program Files\GDAL\projlib"
setx PYTHONPATH "C:\Program Files\GDAL\C:\Users\u1191972\AppData\Local\Programs\Python\Python311\"
"C://Program Files//GDAL//gdal_translate.exe" --config GDAL_VRT_ENABLE_PYTHON YES --config GDAL_CACHEMAX 64 -q -co "PREDICTOR=2" -co "COMPRESS=LZW" -of GTiff -co BIGTIFF=YES -co TILED=YES -co NUM_THREADS=ALL_CPUS "F://v1.0.1/industrial_base.vrt" "F://v1.0.1/industrial_base.tif
But I get the error:
ERROR 1: <class 'numpy.core._exceptions._ArrayMemoryError'>, ((211, 256, 65536), dtype('uint8'))
Which I am fully aware is nothing to do with terra but I thought someone might find it useful at some point. Somehow.
See this: https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type
What python version are you using? 3GB is not that crazy for the allocation
You could try to split it into two or four smaller VRTs, and then merge them all into one as Robert suggested.
Yes I found that SO answer earlier. I'm on Windows so I tried to mess about with the virtual memory settings and reboot, but it hasn't helped.
Yes I guess I will try doing them by region first.
I'm trying to do this region by region now. Like so:
for (each_region in regions){
my_files <- as.list(my_files[grepl(each_region , my_files)])
a <- lapply(my_files, rast)
b <- sprc(a)
m <- mosaic(b,
fun = "max",
filename=glue("{each_region}.tif"),
overwrite = TRUE,
wopt = list(datatype = "INT1U"))
}
It starts and looks fine, but after a minute or so, it keeps crashing with the Windows divide by zero I reported earlier. Some of the regions write 1KB ... wait a few minutes then crash. Some regions write some content, and then crash. For example these both crashes with the same error.
The terminal process "C:\Users\u1191972\AppData\Local\Programs\Python\Python311\Scripts\radian.exe '--no-save', '--no-restore'" terminated with exit code: 3221225620.
I would love to look at this, but I cannot do anything without being able to recreate the error. So it would be great if you can share an example that fails.
This does not work:
I am unsure how to take a different approach. The rasters overlap, so I can't use merge - I need the fun="max" bit.
If I make a VRT and then use gdaltranslate to write-out the VRT (with LZW compression, BIGTIFF=YES and PREDICTOR=2), then the final file size is around12GB. however gdaltranslate does not have a 'max' function for overlaps so I can't use that.
Any ideas? Thanks.