stevenpawley / Pyspatialml

Machine learning modelling for spatial data
GNU General Public License v3.0

Mask function using a humongous amount of memory #43

Open willieseun opened 2 years ago

willieseun commented 2 years ago

I tried to mask a 6 GB tif file on a PC with 64 GB of RAM, and memory usage kept increasing until it reached about 97%, at which point I cancelled it. I cancelled because I had a similar issue with another file that I let run, and it ended in a blue screen error. I hope this can be fixed soon, because masking the same image in RStudio with the raster and terra packages caused no memory problems.

Secondly, the resample function only accepts a raster shape. How can I resample by cell size instead?

stevenpawley commented 2 years ago

Ah, yes. The mask function is really just using rasterio's mask method, which reads everything into memory and requires quite a lot of RAM for the processing. The only thing it does to reduce the memory footprint is apply the mask band by band, a quick and dirty approach that assumes the operation on a single band can still be performed in memory. It should be possible to mask by reading the data in chunks; I will have to look at implementing this.
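
For anyone who needs a workaround in the meantime, here is a rough sketch of what chunked masking could look like when calling rasterio directly. It is not Pyspatialml's implementation. The names `iter_windows` and `mask_in_chunks`, the `chunk` size, and the `nodata` default are all made up for illustration, and `shapes` is assumed to be a list of GeoJSON-like geometries in the raster's CRS.

```python
def iter_windows(height, width, chunk=1024):
    """Yield (col_off, row_off, win_width, win_height) tiles covering the raster."""
    for row_off in range(0, height, chunk):
        for col_off in range(0, width, chunk):
            yield (
                col_off,
                row_off,
                min(chunk, width - col_off),   # clip the last column of tiles
                min(chunk, height - row_off),  # clip the last row of tiles
            )


def mask_in_chunks(src_path, dst_path, shapes, nodata=0, chunk=1024):
    """Hypothetical helper: mask a raster tile by tile instead of all at once."""
    # rasterio is imported lazily so iter_windows stays usable on its own.
    import rasterio
    from rasterio.features import geometry_mask
    from rasterio.windows import Window

    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        profile.update(nodata=nodata)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for col_off, row_off, w, h in iter_windows(src.height, src.width, chunk):
                window = Window(col_off, row_off, w, h)
                # Only this tile (all bands) is held in memory.
                data = src.read(window=window)
                # True where a pixel falls outside every geometry.
                outside = geometry_mask(
                    shapes,
                    out_shape=(h, w),
                    transform=src.window_transform(window),
                )
                data[:, outside] = nodata
                dst.write(data, window=window)
```

With this approach peak memory should scale with `chunk * chunk * number_of_bands` rather than with the full raster, at the cost of rasterizing the geometries once per tile.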

willieseun commented 2 years ago

Thanks for your response, Steve! I would love for you to look at resampling by cell size too, or resampling to match a similar .tif template.
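
Until something like that exists, one possible workaround is to convert the desired cell size into a raster shape by hand. The sketch below assumes the bounds are given as `(left, bottom, right, top)` in the same units as the cell size; `shape_from_cell_size` is a hypothetical helper, not part of Pyspatialml.

```python
import math


def shape_from_cell_size(bounds, cell_size):
    """Return (rows, cols) needed to cover `bounds` with square cells of `cell_size`."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    left, bottom, right, top = bounds

    def n_cells(extent):
        n = extent / cell_size
        nearest = round(n)
        # Treat values within floating-point noise of an integer as exact,
        # otherwise round up so the whole extent is covered.
        if math.isclose(n, nearest, rel_tol=1e-9):
            return max(1, nearest)
        return max(1, math.ceil(n))

    return n_cells(top - bottom), n_cells(right - left)
```

The resulting `(rows, cols)` tuple could then be passed wherever the resample function expects a shape. For the template case, the template raster's own height and width would presumably serve the same purpose, as long as both rasters cover the same extent.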