satellogic / telluric

telluric is a Python library to manage vector and raster geospatial data in an interactive and easy way
MIT License
87 stars 18 forks source link

Problem with raster resizing #320

Closed enomis-dev closed 2 years ago

enomis-dev commented 2 years ago

Hi, I have the following problem when using Telluric's resize method.

Expected behavior and actual behavior

I read a raster that has unique values 0,1,2 I resized it with some factors and with nearest neighbor resampling I expect that the resized raster will not have values outside 0,1,2 I got as unique values 0,1,2,3,4

Steps to reproduce the problem

from telluric import GeoRaster2
import numpy as np
from rasterio.enums import Resampling
raster = GeoRaster2.open('raster.tif')
np.unique(raster.image.data)

array([0, 1, 2], dtype=uint8)

downsampling_factor = 0.0625
resized_raster = raster.resize(ratio=downsampling_factor, resampling=Resampling.nearest)
np.unique(resized_raster.image.data)

array([0, 1, 2, 3, 4], dtype=uint8)

Operating system and installation details

Libraries: numpy --> 1.19.5 telluric --> 0.14.0 rasterio --> 1.2.6 Python --> 3.7.13

Data raster.zip

System: Ubuntu 21.04

drnextgis commented 2 years ago

It seems like a well known issue: https://github.com/OSGeo/gdal/issues/1923. Workaround is to skip overviews:

import numpy as np
import rasterio
from telluric import GeoRaster2
from rasterio.enums import Resampling
from rasterio.io import MemoryFile

with rasterio.open("raster.tif") as src:
    num_overviews = src.overviews(1)

print("Overviews have more unique values than non-resampled data")
for level in [None] + list(range(len(num_overviews))):
    print(f"Overview level: {level}")
    with rasterio.open("raster.tif", overview_level=level) as src:
        print(f'{"Values:": <15}', np.unique(src.read()))

print("\nUsing non-resampled data solves the issue")
downsampling_factor = 0.0625
with rasterio.open("raster.tif") as src:
    # Read non-resampled version of data
    data = src.read()
    profile = src.profile
    with MemoryFile() as memfile:
        with memfile.open(**profile) as dst:
            dst.write(data)
        raster = GeoRaster2.open(memfile.name)
        print(f'{"Original:": <9}', np.unique(raster.image.data))
        resized_raster = raster.resize(ratio=downsampling_factor, resampling=Resampling.nearest)
        print(f'{"Resized:": <9}', np.unique(resized_raster.image.data))

Output:

Overviews have more unique values than non-resampled data
Overview level: None
Values:         [0 1 2]
Overview level: 0
Values:         [0 1 2 3]
Overview level: 1
Values:         [0 1 2 3]
Overview level: 2
Values:         [0 1 2 3]
Overview level: 3
Values:         [0 1 2 3 4]

Using non-resampled data solves the issue
Original: [0 1 2]
Resized:  [0 1 2]
enomis-dev commented 2 years ago

Thank you @drnextgis for finding the root cause!