microsoft / torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
https://www.osgeo.org/projects/torchgeo/
MIT License

What values are being "clipped to the interval [0, 98]" #2145

Open douglasmacdonald opened 3 weeks ago

douglasmacdonald commented 3 weeks ago

Issue

I am not sure what the following paragraph means, as the values following it are not clipped between 0 and 98, and I can't see where in the document this clipping is being applied.

"Below we have min/max values calculated across the dataset per band. The values were clipped to the interval [0, 98] to stretch the band values and avoid outliers influencing the band histograms." https://github.com/microsoft/torchgeo/blob/v0.5.2/docs/tutorials/transforms.ipynb

See also

https://torchgeo.readthedocs.io/en/stable/tutorials/transforms.html#Dataset-Bands-and-Statistics

Fix

It would be helpful if it were clarified which values are being clipped.

douglasmacdonald commented 3 weeks ago

I might need to read this more carefully. Is it saying that the original data between 0 and 98% of the maximum value was selected before the max and min values were calculated? If this is the case, a reference to the original work might be useful.

isaaccorley commented 3 weeks ago

You're right that it should actually read the 0th and 98th percentiles. The values were computed on the entire train set. This is a common approach to filtering outliers and then linearly stretching multispectral data, similar to the functionality available in ArcGIS or QGIS. You can read more about this and other options here: https://www.nv5geospatialsoftware.com/docs/BackgroundStretchTypes.html
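For anyone landing here later, a minimal sketch of what a percentile clip followed by a linear stretch can look like. This is an illustration in plain NumPy, not the code used to produce the statistics in the tutorial; the function name `percentile_stretch`, the `(bands, height, width)` layout, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def percentile_stretch(image, lower=0.0, upper=98.0):
    """Clip each band to its [lower, upper] percentiles, then rescale to [0, 1].

    `image` is assumed to have shape (bands, height, width).
    Returns the stretched image plus the per-band clip limits.
    """
    # Per-band percentiles over the spatial axes; result has shape (2, bands).
    lo, hi = np.percentile(image, [lower, upper], axis=(1, 2))
    lo = lo[:, None, None]
    hi = hi[:, None, None]

    # Values above the upper percentile (the outliers) are set to that percentile.
    clipped = np.clip(image, lo, hi)

    # Guard against constant bands, where hi == lo would divide by zero.
    scale = np.where(hi > lo, hi - lo, 1.0)
    return (clipped - lo) / scale, lo.ravel(), hi.ravel()


# Hypothetical 3-band image with a handful of extreme outliers per band.
rng = np.random.default_rng(0)
image = rng.normal(1000.0, 200.0, size=(3, 100, 100))
image[:, 0, :5] = 20000.0

stretched, band_min, band_max = percentile_stretch(image)
print("per-band min (0th percentile): ", band_min)
print("per-band max (98th percentile):", band_max)
```

With the defaults, `band_min` and `band_max` play the role of the per-band "min/max" values: the 0th percentile is just the true minimum, while the 98th percentile replaces the true maximum so that a few very bright pixels do not compress the rest of the histogram.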