jgrss / geowombat

GeoWombat: Utilities for geospatial data
https://geowombat.readthedocs.io
MIT License
config issue? #216

Closed mmann1123 closed 2 years ago

mmann1123 commented 2 years ago

I was just rebuilding docs for the book and noticed this. Just wondering if this needs to be handled.

import geowombat as gw
from geowombat.data import l8_224078_20200518

with gw.config.update(scale_factor=0.0001):
  with gw.open(l8_224078_20200518) as src:
    print(src.attrs['scales'])
(1.0, 1.0, 1.0)

mmann1123 commented 2 years ago

Maybe this is just my lack of understanding. Is scale_factor used to update the array only when used in functions like NDVI etc? Is there a way to scale it directly?

import geowombat as gw
from geowombat.data import l8_224078_20200518

with gw.config.update(scale_factor=0.0001):
  with gw.open(l8_224078_20200518) as src:
    print(src.values[0])
    print(src.scales)
    print(src.gw.scale_factor)
[[   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 [   0    0    0 ...    0    0    0]
 ...
 [7692 7518 7513 ... 7440 7432 7415]
 [7586 7590 7610 ... 7440 7411 7425]
 [7576 7743 7770 ... 7464 7443 7406]]
(1.0, 1.0, 1.0)
0.0001

jgrss commented 2 years ago

Thanks for highlighting this. Currently, scale_factor is only passed to functions that use it. We don't, however, apply the 'scales' attribute upon opening. That attribute comes from rasterio/open_rasterio.

The way that scale_factor is currently used is legacy. Do you think we should apply the 'scales' attribute upon opening?

Say for a 3-band raster, scales of 1 would not trigger any scaling.

with gw.open('image.tif') as src:
    print(src.attrs['scales'])
    # (1.0, 1.0, 1.0)
    # src would not have scaling applied

But if the scales do not equal 1:

with gw.open('image.tif') as src:
    print(src.attrs['scales'])
    # (1e-4, 1e-4, 1e-4)
    # Within open(), we would do something like
    #     self._obj = self._obj * src.attrs['scales']
    # src would have scaling applied
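
As a rough illustration of the idea above, here is a minimal NumPy sketch, assuming a (band, y, x) array and a plain tuple standing in for attrs['scales']. The helper name apply_band_scales is hypothetical and is not part of geowombat; it only shows that scales of 1 leave the data untouched while other values are broadcast per band.

```python
import numpy as np


def apply_band_scales(data, scales):
    """Multiply each band of a (band, y, x) array by its scale.

    Hypothetical helper for illustration only; not geowombat's API.
    """
    scales = np.asarray(scales, dtype="float64")
    if scales.shape != (data.shape[0],):
        raise ValueError("expected one scale per band")
    # Scales of exactly 1 mean there is nothing to apply
    if np.all(scales == 1.0):
        return data
    # Reshape to (band, 1, 1) so each scale broadcasts over its own band
    return data * scales[:, np.newaxis, np.newaxis]


# Toy 3-band raster of uint16 digital numbers
raw = np.full((3, 2, 2), 7500, dtype="uint16")
unscaled = apply_band_scales(raw, (1.0, 1.0, 1.0))
scaled = apply_band_scales(raw, (1e-4, 1e-4, 1e-4))
```

Note that the scaled result is promoted to floating point, so applying scales on open would also change the dtype of the array.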

The config would be an attribute override, similar to how nodata is used.

with gw.config.update(scale_factor=255.0):
    with gw.open('image.tif') as src:
        # Within open(), we would do something like
        #     if config['scale_factor'] is not None:
        #         # The 'scales' attribute would be ignored
        #         self._obj = self._obj * config['scale_factor']
        pass
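
The override order proposed here could be sketched as below. This is only an assumption about how the precedence might work, written in plain Python; resolve_scales is a hypothetical name, not a geowombat function.

```python
def resolve_scales(attr_scales, config_scale_factor=None):
    """Pick the per-band scales to apply on open.

    Hypothetical helper: a config-level scale_factor, when set,
    overrides the file's 'scales' attribute for every band.
    """
    if config_scale_factor is not None:
        # The 'scales' attribute is ignored in favor of the config value
        return tuple(float(config_scale_factor) for _ in attr_scales)
    return tuple(float(s) for s in attr_scales)


file_scales = (1e-4, 1e-4, 1e-4)
from_file = resolve_scales(file_scales)
from_config = resolve_scales(file_scales, config_scale_factor=255.0)
```

This mirrors the nodata analogy: the file attribute is the default, and the config value wins whenever it is not None.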

Then, any function/method level scaling would be applied again, if requested. E.g.

with gw.config.update(scale_factor=255.0):
    with gw.open('image.tif') as src:
        # src would have 255 x applied
        # scale_factor here would be applied again, but defaults to None
        src.gw.ndvi(scale_factor=None)  # pass a number to scale again
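
To make the "applied again" point concrete, here is a small NumPy sketch under the assumption that a function-level scale_factor of None is a no-op. The function maybe_scale is hypothetical and stands in for any method that accepts scale_factor.

```python
import numpy as np


def maybe_scale(data, scale_factor=None):
    """Apply a function-level scale only when one is requested.

    Hypothetical stand-in for a method that accepts scale_factor.
    """
    if scale_factor is None:
        return data
    return data * scale_factor


# Values already scaled once, as if scaling had happened on open
opened = np.array([7692, 7518], dtype="uint16") * 1e-4
same = maybe_scale(opened)
twice = maybe_scale(opened, scale_factor=1e-4)
```

With both levels set, the factors compound (1e-4 at open and 1e-4 in the function gives 1e-8 overall), so the function-level default of None matters.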

jgrss commented 2 years ago

See also rasterio's approach and rioxarray's mask_and_scale approach, i.e., leave it to the user.

jgrss commented 2 years ago

@mmann1123 experiment with branch jgrss/scales_216 and let me know if it does what you would expect.

jgrss commented 2 years ago

In particular, see new open() keywords for scaling and new tests for examples.

mmann1123 commented 2 years ago

@jgrss I like the use of scale_data in open() on branch jgrss/scales_216. It's passing tests on my end.

Good to see that attrs['scales'] is 1 if scaling is applied. But shouldn't attrs['scales'] be updated to 0.0001 if scale_data=False?

>>> with gw.open(l8_224078_20200518, scale_data=False, scale_factor=0.0001) as src:
...     print(src)
... 
<xarray.DataArray (band: 3, y: 1860, x: 2041)>
dask.array<open_rasterio-5c7bef10720d35eff3f9baa732b582af<this-array>, shape=(3, 1860, 2041), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1 2 3
  * x        (x) float64 7.174e+05 7.174e+05 7.174e+05 ... 7.785e+05 7.786e+05
  * y        (y) float64 -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes:
    transform:           (30.0, 0.0, 717345.0, 0.0, -30.0, -2776995.0)
    crs:                 32621
    res:                 (30.0, 30.0)
    is_tiled:            1
    nodatavals:          (nan, nan, nan)
    _FillValue:          nan
    scales:              (1.0, 1.0, 1.0)
    offsets:             (0.0, 0.0, 0.0)
    AREA_OR_POINT:       Area
    filename:            /geowombat/src/geowombat/data/LC08_L1TP_224078_20200...
    resampling:          nearest
    _data_are_separate:  0
    _data_are_stacked:   0

jgrss commented 2 years ago

> shouldn't attrs['scales'] be updated to 0.0001 if scale_data=False?

Ah, good call. I can add that.
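
The attribute bookkeeping being discussed could look roughly like the sketch below. It is not the code on the branch; open_with_scaling is a hypothetical function that uses a NumPy array and a dict in place of the DataArray and its attrs.

```python
import numpy as np


def open_with_scaling(data, attrs, scale_factor, scale_data):
    """Return (data, attrs) with 'scales' kept consistent with the data.

    Hypothetical sketch: if scaling is applied, 'scales' resets to 1;
    if not, 'scales' records the factor that still needs applying.
    """
    new_attrs = dict(attrs)  # do not mutate the caller's attributes
    n_bands = data.shape[0]
    if scale_data:
        data = data * scale_factor
        new_attrs["scales"] = (1.0,) * n_bands
    else:
        new_attrs["scales"] = (float(scale_factor),) * n_bands
    return data, new_attrs


raw = np.full((3, 2, 2), 7500, dtype="uint16")
attrs = {"scales": (1.0, 1.0, 1.0)}
kept, kept_attrs = open_with_scaling(raw, attrs, 1e-4, scale_data=False)
applied, applied_attrs = open_with_scaling(raw, attrs, 1e-4, scale_data=True)
```

Either way, multiplying the returned data by the returned 'scales' gives the same physical values, which is the consistency the question above is asking for.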

jgrss commented 2 years ago

See #221

mmann1123 commented 2 years ago

Hey, this didn't seem to trigger a version change, which in turn means conda-forge doesn't look for an update. Any way you can trigger the version change retroactively?

On Wed, Sep 28, 2022, 6:29 PM Jordan Graesser @.***> wrote:

Closed #216 https://github.com/jgrss/geowombat/issues/216 as completed.

jgrss commented 2 years ago

Do you mean a geowombat version change? It was upgraded from v2.0.11 to v2.0.12. Do you know what triggers conda? Is it a release?

mmann1123 commented 2 years ago

OK, weird. I thought I saw 2.0.12 -> 2.0.12; maybe I misread. But yeah, I think it looks for an update in the version number.
