jgrss / geowombat

GeoWombat: Utilities for geospatial data
https://geowombat.readthedocs.io
MIT License
184 stars 10 forks source link

Writing DataArrays to files keep throwing TypeError #184

Closed willieseun closed 2 years ago

willieseun commented 2 years ago

Hello, I tried writing data arrays using the to_raster function. But, I am getting an error 'TypeError: tqdm.std.tqdm() argument after ** must be a mapping, not NoneType'. I don't know what I am doing wrong so I appreciate if you can look into it.

mmann1123 commented 2 years ago

We would need to see a full example including link to data and the code to:

willieseun commented 2 years ago

`import geowombat as gw from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons from geowombat.ml import fit, predict, fit_predict import geopandas as gpd from sklearn.pipeline import Pipeline from sklearn.preprocessing import LabelEncoder, StandardScaler from sklearn.decomposition import PCA from sklearn.naive_bayes import GaussianNB from sklearn.cluster import KMeans import matplotlib.pyplot as plt

cl = Pipeline([ ('scaler', StandardScaler()), ('pca', PCA()), ('clf', KMeans(n_clusters=12, random_state=0))])

fig, ax = plt.subplots(figsize=(5,5))

Fit_predict unsupervised classifier

with gw.open(['Sen_clip2.tif', 'Sen_clip3.tif', 'Sen_clip4.tif'], stack_dim='band') as src: y= fit_predict(src, cl) y.plot(robust=True, ax=ax)

gw.to_vrt(y, 'filo.vrt')

src.gw.to_raster('Unsup.tif')

plt.tight_layout(pad=1) plt.show()`

This is the code. Any .tif file I use doesn't work.

mmann1123 commented 2 years ago

Ok yeah its a bit quirky at the moment, but you need to select the time period (its set up for multiperiod prediction at the moment). The following example should help. I will update the docs.

#%%

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

cl = Pipeline([ ('scaler', StandardScaler()),
('pca', PCA()),
('clf', KMeans(n_clusters=12, random_state=0))])

# %%
from sklearn.cluster import KMeans
cl = Pipeline([ ('clf', KMeans(n_clusters=6, random_state=0))])

fig, ax = plt.subplots(dpi=200,figsize=(5,5))

# Fit_predict unsupervised classifier
with gw.config.update(ref_res=150):
    with gw.open(l8_224078_20200518) as src:
        y= fit_predict(src, cl)
        y.plot(robust=True, ax=ax)
plt.tight_layout(pad=1)

y.sel(time='t1').gw.to_raster('Unsup.tif')
willieseun commented 2 years ago

It is still not working

mmann1123 commented 2 years ago

So the example I just sent you isn't working? or with your data?

willieseun commented 2 years ago

Both keep giving that error when writing out.

willieseun commented 2 years ago

I think I was able to resolve it by doing this instead.

y.sel(time='t1').rio.to_raster('Unsup1.tif')

Thanks for your time!

mmann1123 commented 2 years ago

Ok good to hear. We are testing .save() soon. Hopefully it will resolve this.

willieseun commented 2 years ago

Yep. I hope so too

pjhav commented 2 years ago

Hi, I am getting the same error when trying to save a raster stack:

===== from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3 with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3]) as src:

# Xarray drops attributes
attrs = src.attrs.copy()

# Apply operations on the DataArray
src = src * 10.0

src.attrs = attrs

# Write the data to a GeoTiff
src.gw.to_raster('output.tif',
                 verbose=1,
                 n_workers=4,    # number of process workers sent to ``concurrent.futures``
                 n_threads=2,    # number of thread workers sent to ``dask.compute``
                 n_chunks=200,
                 overwrite=True)

====== returns:


TypeError Traceback (most recent call last) Input In [54], in <cell line: 2>() 10 src.attrs = attrs 12 # Write the data to a GeoTiff ---> 13 src.gw.to_raster('output.tif', 14 verbose=1, 15 n_workers=4, # number of process workers sent to concurrent.futures 16 n_threads=2, # number of thread workers sent to dask.compute 17 n_chunks=200, 18 overwrite=True)

File ~\Miniconda3\envs\gwenv\lib\site-packages\geowombat\core\geoxarray.py:786, in GeoWombatAccessor.to_raster(self, filename, readxsize, readysize, separate, use_dask_store, out_block_type, keep_blocks, verbose, overwrite, gdal_cache, scheduler, n_jobs, n_workers, n_threads, n_chunks, overviews, resampling, use_client, address, total_memory, driver, nodata, blockxsize, blockysize, tags, kwargs) 783 if 'dtype' not in kwargs: 784 kwargs['dtype'] = self._obj.data.dtype.name --> 786 to_raster(self._obj, 787 filename, 788 readxsize=readxsize, 789 readysize=readysize, 790 use_dask_store=use_dask_store, 791 separate=separate, 792 out_block_type=out_block_type, 793 keep_blocks=keep_blocks, 794 verbose=verbose, 795 overwrite=overwrite, 796 gdal_cache=gdal_cache, 797 scheduler=scheduler, 798 n_jobs=n_jobs, 799 n_workers=n_workers, 800 n_threads=n_threads, 801 n_chunks=n_chunks, 802 overviews=overviews, 803 resampling=resampling, 804 use_client=use_client, 805 address=address, 806 total_memory=total_memory, 807 tags=tags, 808 kwargs)

File ~\Miniconda3\envs\gwenv\lib\site-packages\geowombat\core\io.py:902, in to_raster(data, filename, readxsize, readysize, use_dask_store, separate, out_block_type, keep_blocks, verbose, overwrite, gdal_cache, scheduler, n_jobs, n_workers, n_threads, n_chunks, use_client, address, total_memory, processes, padding, tags, tqdm_kwargs, kwargs) 899 pass 901 else: --> 902 for __ in tqdm( 903 executor.map(_write_xarray, data_gen), 904 total=n_windows_slice, 905 tqdm_kwargs 906 ): 907 pass 909 # if overviews: 910 # 911 # if not isinstance(overviews, list): (...) 926 927 else:

TypeError: type object argument after ** must be a mapping, not NoneType

I try to add the sel call for bands, but this does not work. Although maybe I am setting it up incorrectly.

Thanks for any advice.

willieseun commented 2 years ago

Hello, you can use src.sel(time='t1').rio.to_raster('Unsup1.tif') after importing rioxarray as suggested above.

pjhav commented 2 years ago

Hi, thanks for responding.

I forgot to mention, I tried this, but there is no t1 data or time series, as they are just different bands of the same multispectral image. So I get a key error about t1. I tried to sel the bands I wanted to, but that also did not work.

When I try import geowombat as gw from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3 with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3]) as src:

# Xarray drops attributes
attrs = src.attrs.copy()

# Apply operations on the DataArray
src = src * 10.0

src.attrs = attrs

# Write the data to a GeoTiff
src.sel(time='t1').gw.to_raster('output.tif',
                 verbose=1,
                 n_workers=4,    # number of process workers sent to ``concurrent.futures``
                 n_threads=2,    # number of thread workers sent to ``dask.compute``
                 n_chunks=200,
                 overwrite=True)  `

I get a key error:


KeyError Traceback (most recent call last) File ~\Miniconda3\envs\gwenv\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance) 3620 try: -> 3621 return self._engine.get_loc(casted_key) 3622 except KeyError as err:

File ~\Miniconda3\envs\gwenv\lib\site-packages\pandas_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~\Miniconda3\envs\gwenv\lib\site-packages\pandas_libs\index.pyx:144, in pandas._libs.index.IndexEngine.get_loc()

File pandas_libs\index_class_helper.pxi:41, in pandas._libs.index.Int64Engine._check_type()

KeyError: 't1'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last) Input In [73], in <cell line: 3>() 11 src.attrs = attrs 13 # Write the data to a GeoTiff ---> 14 src.sel(time='t1').gw.to_raster('output.tif', 15 verbose=1, 16 n_workers=4, # number of process workers sent to concurrent.futures 17 n_threads=2, # number of thread workers sent to dask.compute 18 n_chunks=200, 19 overwrite=True)

File ~\Miniconda3\envs\gwenv\lib\site-packages\xarray\core\dataarray.py:1329, in DataArray.sel(self, indexers, method, tolerance, drop, indexers_kwargs) 1220 def sel( 1221 self, 1222 indexers: Mapping[Any, Any] = None, (...) 1226 indexers_kwargs: Any, 1227 ) -> DataArray: 1228 """Return a new DataArray whose data is given by selecting index 1229 labels along the specified dimension(s). 1230 (...) 1327 Dimensions without coordinates: points 1328 """ -> 1329 ds = self._to_temp_dataset().sel( 1330 indexers=indexers, 1331 drop=drop, 1332 method=method, 1333 tolerance=tolerance, 1334 **indexers_kwargs, 1335 ) 1336 return self._from_temp_dataset(ds)

File ~\Miniconda3\envs\gwenv\lib\site-packages\xarray\core\dataset.py:2501, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 2440 """Returns a new dataset with each array indexed by tick labels 2441 along the specified dimension(s). 2442 (...) 2498 DataArray.sel 2499 """ 2500 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel") -> 2501 pos_indexers, new_indexes = remap_label_indexers( 2502 self, indexers=indexers, method=method, tolerance=tolerance 2503 ) 2504 # TODO: benbovy - flexible indexes: also use variables returned by Index.query 2505 # (temporary dirty fix). 2506 new_indexes = {k: v[0] for k, v in new_indexes.items()}

File ~\Miniconda3\envs\gwenv\lib\site-packages\xarray\core\coordinates.py:421, in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs) 414 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "remap_label_indexers") 416 v_indexers = { 417 k: v.variable.data if isinstance(v, DataArray) else v 418 for k, v in indexers.items() 419 } --> 421 pos_indexers, new_indexes = indexing.remap_label_indexers( 422 obj, v_indexers, method=method, tolerance=tolerance 423 ) 424 # attach indexer's coordinate to pos_indexers 425 for k, v in indexers.items():

File ~\Miniconda3\envs\gwenv\lib\site-packages\xarray\core\indexing.py:121, in remap_label_indexers(data_obj, indexers, method, tolerance) 119 for dim, index in indexes.items(): 120 labels = grouped_indexers[dim] --> 121 idxr, new_idx = index.query(labels, method=method, tolerance=tolerance) 122 pos_indexers[dim] = idxr 123 if new_idx is not None:

File ~\Miniconda3\envs\gwenv\lib\site-packages\xarray\core\indexes.py:241, in PandasIndex.query(self, labels, method, tolerance) 237 raise KeyError( 238 f"not all values found in index {coord_name!r}" 239 ) 240 else: --> 241 indexer = self.index.get_loc(label_value) 242 elif label.dtype.kind == "b": 243 indexer = label

File ~\Miniconda3\envs\gwenv\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance) 3621 return self._engine.get_loc(casted_key) 3622 except KeyError as err: -> 3623 raise KeyError(key) from err 3624 except TypeError: 3625 # If we have a listlike key, _check_indexing_error will raise 3626 # InvalidIndexError. Otherwise we fall through and re-raise 3627 # the TypeError. 3628 self._check_indexing_error(key)

KeyError: 't1

willieseun commented 2 years ago

Try this with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3], stack_dim='band') as src:

pjhav commented 2 years ago
import geowombat as gw
from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3
with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3], stack_dim='band') as src:

    # Xarray drops attributes
    attrs = src.attrs.copy()

    # Apply operations on the DataArray
    src = src * 10.0

    src.attrs = attrs

    # Write the data to a GeoTiff
    src.gw.to_raster('output.tif',
                     verbose=1,
                     n_workers=4,    # number of process workers sent to ``concurrent.futures``
                     n_threads=2,    # number of thread workers sent to ``dask.compute``
                     n_chunks=200,
                     overwrite=True) 

Output: 22:04:37:INFO:806:io.to_raster: Creating the file ...

22:04:37:INFO:813:io.to_raster: Writing data to file ...

22:04:37:INFO:837:io.to_raster: Windows 1--48 of 48 ...

TypeError Traceback (most recent call last) Input In [75], in <cell line: 3>() 11 src.attrs = attrs 13 # Write the data to a GeoTiff ---> 14 src.gw.to_raster('output.tif', 15 verbose=1, 16 n_workers=4, # number of process workers sent to concurrent.futures 17 n_threads=2, # number of thread workers sent to dask.compute 18 n_chunks=200, 19 overwrite=True)

File ~\Miniconda3\envs\gwenv\lib\site-packages\geowombat\core\geoxarray.py:786, in GeoWombatAccessor.to_raster(self, filename, readxsize, readysize, separate, use_dask_store, out_block_type, keep_blocks, verbose, overwrite, gdal_cache, scheduler, n_jobs, n_workers, n_threads, n_chunks, overviews, resampling, use_client, address, total_memory, driver, nodata, blockxsize, blockysize, tags, kwargs) 783 if 'dtype' not in kwargs: 784 kwargs['dtype'] = self._obj.data.dtype.name --> 786 to_raster(self._obj, 787 filename, 788 readxsize=readxsize, 789 readysize=readysize, 790 use_dask_store=use_dask_store, 791 separate=separate, 792 out_block_type=out_block_type, 793 keep_blocks=keep_blocks, 794 verbose=verbose, 795 overwrite=overwrite, 796 gdal_cache=gdal_cache, 797 scheduler=scheduler, 798 n_jobs=n_jobs, 799 n_workers=n_workers, 800 n_threads=n_threads, 801 n_chunks=n_chunks, 802 overviews=overviews, 803 resampling=resampling, 804 use_client=use_client, 805 address=address, 806 total_memory=total_memory, 807 tags=tags, 808 kwargs)

File ~\Miniconda3\envs\gwenv\lib\site-packages\geowombat\core\io.py:902, in to_raster(data, filename, readxsize, readysize, use_dask_store, separate, out_block_type, keep_blocks, verbose, overwrite, gdal_cache, scheduler, n_jobs, n_workers, n_threads, n_chunks, use_client, address, total_memory, processes, padding, tags, tqdm_kwargs, kwargs) 899 pass 901 else: --> 902 for __ in tqdm( 903 executor.map(_write_xarray, data_gen), 904 total=n_windows_slice, 905 tqdm_kwargs 906 ): 907 pass 909 # if overviews: 910 # 911 # if not isinstance(overviews, list): (...) 926 927 else:

TypeError: type object argument after ** must be a mapping, not NoneType

============================================ If I change to sel(band=...)

import geowombat as gw
from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3
with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3], stack_dim='band',band_names=["B2", "B3"]) as src:
    print(src)
    # Xarray drops attributes
    attrs = src.attrs.copy()

    # Apply operations on the DataArray
    src = src * 10.0

    src.attrs = attrs

    # Write the data to a GeoTiff
    src.sel(band=["B2", "B3"]).gw.to_raster('output.tif',
                     verbose=1,
                     n_workers=4,    # number of process workers sent to ``concurrent.futures``
                     n_threads=2,    # number of thread workers sent to ``dask.compute``
                     n_chunks=200,
                     overwrite=True) 

Output: 22:09:01:INFO:806:io.to_raster: Creating the file ...

22:09:01:INFO:813:io.to_raster: Writing data to file ...

22:09:01:INFO:837:io.to_raster: Windows 1--48 of 48 ... <xarray.DataArray (band: 2, y: 1515, x: 2006)> dask.array<concatenate, shape=(2, 1515, 2006), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray> Coordinates:

File ~\Miniconda3\envs\gwenv\lib\site-packages\geowombat\core\geoxarray.py:786, in GeoWombatAccessor.to_raster(self, filename, readxsize, readysize, separate, use_dask_store, out_block_type, keep_blocks, verbose, overwrite, gdal_cache, scheduler, n_jobs, n_workers, n_threads, n_chunks, overviews, resampling, use_client, address, total_memory, driver, nodata, blockxsize, blockysize, tags, kwargs) 783 if 'dtype' not in kwargs: 784 kwargs['dtype'] = self._obj.data.dtype.name --> 786 to_raster(self._obj, 787 filename, 788 readxsize=readxsize, 789 readysize=readysize, 790 use_dask_store=use_dask_store, 791 separate=separate, 792 out_block_type=out_block_type, 793 keep_blocks=keep_blocks, 794 verbose=verbose, 795 overwrite=overwrite, 796 gdal_cache=gdal_cache, 797 scheduler=scheduler, 798 n_jobs=n_jobs, 799 n_workers=n_workers, 800 n_threads=n_threads, 801 n_chunks=n_chunks, 802 overviews=overviews, 803 resampling=resampling, 804 use_client=use_client, 805 address=address, 806 total_memory=total_memory, 807 tags=tags, 808 kwargs)

File ~\Miniconda3\envs\gwenv\lib\site-packages\geowombat\core\io.py:902, in to_raster(data, filename, readxsize, readysize, use_dask_store, separate, out_block_type, keep_blocks, verbose, overwrite, gdal_cache, scheduler, n_jobs, n_workers, n_threads, n_chunks, use_client, address, total_memory, processes, padding, tags, tqdm_kwargs, kwargs) 899 pass 901 else: --> 902 for __ in tqdm( 903 executor.map(_write_xarray, data_gen), 904 total=n_windows_slice, 905 tqdm_kwargs 906 ): 907 pass 909 # if overviews: 910 # 911 # if not isinstance(overviews, list): (...) 926 927 else:

TypeError: type object argument after ** must be a mapping, not NoneType

mmann1123 commented 2 years ago

@jgrss Just wondering if we can push your save function out early. Looks like there are lots of people ready to test it.

willieseun commented 2 years ago

Just do this and It will work. `with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3], stack_dim='band',band_names=["B2", "B3"]) as src: print(src)

Xarray drops attributes

attrs = src.attrs.copy()

# Apply operations on the DataArray
src = src * 10.0

src.attrs = attrs

# Write the data
src.sel(time='t1').rio.to_raster('output.tif')`

Don't forget to import rioxarray. If module not found error, use pip install rioxarray in command line.

jgrss commented 2 years ago

I am out until next week and then can take a look at this thread in more detail.

@mmann1123 The jgrss/store branch is heading toward v2.0.0 and could resolve I/O issues with the addition of save(). It also has the needed changes to address Xarray ufuncs. Other incompatible backward changes include making all methods that currently return numpy arrays return delayed objects instead.

pjhav commented 2 years ago

Using the following code worked, thanks @willieseun

import geowombat as gw
import rioxarray as rio
from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3
with gw.open([l8_224077_20200518_B2, l8_224077_20200518_B3], stack_dim='band',band_names=["B2", "B3"]) as src:
    # Xarray drops attributes
    attrs = src.attrs.copy()

    # Apply operations on the DataArray
    src = src * 10.0

    src.attrs = attrs
    src.attrs.update({'scales':(1,1)})
    src.attrs.update({'offsets':(0,0)})
    # Write the data to a GeoTiff
    src.sel(band=["B2", "B3"]).rio.to_raster('output.tif')
willieseun commented 2 years ago

You're welcome

jgrss commented 2 years ago

@mmann1123 @pjhav @willieseun I reopened this because the above solution was a temporary fix using another library. See PR #187 for fixes. With the changes in #187, both of the following snippets work for me.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit_predict

from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

cl = Pipeline(
    [
        ('clf', KMeans(n_clusters=6, random_state=0))
    ]
)

# Fit_predict unsupervised classifier
with gw.config.update(ref_res=150):
    with gw.open(l8_224078_20200518) as src:
        y = fit_predict(src, cl)

y.sel(time='t1').gw.to_raster('Unsup.tif')
# GeoWombat expects a 3d (time, y, x | band, y, x) or 2d (y, x) array to write to file, so the following also works
# y.squeeze().gw.to_raster('Unsup.tif')
# Note that if the 'time' dimension is > 1, neither of the above would work because of the coordinate order. Instead, do
# y.transpose('time', 'y', 'x').gw.to_raster('Unsup.tif')

@mmann1123 in the above snippet, the output of fit_predict() is shaped (y, x, time), which is why a sel() is needed. Probably worth a PR to ensure the output data are transposed correctly.

import geowombat as gw
from geowombat.data import l8_224077_20200518_B2, l8_224077_20200518_B3

with gw.open(
    [l8_224077_20200518_B2, l8_224077_20200518_B3], 
    stack_dim='band',
    band_names=["B2", "B3"]
) as src:
    # Xarray drops attributes
    attrs = src.attrs.copy()

    # Apply operations on the DataArray
    src = src * 10.0

    src.attrs = attrs
    src.attrs.update({'scales':(1,1)})
    src.attrs.update({'offsets':(0,0)})
    # Write the data to a GeoTiff
    src.gw.to_raster('output.tif')
willieseun commented 2 years ago

Still not working. Note: I reinstalled the library

jgrss commented 2 years ago

@willieseun Did you install the jgrss/to_raster branch and it still didn't work for you?

jgrss commented 2 years ago

Just want to be clear that PR #187 has the fix but it has not yet been merged. It needs approval by @mmann1123.

willieseun commented 2 years ago

Ok. I am using pip with git.

willieseun commented 2 years ago

I just installed the library all over again. I guess it didn't work since it hasn't been merged.

jgrss commented 2 years ago

Thanks for clarifying. If you want to test the fixes before they are merged into main.

git clone https://github.com/jgrss/geowombat.git
cd geowombat/
git fetch origin jgrss/to_raster:jgrss/to_raster
git checkout jgrss/to_raster
pip install .

Otherwise, I will ping you as soon as it is merged and there is a new release.

jgrss commented 2 years ago

v1.11.4 has the fix

willieseun commented 2 years ago

Alright

willieseun commented 2 years ago

I can confirm it works now. Thanks for your help.

pjhav commented 2 years ago

Sorry for the delay, I have been on holidays. Everything is working now, thank you!

For future reference, I did have to sel(band=["B2","B3"]) to make it work., which I don't think was in the docs.