stevenpawley / Pyspatialml

Machine learning modelling for spatial data
GNU General Public License v3.0
ValueError when sample function is used #45

Closed behzad89 closed 2 years ago

behzad89 commented 2 years ago

Hello!

I am following the documentation for sampling as shown below, but I get an error. Could you please let me know why this happens?

predictors = [nc.band1, nc.band2, nc.band3, nc.band4, nc.band5, nc.band7]
stack = Raster(predictors)

df_rand = stack.sample(size=1000, random_state=1)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [45], in <cell line: 2>()
      1 # extract training data using a random sample
----> 2 df_rand = stack.sample(size=1000, random_state=1)
      3 df_rand.plot()

File ~/training/anaconda3/envs/py38/lib/python3.8/site-packages/pyspatialml/raster.py:2007, in Raster.sample(self, size, strata, return_array, random_state)
   2003 xy = np.transpose(
   2004     rasterio.transform.xy(self.transform, rows, cols))
   2006 # sample at random point locations
-> 2007 samples = self.extract_xy(xy, return_array=True)
   2009 # append only non-masked data to each row of X_random
   2010 samples = samples.astype("float32").filled(np.nan)

File ~/training/anaconda3/envs/py38/lib/python3.8/site-packages/pyspatialml/raster.py:2112, in Raster.extract_xy(self, xys, return_array, progress)
   2105 for i, (layer, pbar) in enumerate(
   2106         zip(self.iloc,
   2107             tqdm(self.iloc, total=self.count, disable=not progress))
   2108 ):
   2109     sampler = sample_gen(
   2110         dataset=layer.ds, xy=xys, indexes=layer.bidx, masked=True
   2111     )
-> 2112     v = np.ma.asarray([i for i in sampler])
   2113     X[:, i] = v.flatten()
   2115 # return as geopandas array as default (or numpy arrays)

File ~/training/anaconda3/envs/py38/lib/python3.8/site-packages/pyspatialml/raster.py:2112, in <listcomp>(.0)
   2105 for i, (layer, pbar) in enumerate(
   2106         zip(self.iloc,
   2107             tqdm(self.iloc, total=self.count, disable=not progress))
   2108 ):
   2109     sampler = sample_gen(
   2110         dataset=layer.ds, xy=xys, indexes=layer.bidx, masked=True
   2111     )
-> 2112     v = np.ma.asarray([i for i in sampler])
   2113     X[:, i] = v.flatten()
   2115 # return as geopandas array as default (or numpy arrays)

File ~/training/anaconda3/envs/py38/lib/python3.8/site-packages/rasterio/sample.py:61, in sample_gen(dataset, xy, indexes, masked)
     58     nodata = np.ma.array(nodata, mask=mask)
     60 for pts in _grouper(xy, 256):
---> 61     pts = zip(*filter(None, pts))
     63     for row_off, col_off in zip(*rowcol(dt, *pts)):
     64         if row_off < 0 or col_off < 0 or row_off >= height or col_off >= width:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
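
The last frame of the traceback points at `filter(None, pts)`, which tests the truth value of every point. Below is a minimal sketch of what likely triggers the error, assuming the coordinates reach that line as rows of a 2-D NumPy array (as the `np.transpose(...)` call in `Raster.sample` suggests). The `first_error` helper and the sample coordinates are hypothetical, and the tuple conversion is only an illustration, not necessarily how the library fixed it.

```python
import numpy as np

# Hypothetical stand-in for the (N, 2) coordinate array that Raster.sample
# builds with np.transpose(rasterio.transform.xy(...)).
xy = np.transpose(([10.0, 20.0, 30.0], [1.0, 2.0, 3.0]))


def first_error(points):
    """Return the ValueError message raised by filter(None, points), or None."""
    try:
        # filter(None, ...) calls bool() on every element it consumes.
        list(filter(None, points))
    except ValueError as exc:
        return str(exc)
    return None


# Each row is a 2-element ndarray, so its truth value is ambiguous.
array_error = first_error(xy)

# Non-empty tuples are always truthy, so the same call succeeds.
tuple_error = first_error([tuple(pt) for pt in xy])

print(array_error)
print(tuple_error)
```

If the sketch matches the real cause, passing coordinates as a list of `(x, y)` tuples rather than array rows would avoid the ambiguous truth test.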

stevenpawley commented 2 years ago

Hello, this is working now. It was already fixed in the GitHub version, but I've now uploaded version 0.21 to PyPI.
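
To confirm whether an environment already has the fixed release, the installed version can be compared against 0.21. This is a rough sketch using only the standard library; the helper names `parse_release` and `installed_release` are hypothetical, and it assumes the distribution is registered under the name `pyspatialml`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release mentioned in this thread as containing the fix.
FIXED_IN = (0, 21)


def parse_release(raw):
    """Turn the leading numeric part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", raw)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))


def installed_release(package):
    """Return the parsed release of an installed package, or None if absent."""
    try:
        return parse_release(version(package))
    except PackageNotFoundError:
        return None


release = installed_release("pyspatialml")
if release is None:
    print("pyspatialml is not installed in this environment")
elif release < FIXED_IN:
    print("installed release predates 0.21; upgrading should pick up the fix")
else:
    print("installed release is 0.21 or newer")
```

Only the leading numeric components are compared, so pre-release or local suffixes are ignored.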