Closed bbarcelollull closed 1 year ago
Hi @bbarcelollull
1- Did you try to run the code with the tutorial data, like in the example? If it runs, this means that your install is correct and that it's your data setup that needs to be fixed.
2- In the case where the example runs, then it's hard to help without more information. Could you please post the piece of code defining features_in_ds, features_zdim and paste a print of the ds dataset?
I am trying to run the tutorial example with the same data you provide.
Here is the code that I run:
from pyxpcm.models import pcm
import numpy as np
import pyxpcm
z = np.arange(0.,-1000,-10.)
pcm_features = {'temperature': z, 'salinity':z}
m = pcm(K=8, features=pcm_features)
print(m)
ds = pyxpcm.tutorial.open_dataset('argo').load()
print(ds)
features_in_ds = {'temperature': 'TEMP', 'salinity': 'PSAL'}
features_zdim='DEPTH'
m.fit(ds, features=features_in_ds, dim=features_zdim)
And here is what I get:
/Users/bbarcelo/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/pyxpcm/plot.py:30: UserWarning: pyXpcm requires matplotlib installed for plotting functionality
warnings.warn("pyXpcm requires matplotlib installed for plotting functionality")
/Users/bbarcelo/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/pyxpcm/plot.py:38: UserWarning: pyXpcm requires cartopy installed for full plotting functionality
warnings.warn("pyXpcm requires cartopy installed for full plotting functionality")
/Users/bbarcelo/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/matplotlib/__init__.py:886: MatplotlibDeprecationWarning:
examples.directory is deprecated; in the future, examples will be found relative to the 'datapath' directory.
"found relative to the 'datapath' directory.".format(key))
<pcm 'gmm' (K: 8, F: 2)>
Number of class: 8
Number of feature: 2
Feature names: odict_keys(['temperature', 'salinity'])
Fitted: False
Feature: 'temperature'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Feature: 'salinity'
Interpoler: <class 'pyxpcm.utils.Vertical_Interpolator'>
Scaler: 'normal', <class 'sklearn.preprocessing._data.StandardScaler'>
Reducer: True, <class 'sklearn.decomposition._pca.PCA'>
Classifier: 'gmm', <class 'sklearn.mixture._gaussian_mixture.GaussianMixture'>
<xarray.Dataset>
Dimensions: (DEPTH: 282, N_PROF: 7560)
Coordinates:
* DEPTH (DEPTH) float32 0.0 -5.0 -10.0 -15.0 ... -1395.0 -1400.0 -1405.0
Dimensions without coordinates: N_PROF
Data variables:
LATITUDE (N_PROF) float32 ...
LONGITUDE (N_PROF) float32 ...
TIME (N_PROF) datetime64[ns] ...
DBINDEX (N_PROF) float64 ...
TEMP (N_PROF, DEPTH) float32 ...
PSAL (N_PROF, DEPTH) float32 ...
SIG0 (N_PROF, DEPTH) float32 ...
BRV2 (N_PROF, DEPTH) float32 ...
Attributes:
Sample test prepared by: G. Maze
Institution: Ifremer/LOPS
Data source DOI: 10.17882/42182
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-642f557d4184> in <module>
17 features_zdim='DEPTH'
18
---> 19 m.fit(ds, features=features_in_ds, dim=features_zdim)
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/pyxpcm/models.py in fit(self, ds, features, dim)
859 with self._context('fit', self._context_args) :
860 # PRE-PROCESSING:
--> 861 X, sampling_dims = self.preprocessing(ds, features=features, dim=dim, action='fit')
862
863 # CLASSIFICATION-MODEL TRAINING:
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/pyxpcm/models.py in preprocessing(self, ds, features, dim, action, mask)
785 dim=dim,
786 feature_name=feature_in_pcm,
--> 787 action=action)
788 xlabel = ["%s_%i"%(feature_in_pcm, i) for i in range(0, x.shape[1])]
789 if self._debug:
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/pyxpcm/models.py in preprocessing_this(self, da, dim, feature_name, action)
637 # MAKE THE ND-ARRAY A 2D-ARRAY
638 with self._context(this_context + '.1-ravel', self._context_args):
--> 639 X, z, sampling_dims = self.ravel(da, dim=dim, feature_name=feature_name)
640 if self._debug:
641 print("\t", "X RAVELED with success", str(LogDataType(X)))
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/pyxpcm/models.py in ravel(self, da, dim, feature_name)
358 z = da[dim].values
359
--> 360 X = X.chunk(chunks={'sampling': self._props['chunk_size']})
361 return X, z, sampling_dims
362
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/xarray/core/dataarray.py in chunk(self, chunks, name_prefix, token, lock)
812
813 ds = self._to_temp_dataset().chunk(chunks, name_prefix=name_prefix,
--> 814 token=token, lock=lock)
815 return self._from_temp_dataset(ds)
816
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/xarray/core/dataset.py in chunk(self, chunks, name_prefix, token, lock)
1484
1485 variables = OrderedDict([(k, maybe_chunk(k, v, chunks))
-> 1486 for k, v in self.variables.items()])
1487 return self._replace(variables)
1488
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/xarray/core/dataset.py in <listcomp>(.0)
1484
1485 variables = OrderedDict([(k, maybe_chunk(k, v, chunks))
-> 1486 for k, v in self.variables.items()])
1487 return self._replace(variables)
1488
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/xarray/core/dataset.py in maybe_chunk(name, var, chunks)
1479 token2 = tokenize(name, token if token else var._data)
1480 name2 = '%s%s-%s' % (name_prefix, name, token2)
-> 1481 return var.chunk(chunks, name=name2, lock=lock)
1482 else:
1483 return var
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/xarray/core/variable.py in chunk(self, chunks, name, lock)
893 data = indexing.ImplicitToExplicitIndexingAdapter(
894 data, indexing.OuterIndexer)
--> 895 data = da.from_array(data, chunks, name=name, lock=lock)
896
897 return type(self)(self.dims, data, self._attrs, self._encoding,
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/dask/array/core.py in from_array(x, chunks, name, lock, asarray, fancy, getitem)
1913 >>> a = da.from_array(x, chunks=(1000, 1000), lock=True) # doctest: +SKIP
1914 """
-> 1915 chunks = normalize_chunks(chunks, x.shape)
1916 if len(chunks) != len(x.shape):
1917 raise ValueError("Input array has %d dimensions but the supplied "
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/dask/array/core.py in normalize_chunks(chunks, shape)
1862 chunks = sum((blockdims_from_blockshape((s,), (c,))
1863 if not isinstance(c, (tuple, list)) else (c,)
-> 1864 for s, c in zip(shape, chunks)), ())
1865 for c in chunks:
1866 if not c:
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/dask/array/core.py in <genexpr>(.0)
1862 chunks = sum((blockdims_from_blockshape((s,), (c,))
1863 if not isinstance(c, (tuple, list)) else (c,)
-> 1864 for s, c in zip(shape, chunks)), ())
1865 for c in chunks:
1866 if not c:
~/HOME_SCIENCE/Scripts/2023_glider_clustering/PCM_run_on_environment/venv_pyXpcm/lib/python3.7/site-packages/dask/array/core.py in blockdims_from_blockshape(shape, chunks)
919 if shape is None:
920 raise TypeError("Must supply shape= keyword argument")
--> 921 if np.isnan(sum(shape)) or np.isnan(sum(chunks)):
922 raise ValueError("Array chunk sizes are unknown. shape: %s, chunks: %s"
923 % (shape, chunks))
TypeError: unsupported operand type(s) for +: 'int' and 'str'
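The final message can be reproduced without any of the libraries involved. The sketch below is only an illustration of the Python behaviour behind the last frame of the traceback, where dask evaluates `sum(chunks)`. The function name `sum_chunks` is hypothetical, and the idea that a string chunk size such as 'auto' reached a dask version expecting integers is an assumption; the thread does not confirm it.

```python
# Hypothetical stand-in for the expression sum(chunks) shown at
# dask/array/core.py line 921 in the traceback above.
def sum_chunks(chunks):
    return sum(chunks)

# Integer chunk sizes add up without trouble.
print(sum_chunks((1000,)))

# A string chunk size (the value 'auto' is an assumption) cannot be added
# to the integer 0 that sum() starts from, which raises the same TypeError.
try:
    sum_chunks(("auto",))
except TypeError as err:
    message = str(err)
    print(message)
```

If that reading is right, the failure would come from a mismatch between installed package versions rather than from the dataset, which is consistent with the same error appearing on the tutorial data.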
So it's the same error, hence this is surely due to your Python environment. A minimal environment like this one can be used.
Thanks @gmaze! Issue solved!
I downloaded this file: https://github.com/euroargodev/boundary_currents_pcm/blob/main/environment.yml
And I created the environment from the environment.yml file (changing the name of the environment that is on the first line):
conda env create -f environment.yml
Now I can run my code after activating the environment (named env_pyxpcm) in the terminal:
conda activate env_pyxpcm
Then to close the environment:
conda deactivate
Or I can open the Anaconda Navigator, select the environment in which I want to run my code (env_pyxpcm), and open and work with Spyder.
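To check which package versions the activated environment actually provides, a short script along these lines may help. It is a minimal sketch: the helper name `installed_version` is hypothetical, and the package list is an assumption based on the modules that appear in the traceback, not a list taken from environment.yml.

```python
import importlib

def installed_version(module_name):
    """Return the module's __version__ string, or None if the module
    cannot be imported or does not define __version__."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

# Assumed package list, taken from the modules named in the traceback.
packages = ["pyxpcm", "xarray", "dask", "numpy", "sklearn"]
versions = {name: installed_version(name) for name in packages}

for name, version in versions.items():
    print(f"{name}: {version if version is not None else 'not available'}")
```

Running it once in the old virtual environment and once in env_pyxpcm would show which versions differ between the two.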
Glad you solved this!
Hi @gmaze,
I want to use the pyXpcm tool to cluster glider profiles in the Mediterranean Sea. I have installed the pyXpcm module on my computer (in a virtual environment) with the required dependencies (although with Python 3.7):
However, when trying to run this example: https://pyxpcm.readthedocs.io/en/latest/example.html
I cannot run this line:
m.fit(ds, features=features_in_ds, dim=features_zdim)
Because I have the following error:
Do you know how I can solve it?
Thanks! Bàrbara