jones-michael-s opened 2 years ago
Hi Mike, could you share (here or via email) a small piece of data that we could use to reproduce? If it's sensitive data, you can email me a link at japsai at gmail.
Hi Jasper,
I sent links to your gmail.
Regards, Mike
Hello,
I was able to find a workaround that fixes the crash: pass a list of NIfTI files and extract the voxel data in the Python script, rather than reading in a ginormous .mat file of the voxel data extracted in Matlab.
However, I'm still getting runtime warnings from eval_fixed in the call to evaluate_models_searchlight:
/usr/local/lib/python3.7/site-packages/rsatoolbox/inference/evaluate.py:255: RuntimeWarning: Degrees of freedom <= 0 for slice
variances = np.cov(evaluations[0], ddof=1) \
/usr/local/lib/python3.7/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: divide by zero encountered in true_divide
c *= np.true_divide(1, fact)
/usr/local/lib/python3.7/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: invalid value encountered in multiply
c *= np.true_divide(1, fact)
I suspect this is because I'm testing a single subject, so the N-1 used in the covariance calculation is 0, hence the divide by zero.
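A minimal repro of what I suspect is happening (toy value, not from the real data):
import numpy as np

# a single subject gives a single evaluation value; with ddof=1 this
# leaves N - 1 = 0 degrees of freedom, so np.cov divides by zero
single_evaluation = np.array([0.42])
variances = np.cov(single_evaluation, ddof=1)  # emits the RuntimeWarnings above; result is nan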
Would someone care to confirm? It's probably obvious to experienced users, but as we're new to the toolbox, confirmation would give us some peace of mind moving forward.
Mike
Hi @jones-michael-s, I just built a check into #256 for the last set of warnings you encountered. If this fixes your issue here, I think that would close this issue for now.
As for the issues with the data loaded from MATLAB, it looks like they might be due to import problems on the MATLAB side rather than a problem with the toolbox.
Could you confirm these two statements, i.e. that we don't need to do anything about the earlier import problems at the moment and that the fix I included removes the warnings you are getting from eval_fixed?
Hi @HeikoSchuett
1) the workaround I mention in my previous comment seems to have fixed (or at least avoided) the data-importing problem. No further action is needed on this.
2) the change included in your latest PR indeed fixes the runtime warnings _nan_mean was generating (thanks!). However, eval_fixed is still giving the warning I describe above. Perhaps a tweak could be made to eval_fixed (line 255) so that np.cov isn't called when evaluations contains a single element; the variable variances could simply be set to nan or zero (I'm not sure which is appropriate in this context).
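Something along these lines, as an untested sketch on my part (names taken from the warning above, not from the actual source):
# hypothetical guard: skip the covariance when there is only a single
# evaluation, since its variance is then undefined
if evaluations[0].size <= 1:
    variances = np.nan
else:
    variances = np.cov(evaluations[0], ddof=1)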
As I mention in my previous comment, I'm testing searchlight on a single subject using a single model. Perhaps that is a dumb thing to attempt? Is the warning to be expected?
Hi @jones-michael-s, I had already opened another pull request, #256, which should address the single-subject case in eval_fixed, too. I'll merge it now and close this issue. When evaluating on only a single subject, most methods for error-bar estimation will not work, but you should now get the correlation values without warnings.
Apologies for re-opening a closed issue, but I'm running into the same ValueError: rdm1 and rdm2 have different nan positions issue when calling evaluate_models_searchlight(SL_RDM, tone_model, eval_fixed, method='spearman', n_jobs=4) (following the searchlight RSA demo).
Package info: Red Hat Linux 7.7, Python 3.9, rsatoolbox 0.0.4, scipy 1.6.3, pandas 1.2.4, numpy 1.22.2
I have 16 stimuli with individual beta maps. As suggested by Mike, I am importing data from a list of .nii.gz files:
import numpy as np
import nibabel as nib

mask_img = nib.load(mask_fpath)
mask_data = mask_img.get_fdata()
x, y, z = mask_data.shape

# loop over all images (using a separate loop index so the volume
# dimension x defined above is not overwritten)
data = np.zeros((len(image_paths), x, y, z))
for i, im in enumerate(image_paths):
    data[i] = nib.load(im).get_fdata()

# only one pattern per image
image_value = np.arange(len(image_paths))
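Then I follow the demo's next steps, roughly like this (a sketch: the radius/threshold values here are placeholders, not necessarily my exact settings):
from rsatoolbox.util.searchlight import get_volume_searchlight, get_searchlight_RDMs

# find searchlight centers and neighbors within the mask, then compute
# one RDM per searchlight from the flattened (n_images x n_voxels) data
centers, neighbors = get_volume_searchlight(mask_data, radius=5, threshold=0.5)
data_2d = data.reshape(len(image_paths), -1)
SL_RDM = get_searchlight_RDMs(data_2d, centers, neighbors, image_value,
                              method='correlation')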
Finding the searchlight centers and getting searchlight RDMs seems to work fine, with SL_RDM.dissimilarities.shape returning (420909, 120).
I constructed a couple of categorical RDMs:
tone_rdm = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,],
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,],
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,],
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,],
[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,],
[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,],
[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,],
[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,], ])
talker_rdm = np.array([[0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, ],
[1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, ],
[1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, ],
[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, ],
[0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, ],
[1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, ],
[1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, ],
[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, ],
[0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, ],
[1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, ],
[1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, ],
[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, ],
[0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, ],
[1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, ],
[1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, ],
[1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, ], ])
rdms_array = np.array([tone_rdm, talker_rdm])
model_rdms = RDMs(rdms_array,
rdm_descriptors={'categorical_model':['tone', 'talker'],},
dissimilarity_measure='Euclidean'
)
where model_rdms.dissimilarities.shape returns (2, 120), and for testing purposes, I turn one into a Model:
tone_model = ModelFixed('Tone RDM', model_rdms.subset('categorical_model', 'tone'))
with tone_model.rdm.shape returning (120,).
I checked for nans and found none:
np.where(tone_model.rdm == np.nan)  # nb: == np.nan is always False; np.isnan() is the proper test
(array([], dtype=int64),)
np.where(SL_RDM.dissimilarities == np.nan)  # same caveat applies
(array([], dtype=int64), array([], dtype=int64))
But I get the ValueError: rdm1 and rdm2 have different nan positions when calling evaluate_models_searchlight(SL_RDM, tone_model, eval_fixed, method='spearman', n_jobs=4) regardless.
Any tips would be great!
Hi @sitek,
Could you try upgrading to version 0.1? i.e. pip install -U rsatoolbox
I think @HeikoSchuett's patch was applied after 0.0.4.
Jasper
I'm actually getting a weird import error with 0.1.0 (EDIT: with 0.0.6.dev61 as well):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 import rsatoolbox
2 from rsatoolbox.inference import eval_fixed
3 from rsatoolbox.model import ModelFixed, Model
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/__init__.py:8, in <module>
6 from . import cengine
7 from . import data
----> 8 from . import inference
9 from . import model
10 from . import rdm
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/inference/__init__.py:1, in <module>
----> 1 from .bootstrap import bootstrap_sample
2 from .bootstrap import bootstrap_sample_rdm
3 from .bootstrap import bootstrap_sample_pattern
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/inference/bootstrap.py:7, in <module>
3 """inference module: bootstrapping
4 """
6 import numpy as np
----> 7 from rsatoolbox.util.rdm_utils import add_pattern_index
10 def bootstrap_sample(rdms, rdm_descriptor='index', pattern_descriptor='index'):
11 """Draws a bootstrap_sample from the data.
12
13 This function generates a bootstrap sample of RDMs resampled over
(...)
40
41 """
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/util/rdm_utils.py:9, in <module>
7 from __future__ import annotations
8 from typing import Union, List, Dict, Tuple, TYPE_CHECKING
----> 9 from numpy.typing import NDArray
10 import numpy as np
11 from scipy.spatial.distance import squareform
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/numpy/typing/__init__.py:324, in <module>
311 from ._scalars import (
312 _CharLike_co,
313 _BoolLike_co,
(...)
321 _VoidLike_co,
322 )
323 from ._shape import _Shape, _ShapeLike
--> 324 from ._dtype_like import (
325 DTypeLike as DTypeLike,
326 _SupportsDType,
327 _VoidDTypeLike,
328 _DTypeLikeBool,
329 _DTypeLikeUInt,
330 _DTypeLikeInt,
331 _DTypeLikeFloat,
332 _DTypeLikeComplex,
333 _DTypeLikeTD64,
334 _DTypeLikeDT64,
335 _DTypeLikeObject,
336 _DTypeLikeVoid,
337 _DTypeLikeStr,
338 _DTypeLikeBytes,
339 _DTypeLikeComplex_co,
340 )
341 from ._array_like import (
342 ArrayLike as ArrayLike,
343 _ArrayLike,
(...)
358 _ArrayLikeBytes_co,
359 )
360 from ._generic_alias import (
361 NDArray as NDArray,
362 _DType,
363 _GenericAlias,
364 )
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/numpy/typing/_dtype_like.py:16, in <module>
13 import numpy as np
15 from ._shape import _ShapeLike
---> 16 from ._generic_alias import _DType as DType
18 from ._char_codes import (
19 _BoolCodes,
20 _UInt8Codes,
(...)
57 _ObjectCodes,
58 )
60 _DTypeLikeNested = Any # TODO: wait for support for recursive types
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/numpy/typing/_generic_alias.py:211, in <module>
208 ScalarType = TypeVar("ScalarType", bound=np.generic, covariant=True)
210 if TYPE_CHECKING or sys.version_info >= (3, 9):
--> 211 _DType = np.dtype[ScalarType]
212 NDArray = np.ndarray[Any, np.dtype[ScalarType]]
213 else:
TypeError: 'type' object is not subscriptable
Which version of numpy is installed?
It appears to be a bug that only surfaces in combination with certain numpy versions. I've made a fix here. It should be ready in about an hour; I'll report back then.
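The gist of the workaround is to avoid importing numpy.typing at runtime (a rough sketch of the idea, not necessarily the exact change):
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # only imported for static type checking, so a numpy installation
    # with a broken typing module doesn't break the toolbox at runtime
    from numpy.typing import NDArray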
Thanks @ilogue! FYI, I'm on numpy 1.22.2 (although numpy-base is 1.19.2).
The fixed version is now ready as 0.1.1.dev2. Since this is a pre-release (non-stable) version, you have to add the --pre flag to install it:
pip install -U --pre rsatoolbox
Give it a go and let me know if this fixes the nan issue.
Unfortunately the same nan error arises; see below for the full traceback. With n_jobs=2 it makes a bit of progress but then seems to freeze, which could be a memory issue on my HPC node.
FYI, I initially got the following error (matplotlib version 3.5.1):
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/vis/icon.py:18, in <module>
16 from rsatoolbox.rdm import RDMs
17 from rsatoolbox.util.pooling import pool_rdm
---> 18 if hasattr(matplotlib.colormaps, 'get_cmap'):
19 mpl_get_cmap = matplotlib.colormaps.get_cmap
20 else:
AttributeError: module 'matplotlib' has no attribute 'colormaps'
But modifying matplotlib.colormaps to matplotlib.cm got the import working.
I see that this is related to #232; unfortunately the current fix doesn't catch the version differences gracefully. Perhaps a more explicit check for the matplotlib version, or a try/except clause instead of the if/else?
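Something like this is what I have in mind (untested sketch):
import matplotlib

try:
    # newer matplotlib: colormap registry with a get_cmap method
    mpl_get_cmap = matplotlib.colormaps.get_cmap
except AttributeError:
    # older matplotlib (e.g. my 3.5.1): fall back to the cm module.
    # hasattr() can't catch this case, because evaluating
    # matplotlib.colormaps itself raises before hasattr is ever called
    mpl_get_cmap = matplotlib.cm.get_cmap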
The NaN error when calling eval_results = evaluate_models_searchlight(SL_RDM, tone_model, eval_fixed, method='spearman', n_jobs=1):
Evaluating models for each searchlight: 0%| | 0/420909 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 eval_results = evaluate_models_searchlight(SL_RDM, tone_model, eval_fixed, method='spearman', n_jobs=1)
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/util/searchlight.py:204, in evaluate_models_searchlight(sl_RDM, models, eval_function, method, theta, n_jobs)
183 def evaluate_models_searchlight(sl_RDM, models, eval_function, method='corr', theta=None, n_jobs=1):
184 """evaluates each searchlighth with the given model/models
185
186 Args:
(...)
201 list: list of with the model evaluation for each searchlight center
202 """
--> 204 results = Parallel(n_jobs=n_jobs)(
205 delayed(eval_function)(
206 models, x, method=method, theta=theta) for x in tqdm(
207 sl_RDM, desc='Evaluating models for each searchlight'))
209 return results
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/parallel.py:1044, in Parallel.__call__(self, iterable)
1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
-> 1044 while self.dispatch_one_batch(iterator):
1045 pass
1047 if pre_dispatch == "all" or n_jobs == 1:
1048 # The iterable was consumed all at once by the above for loop.
1049 # No need to wait for async callbacks to trigger to
1050 # consumption.
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/parallel.py:859, in Parallel.dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/parallel.py:777, in Parallel._dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
780 # grow. To ensure correct results ordering, .insert is
781 # used (rather than .append) in the following line
782 self._jobs.insert(job_idx, job)
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/_parallel_backends.py:208, in SequentialBackend.apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/_parallel_backends.py:572, in ImmediateResult.__init__(self, batch)
569 def __init__(self, batch):
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/parallel.py:262, in BatchedCalls.__call__(self)
258 def __call__(self):
259 # Set the default nested backend to self._backend but do not set the
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/joblib/parallel.py:262, in <listcomp>(.0)
258 def __call__(self):
259 # Set the default nested backend to self._backend but do not set the
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/inference/evaluate.py:199, in eval_fixed(models, data, theta, method)
197 for k, model in enumerate(models):
198 rdm_pred = model.predict_rdm(theta=theta[k])
--> 199 evaluations[k] = compare(rdm_pred, data, method)
200 evaluations = evaluations.reshape((1, len(models), data.n_rdm))
201 noise_ceil = boot_noise_ceiling(
202 data, method=method, rdm_descriptor='index')
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/rdm/compare.py:62, in compare(rdm1, rdm2, method, sigma_k)
60 sim = compare_cosine(rdm1, rdm2)
61 elif method == 'spearman':
---> 62 sim = compare_spearman(rdm1, rdm2)
63 elif method == 'corr':
64 sim = compare_correlation(rdm1, rdm2)
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/rdm/compare.py:176, in compare_spearman(rdm1, rdm2)
162 def compare_spearman(rdm1, rdm2):
163 """calculates the spearman rank correlations between
164 two RDMs objects
165
(...)
174
175 """
--> 176 vector1, vector2, _ = _parse_input_rdms(rdm1, rdm2)
177 vector1 = np.apply_along_axis(scipy.stats.rankdata, 1, vector1)
178 vector2 = np.apply_along_axis(scipy.stats.rankdata, 1, vector2)
File /bgfs/bchandrasekaran/krs228/software/miniconda3/envs/py3/lib/python3.9/site-packages/rsatoolbox/rdm/compare.py:623, in _parse_input_rdms(rdm1, rdm2)
621 vector2_no_nan = vector2[~np.isnan(vector2)].reshape(vector2.shape[0], -1)
622 if not vector1_no_nan.shape[1] == vector2_no_nan.shape[1]:
--> 623 raise ValueError('rdm1 and rdm2 have different nan positions')
624 return vector1_no_nan, vector2_no_nan, nan_idx[0]
ValueError: rdm1 and rdm2 have different nan positions
I've split the matplotlib thing off into a separate issue, #297.
As for the nan issue, this is a bit hard to debug without the data. Are you able to share some of it? My email is japsai @ gmail. Also, to give you some context: we are planning a larger revamp of the searchlight functionality over the coming weeks/months.
Hi @sitek and @jones-michael-s,
I've looked deeper into this and found the source of the issue. In short, it's to do with the masking. In the new PR, where I'll be working on the searchlight interface, I have included test scripts for both of your datasets (data not included, of course). These will be removed when the PR is finished, but for now they may help us talk about this:
Kevin: https://github.com/rsagroup/rsatoolbox/blob/a484a9cd316efdb7d0717a876e27fdce37daad7c/test_sitek.py
Mike: https://github.com/rsagroup/rsatoolbox/blob/a484a9cd316efdb7d0717a876e27fdce37daad7c/test_jones.py
The problem was almost the same in both cases: the image-based masks included some voxels without stats. Since your code, like the demo's, uses correlation distance, which is undefined for all-zero patterns, this produces nans in the RDMs. The problem didn't occur in the demo because the mask Daniel used is data-based and automatically filters out voxels without data.
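A toy illustration of the mechanism (made-up patterns, not from either dataset):
import numpy as np
from scipy.spatial.distance import pdist

patterns = np.array([
    [1.0, 2.0, 3.0],  # a condition pattern with data
    [0.0, 0.0, 0.0],  # a pattern from voxels without stats
])
# correlation distance requires nonzero variance in both patterns,
# so this prints [nan] (with a RuntimeWarning)
print(pdist(patterns, metric='correlation'))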
There’s a few technical workarounds;
A better solution I’d propose is to mask specifically the voxels that don’t have data, I’ve done this in the test scripts as follows:
# keep only voxels that are non-zero in at least one image
mask_2d = ~numpy.all(data == 0, axis=0)
mask_3d = mask_2d.reshape(x, y, z)
Both scripts work with this change.
Now after I’ve worked on the Searchlight implementation more, we’d probably also want the inference step in the searchlight pipeline to just fill such values with NaNs, (instead of throwing an exception) but I thought you might want to know this before such a fix is deployed.
Sorry Mike for taking so long to look into this!
Please let us know if you come across any further issues!
Hello,
I'm currently getting runtime warnings and errors when using searchlight (specifically from evaluate_models_searchlight).
Background:
• macOS 10.14.3, Python 3.7.3, rsatoolbox 0.0.4, numpy 1.21.6, scipy 1.7.3, pandas 1.3.5, nibabel 4.0.1
• I've been successful in getting basic RSA toolbox functionality to work (i.e., defining a data RDM and a model RDM and fitting the model to the data using rsatoolbox.inference.eval_fixed; a toy sketch of such a workflow appears after this list). I don't get any warnings or errors, and the results look sensible. It's only when attempting to use searchlight that I run into problems.
• As far as I have explored, the problem isn't fixed by options, parameter settings, data size, etc.
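For reference, a toy sketch of the kind of basic workflow that works for me (random data and a made-up categorical model, not our actual pipeline):
import numpy as np
from scipy.spatial.distance import squareform
import rsatoolbox
from rsatoolbox.inference import eval_fixed
from rsatoolbox.model import ModelFixed

rng = np.random.default_rng(0)

# toy data: 5 'subjects', each with 16 conditions x 50 channels
datasets = [rsatoolbox.data.Dataset(rng.standard_normal((16, 50)))
            for _ in range(5)]
data_rdms = rsatoolbox.rdm.calc_rdm(datasets)

# toy categorical model: conditions 0-7 vs. 8-15
square = np.ones((16, 16)) - np.kron(np.eye(2), np.ones((8, 8)))
model = ModelFixed('toy model', squareform(square))

result = eval_fixed(model, data_rdms, method='spearman')
print(result)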
1) Runtime warning:
I get a divide-by-zero runtime warning when i) running the searchlight example in the documentation or ii) running searchlight on toy data (generated using numpy random). The analysis completes, but the warning is still concerning.
2) Runtime error:
When trying to run searchlight on real (i.e., our) data, I get a different warning and then an error. What's different here is that this uses data saved from Matlab and then read in using scipy's io.matlab.loadmat (code to follow):
Here's my code (which I mostly stole from the docs):
Any help is much appreciated,
Mike