Closed: agramfort closed this 1 year ago
Adding `@functools.lru_cache(maxsize=None)` to `get_runs_all_subjects` leads to `TypeError: unhashable type: 'types.SimpleNamespace'`.
Indeed, `types.SimpleNamespace` is not hashable, just like a `dict`. What's annoying is that `joblib.Memory` manages to do the job easily.
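For reference, here is a minimal reproduction of the error. The body of `get_runs_all_subjects` and the contents of `cfg` are made up for illustration; only the decorator and the `SimpleNamespace` argument matter.

```python
import functools
from types import SimpleNamespace


@functools.lru_cache(maxsize=None)
def get_runs_all_subjects(cfg):
    # Hypothetical stand-in for the real function body.
    return {subject: ("01",) for subject in cfg.subjects}


# Hypothetical config; the real cfg has many more attributes.
cfg = SimpleNamespace(subjects=["01", "02"])

try:
    get_runs_all_subjects(cfg)
except TypeError as err:
    # lru_cache has to hash its arguments to build the cache key,
    # and SimpleNamespace instances cannot be hashed.
    message = str(err)
    print(message)
```

The failure happens while building the cache key, before the function body ever runs.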
@larsoner do you have any opinion on this?
We could subclass `SimpleNamespace` using `object_hash` pretty easily, but a less hacky solution would be not to pass `cfg` directly but just the `cfg.subjects`, `cfg.exclude_subjects`, etc. that are needed internally by the function. And the easy way to do that is to make `get_runs_all_subjects` call a `_get_runs_all_subjects` that takes those `cfg.whatever` values, and `lru_cache` that function. That way we don't have to change all our calls to `get_runs_all_subjects` but we can construct a function `_get_runs_all_subjects` with hashable inputs.
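A rough sketch of that wrapper idea, assuming the function only needs `cfg.subjects` and `cfg.exclude_subjects` (the real function likely needs more fields, and the lookup body here is a placeholder):

```python
import functools
from types import SimpleNamespace


@functools.lru_cache(maxsize=None)
def _get_runs_all_subjects(subjects, exclude_subjects):
    # Cached worker: it only receives hashable tuples.
    # Placeholder for the expensive lookup; the run labels are made up.
    return {
        subject: ("01", "02")
        for subject in subjects
        if subject not in exclude_subjects
    }


def get_runs_all_subjects(cfg):
    # Public signature is unchanged, so existing callers keep passing cfg.
    # Lists are converted to tuples because lists are not hashable either.
    return _get_runs_all_subjects(
        subjects=tuple(cfg.subjects),
        exclude_subjects=tuple(cfg.exclude_subjects),
    )


# Hypothetical config for illustration.
cfg = SimpleNamespace(subjects=["01", "02", "03"], exclude_subjects=["02"])
first = get_runs_all_subjects(cfg)
second = get_runs_all_subjects(cfg)  # served from the cache
info = _get_runs_all_subjects.cache_info()
```

One caveat with this sketch: the cached dict is shared between callers, so it should be treated as read-only (or copied before being modified).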
@apmellot played with a dataset with 1400 subjects, and the pipeline appeared super slow to get started. The quality step takes more than a day on the NFS disks... Profiling suggests the bottleneck is in the `get_runs` function. It calls `get_runs_all_subjects` for every subject, so it seems we have quadratic complexity here.
It seems we need to change the logic or use more caching.
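To make the quadratic cost concrete, here is a toy model, not the actual pipeline code: the signatures are simplified and the `scans` counter stands in for per-subject disk access.

```python
import functools

scans = 0  # counts per-subject lookups (stand-in for hitting the disk)


def get_runs_all_subjects(subjects):
    # Toy version: one "disk scan" per subject.
    global scans
    runs = {}
    for subject in subjects:
        scans += 1
        runs[subject] = ("01",)
    return runs


def get_runs(subjects, subject):
    # Each per-subject call rescans every subject.
    return get_runs_all_subjects(subjects)[subject]


subjects = tuple(f"{i:04d}" for i in range(100))

# Without caching: N calls, each scanning N subjects.
for subject in subjects:
    get_runs(subjects, subject)
uncached_scans = scans

# With caching: the full scan runs once and is then reused.
scans = 0
cached_all = functools.lru_cache(maxsize=None)(get_runs_all_subjects)
for subject in subjects:
    cached_all(subjects)[subject]
cached_scans = scans

print(uncached_scans, cached_scans)
```

With 100 subjects the toy model does 100 × 100 scans uncached versus 100 cached; at 1400 subjects the same ratio would be 1400 to 1.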