Open aalok-sathe opened 2 years ago
Proposal: make the `__repr__` method of each `Cacheable` class uniquely identify that instance. E.g., `repr(BrainScore())` should contain information about the `Mapping`, the `Metric`, and the encoders (all of this can come from calls to the respective `repr` methods of these objects).
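A minimal sketch of how such a composed `__repr__` could look. The `_repr_fields` attribute, the constructor signatures, and the algorithm names `"ridge"` and `"pearsonr"` are all assumptions made for illustration; they are not taken from the actual codebase.

```python
class Cacheable:
    """Hypothetical base class whose repr is built from identity-defining attributes."""

    # Names of the attributes that determine the cached state (assumed convention).
    _repr_fields = ()

    def __repr__(self):
        # Each attribute contributes its own repr, so nested objects compose naturally.
        parts = ", ".join(
            f"{name}={getattr(self, name)!r}" for name in self._repr_fields
        )
        return f"{type(self).__name__}({parts})"


class Mapping(Cacheable):
    _repr_fields = ("algorithm",)

    def __init__(self, algorithm):
        self.algorithm = algorithm


class Metric(Cacheable):
    _repr_fields = ("algorithm",)

    def __init__(self, algorithm):
        self.algorithm = algorithm


class BrainScore(Cacheable):
    _repr_fields = ("mapping", "metric")

    def __init__(self, mapping, metric):
        self.mapping = mapping
        self.metric = metric


score = BrainScore(Mapping("ridge"), Metric("pearsonr"))
print(repr(score))
# BrainScore(mapping=Mapping(algorithm='ridge'), metric=Metric(algorithm='pearsonr'))
```

Two instances built from the same components then produce the same string, and changing any component changes it, which is what would make the `repr` usable as a cache key.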
The list below is in the form:

- [ ] Object to `repr()`

- [ ] `BrainScore`
  - `Mapping`
  - `Metric`
  - `Encoder1` outputs
  - `Encoder2` outputs (should we create a class `EncoderOutput`, for more logical dependency in cache handling?) @lipkinb @gretatuckute
- [ ] `Mapping`
  - `str` algorithm
- [ ] `Metric`
  - `str` algorithm
- [x] `EncoderOutput` (?)
  - `Encoder`
  - `Dataset`
- [ ] `HFEncoder`
  - `str` algorithm (`pretrained_model_name_or_path`)
  - `str` aggregation choices
  - `Dataset`
- ~[ ] `BrainEncoder`
  - `Dataset`
- ~[ ] `Dataset`
  - `str` path to the data

`zarr` is unable to cache `xarray`s with dtype `object` in them. Somehow dtype `object` is bleeding in from somewhere. Once that is corrected to a string dtype, this issue disappears.
This issue is referenced here: https://github.com/pydata/xarray/issues/3476
It is partially solved by commits in #34.
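As an illustration of the dtype correction described above, here is a small sketch using plain NumPy arrays. The helper name `coerce_object_to_str` is hypothetical, and this is not necessarily how the commits in #34 handle it; the same kind of cast is available on xarray objects through their `astype` method.

```python
import numpy as np


def coerce_object_to_str(values):
    """Cast an object-dtype array to a fixed-width unicode dtype; leave others alone."""
    values = np.asarray(values)
    if values.dtype == object:
        # astype(str) yields a '<U...' dtype instead of generic Python objects.
        return values.astype(str)
    return values


labels = np.array(["the", "quick", "fox"], dtype=object)  # dtype object "bleed"
fixed = coerce_object_to_str(labels)                       # now a unicode dtype
numbers = coerce_object_to_str(np.array([1.0, 2.0]))       # untouched float array
```

Checking `values.dtype == object` before caching is also a cheap way to find out where the object dtype is coming from.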
We need reliable state-caching for most classes to persist results to disk, for later analysis and reuse in pipelines. If cached results exist, they may be reused based on a flag (e.g. `overwrite_cache=False`).
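A rough sketch of how the `overwrite_cache` flag could gate reuse of results on disk. The helper `cached`, the pickle format, and hashing the `repr` string into a file name are assumptions for illustration, not the project's implementation.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cached(obj_repr, compute, cache_dir, overwrite_cache=False):
    """Return compute()'s result, reusing a pickle on disk keyed by obj_repr."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the repr so that arbitrary repr strings map to safe file names.
    key = hashlib.sha256(obj_repr.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.pkl"

    # Reuse the stored result unless the caller asked to overwrite it.
    if path.exists() and not overwrite_cache:
        with path.open("rb") as f:
            return pickle.load(f)

    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive():
    # Stand-in for a costly computation; records how often it actually runs.
    calls.append(1)
    return {"score": 0.5}


cache_dir = tempfile.mkdtemp()
key = "BrainScore(mapping=Mapping(algorithm='ridge'))"  # placeholder repr string
first = cached(key, expensive, cache_dir)    # computes and writes to disk
second = cached(key, expensive, cache_dir)   # loads from disk
third = cached(key, expensive, cache_dir, overwrite_cache=True)  # recomputes
```

This only works if the `repr` string changes whenever the object's state changes, which is why the uniqueness of `__repr__` matters for caching.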