rsagroup / rsatoolbox

Python library for Representational Similarity Analysis
MIT License
188 stars 37 forks source link

calc_rdm_movie can change the type of the observation descriptors #345

Open caiw opened 1 year ago

caiw commented 1 year ago

Describe the bug If you pass a dictionary whose values are numpy.int64 to calc_rdm_movie's obs_descriptors argument, the resultant RDMs object will have .pattern_descriptors value types of numpy.float64 (which annoyingly changes they way they print).

To Reproduce Following the Temporal RSA demo, we have the following set (among other variables):

with Path(path.abspath(""), "data", "TemporalSampleData", "meg_sample_data.pkl").open("rb") as d:
    meg_sample_data = pickle.load(d)
# ...
condition_indices = meg_sample_data["cond_idx"]
print(type(condition_indices[0]))  # <class 'numpy.int64'>

Later on in the demo, we create a TemporalDataset and run calc_rdm_movie:

observ_des = {"conditions": condition_indices}
# ...
dataset = TemporalDataset(measurements, descriptors=des, obs_descriptors=observ_des, channel_descriptors=chan_des, time_descriptors=time_des)
print(dataset.obs_descriptors["conditions"][:5])  # [1, 1, 1, 1, 1]
print(type(dataset.obs_descriptors["conditions"][0]))  # <class 'numpy.int64'>

so no funny business there.

Following further down we run calc_rdm_movie:

rdms_data = calc_rdm_movie(dataset, method="correlation", descriptor="conditions")
print(rdms_data.pattern_descriptors["conditions"])  # [1.0, 2.0, 3.0, 4.0]
print(type(rdms_data.pattern_descriptors["conditions"][0]))  # <class 'numpy.float64'>

Expected behavior The final two prints should look like:

print(rdms_data.pattern_descriptors["conditions"])  # [1, 2, 3, 4]
print(type(rdms_data.pattern_descriptors["conditions"][0]))  # <class 'numpy.int64'>

Versions Please report:

caiw commented 1 year ago

Ok, digging further, the culprit here is TemporalDataset.convert_to_dataset, which is called by calc_rdm_movie, at least in v0.1.3. Around line 735 it repeatedly calls np.concatenate, starting with an empty list, thereby converting the dtype to numpy.float64.

The solution would probably be to do something like detect and reset the datatype, but TemporalDataset.convert_to_dataset is now deprecated, so not sure how far to pursue this.

JasperVanDenBosch commented 1 year ago

We also more generally need to discuss descriptor datatypes I think. Many methods (naturally) make certain assumptions about them, which don't always hold up because we are quite permissive right now