bhomass closed this issue 1 year ago
Hi @bhomass,
Can you give more detail on the dataset you are referring to? I assume it is the Sciplex data? I agree that this part of the code needs some refactoring. To give a bit more context: chemCPA is designed to operate with any number of covariates, but it requires the cell_type covariate, which should always be present.
Any yaml file with covariate_keys set to cell_id needs to be changed to cell_type. There are many such yaml files throughout the repo, and the problem does not depend on which dataset is used. Like you said, the data.py code is fixed to look for cell_type.
You can see this in the lookup:
cov = indx(self.covariate_names["cell_type"], i)
but self.covariate_names is built from self.covariate_keys, so it only contains the keys listed in the yaml file:
self.covariate_names = {}
for cov in self.covariate_keys:
    self.covariate_names[cov] = indx(dataset.covariate_names[cov], indices)
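The mismatch described above can be reproduced with a minimal sketch using plain dicts. Everything here is a stand-in: `indx` is assumed to do plain indexing, `build_covariate_names` is a hypothetical function mirroring the loop quoted above, and the covariate values are made-up toy data, not taken from any real dataset.

```python
def indx(values, i):
    # Stand-in for the helper in data.py (assumption: plain indexing).
    return values[i]


def build_covariate_names(dataset_covariates, covariate_keys, indices):
    # Hypothetical mirror of the loop above: only the keys listed in
    # covariate_keys end up in the resulting dict.
    names = {}
    for cov in covariate_keys:
        names[cov] = [indx(dataset_covariates[cov], i) for i in indices]
    return names


# Toy data: both columns exist in the dataset and hold the same values.
dataset_covariates = {
    "cell_id": ["a", "b", "c"],
    "cell_type": ["a", "b", "c"],
}

# Config declares cell_type: the hard-coded lookup works.
ok = build_covariate_names(dataset_covariates, ["cell_type"], [0, 2])
print(indx(ok["cell_type"], 1))

# Config declares cell_id: "cell_type" was never copied over, so the
# hard-coded lookup raises KeyError.
broken = build_covariate_names(dataset_covariates, ["cell_id"], [0, 2])
try:
    indx(broken["cell_type"], 1)
except KeyError as err:
    print("KeyError:", err)
```

The point of the sketch is that the failure comes from the config key, not from the data: the dataset can contain both columns and the lookup still fails.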
A few .yaml files declare dataset.data_params.covariate_keys: cell_id,
while most declare dataset.data_params.covariate_keys: cell_type,
but the code in data.py is hard-coded to expect cell_type.
Should I assume all instances of "cell_id" need to be converted to "cell_type"?
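If the answer is yes, the conversion could be scripted rather than done by hand. The following is a rough sketch, not a tested migration for this repo: `rewrite_covariate_keys` is a hypothetical helper that works on the text of one yaml file and only handles the key written on a single line (scalar or inline list). Block-style lists (`- cell_id` on its own line) are deliberately left alone and would need a real YAML parser.

```python
import re

# Matches a line whose key is covariate_keys, optionally with a dotted
# prefix such as dataset.data_params.covariate_keys.
KEY_LINE = re.compile(r"^\s*(?:[\w.]*\.)?covariate_keys\s*:")


def rewrite_covariate_keys(yaml_text: str) -> str:
    """Replace cell_id with cell_type, but only on covariate_keys lines."""
    out = []
    for line in yaml_text.splitlines(keepends=True):
        if KEY_LINE.match(line):
            line = re.sub(r"\bcell_id\b", "cell_type", line)
        out.append(line)
    return "".join(out)


example = (
    "dataset.data_params.covariate_keys: cell_id\n"
    "dataset.data_params.split_key: split\n"
)
print(rewrite_covariate_keys(example))
```

Restricting the substitution to covariate_keys lines avoids touching other places where cell_id may legitimately appear, such as column names used elsewhere in a config.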
It turns out the two hold the same values: adata.obs['cell_type'] = adata.obs['cell_id']
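Given that the two columns are identical, another option is to guard against the missing key at load time instead of editing configs. This is only a sketch: `ensure_cell_type` is a hypothetical helper, and a plain dict stands in for `adata.obs` here. A pandas DataFrame supports the same `in` and `[]` operations on columns, so the same function should apply, but that is not verified against chemCPA.

```python
def ensure_cell_type(obs, source_key="cell_id", target_key="cell_type"):
    """Alias source_key to target_key when target_key is missing."""
    if target_key not in obs:
        if source_key not in obs:
            raise KeyError(
                f"neither {target_key!r} nor {source_key!r} is present"
            )
        obs[target_key] = obs[source_key]
    return obs


# Toy stand-in for adata.obs with made-up values.
obs = {"cell_id": ["a", "b", "a"]}
ensure_cell_type(obs)
print(sorted(obs))
```

An existing cell_type column is never overwritten, so the guard is safe to call on data that already follows the expected convention.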