theislab / scgen

Single cell perturbation prediction
https://scgen.readthedocs.io
GNU General Public License v3.0

Normalization question for batch_removal #59

Closed ryeking2010 closed 2 years ago

ryeking2010 commented 2 years ago

Hello, thanks for such a great tool that's been ranked as a high performer!

I am hoping to understand the normalization process. The readme says to normalize the data as follows:

    import scanpy as sc
    adata = sc.read(data)
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)

However, in the tutorial, we see a warning message:

    corrected_adata = model.batch_removal()
    WARNING Make sure the registered X field in anndata contains unnormalized count data.

Should the data be normalized or contain raw counts if batch_removal() is run? Where would the fix be made if unnormalized counts are used: in model.X or train.X, before or after training?

There's also the filtering warning; if that could be addressed too, I think it would make the tutorial much clearer.

Thanks!

adamgayoso commented 2 years ago

> Should the data be normalized or contain raw counts if batch_removal() is run?

The data should be normalized. Please ignore that warning for now; we will update it soon.
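
For readers unsure what "normalized" means here, the following is a minimal plain-Python sketch of the arithmetic behind the two readme steps (total-count normalization followed by log1p). It is not scanpy or scgen code: the functions `normalize_total`, `log1p`, and `looks_like_raw_counts` and the toy matrix are hypothetical stand-ins written for illustration. Falling back to the median of per-cell totals when no target is given is an assumption about scanpy's default, so check the scanpy documentation before relying on it.

```python
import math
from statistics import median


def normalize_total(counts, target_sum=None):
    """Scale each cell (row) so that its counts sum to target_sum."""
    totals = [sum(row) for row in counts]
    if target_sum is None:
        # Assumption: use the median of per-cell totals as the target.
        target_sum = median(totals)
    return [
        [value * target_sum / total if total else 0.0 for value in row]
        for row, total in zip(counts, totals)
    ]


def log1p(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(value) for value in row] for row in matrix]


def looks_like_raw_counts(matrix):
    """Heuristic: raw counts are non-negative whole numbers."""
    return all(v >= 0 and float(v).is_integer() for row in matrix for v in row)


# Toy matrix: 3 cells (rows) x 3 genes (columns), invented for illustration.
raw = [[4, 0, 6], [1, 1, 2], [10, 30, 0]]

scaled = normalize_total(raw)  # every row now sums to the same total
normalized = log1p(scaled)     # the kind of input the answer above asks for

print(looks_like_raw_counts(raw))         # whole-number counts
print(looks_like_raw_counts(normalized))  # no longer whole numbers
```

A check like `looks_like_raw_counts` is only a heuristic, but it can help confirm which kind of matrix is stored in X before calling batch_removal().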