poseidonchan / TAPE

Deep learning-based tissue compositions and cell-type-specific gene expression analysis with tissue-adaptive autoencoder (TAPE)
https://sctape.readthedocs.io/
GNU General Public License v3.0

data normalization #12

Open doulijun777 opened 1 year ago

doulijun777 commented 1 year ago

Thanks for this wonderful tool. I have one question: when simulating pseudobulk data from scRNA-seq data, I checked the function "generate_simulated_data", which contains:

print('Normalizing raw single cell data with scanpy.pp.normalize_total')

sc_data = anndata.AnnData(sc_data)

sc.pp.normalize_total(sc_data, target_sum=1e4)

So, do we need to normalize here or not? I am a little confused.
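For context, here is a minimal NumPy sketch of what a library-size normalization such as `sc.pp.normalize_total(sc_data, target_sum=1e4)` does to a cell-by-gene count matrix: each cell's counts are rescaled so that they sum to `target_sum`. The helper name `library_size_normalize` and the toy matrix are hypothetical, and this is an illustration of the idea rather than scanpy's or TAPE's actual implementation.

```python
import numpy as np

def library_size_normalize(counts, target_sum=1e4):
    """Rescale each cell (row) so that its counts sum to target_sum.

    Hypothetical helper for illustration only; not part of scanpy or TAPE.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts are left as all zeros (avoid division by zero).
    totals[totals == 0] = 1.0
    return counts / totals * target_sum

# Toy matrix: 3 cells (rows) x 3 genes (columns), made-up values.
raw = np.array([[10, 30, 60],
                [1, 1, 2],
                [0, 0, 0]])
normalized = library_size_normalize(raw)
```

After this step every non-empty cell has the same total, so cells with very different sequencing depths contribute on a comparable scale when they are summed into a pseudobulk sample. Whether that is desirable for the simulation is exactly the open question here.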

Another question: for the bulk data, do we need to convert to TPM or FPKM, or should we only use the raw count data?
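To make the difference between these units concrete, here is a small sketch of the standard conversions from raw counts to TPM and FPKM for a single sample. The function names, the gene lengths, and the counts are all hypothetical, and this does not show how TAPE itself treats bulk input.

```python
import numpy as np

def counts_to_tpm(counts, gene_lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    rate = counts / lengths_kb          # reads per kilobase of gene
    return rate / rate.sum() * 1e6

def counts_to_fpkm(counts, gene_lengths_bp):
    """Fragments per kilobase per million mapped fragments."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    per_million = counts.sum() / 1e6    # sequencing depth in millions
    return counts / lengths_kb / per_million

# Made-up example: three genes in one bulk sample.
counts = [100, 200, 700]
gene_lengths_bp = [1000, 2000, 7000]
tpm = counts_to_tpm(counts, gene_lengths_bp)
fpkm = counts_to_fpkm(counts, gene_lengths_bp)
```

In this toy sample the three genes have the same read density per kilobase, so they get identical TPM and identical FPKM values even though their raw counts differ. TPM always sums to one million per sample, whereas FPKM totals depend on the sample.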

Thank you.

doulijun777 commented 1 year ago

Actually, in the published code this line was commented out, which is why I am a little confused.

poseidonchan commented 1 year ago

Hi doulijun777:

Thanks for trying TAPE. Actually, I am not very sure about the normalization problem right now. I probably should not have commented it out. For the bulk data, whatever normalization has been applied, please use the "count" argument in the function to ensure proper deconvolution performance.

Regards, Yanshuo