zhengtaoxiao / Single-Cell-Metabolic-Landscape

Pipeline for characterizing metabolic heterogeneity from single-cell RNA-seq data
MIT License

“imputation issue” error #9

Open xutongran opened 2 years ago

xutongran commented 2 years ago

Hi zhengtaoxiao! When I tried to reproduce your analysis, I got an error. The output is as follows:

    scimpute(file.path(outDir,"tumor.tpm"), infile="csv", outfile="csv", outdir=file.path(outDir,"malignant"),
             labeled=TRUE, labels=as.vector(labels_tumor),
             type="TPM", genelen=genelen, drop_thre=0.5, ncores=num_cores)
    [1] "reading in raw count matrix ..."
    [1] "number of genes in raw count matrix 23684"
    [1] "number of cells in raw count matrix 1167"
    Error in if (min(raw_count) < 0) { : missing value where TRUE/FALSE needed
    In addition: Warning message:
    In dir.create(outdir, recursive = TRUE) :
      'dataset/NA/malignant' already exists

My experience is fairly limited, so I don't know how to deal with this. Could you give me some advice?
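For context, the message `missing value where TRUE/FALSE needed` is a base R error: if the matrix contains any NA or NaN, `min()` returns a missing value, and `if()` cannot branch on it. A minimal reproduction with a small made-up matrix (not the actual tumor.tpm data):

```r
# Toy matrix standing in for the matrix scimpute checks (made-up values, not tumor.tpm)
raw_count <- matrix(c(1, 2, NaN, 4), nrow = 2)

# min() propagates missing values, so a single NaN/NA makes the result missing
m <- min(raw_count)
print(is.na(m))  # TRUE

# The same if() test then fails with the error reported above
msg <- tryCatch(
  if (min(raw_count) < 0) "has negative values" else "all non-negative",
  error = function(e) conditionMessage(e)
)
print(msg)
```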

gloknar commented 2 years ago

Perhaps some Inf values are created in your sce object after running the line

raw_tpm <- (2^all_data) - 1

Check if there are Inf, NA, NaN or negative values in your sce@assays@data$tpm slot
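A quick way to run that check, assuming the TPM values can be pulled out as a plain numeric matrix (for a SingleCellExperiment, `SummarizedExperiment::assay(sce, "tpm")` returns the same data as `sce@assays@data$tpm`). The helper name `count_bad_values` is hypothetical, and the demo uses a made-up matrix:

```r
# Hypothetical helper: count problematic entries in a numeric matrix
count_bad_values <- function(m) {
  c(
    infinite = sum(is.infinite(m)),
    nan      = sum(is.nan(m)),
    na       = sum(is.na(m) & !is.nan(m)),  # is.na() is also TRUE for NaN, so exclude it
    negative = sum(m < 0, na.rm = TRUE)     # -Inf would also be counted here
  )
}

# Made-up example matrix containing one of each problem
tpm <- matrix(c(10, Inf, NaN, -1, NA, 5), nrow = 2)
bad <- count_bad_values(tpm)
print(bad)

# With a real object, something like:
# count_bad_values(as.matrix(SummarizedExperiment::assay(sce, "tpm")))
```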

gloknar commented 2 years ago

Just checking in, did it fix the error?

WANNQI commented 1 year ago

> Just checking in, did it fix the error?

Hello, I ran into the same error. The line 'raw_tpm <- (2^all_data) - 1' does indeed produce some Inf values. How should I deal with them? Should I just remove them? Thanks a lot.
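Why Inf shows up: in double precision, 2^x overflows to Inf once x reaches 1024. An Inf on its own does not make `min()` return NA, but it becomes NaN as soon as a cell is rescaled by its own total (Inf / Inf). The sketch below assumes, without having checked the scImpute source, that the imputation step performs such a per-cell rescaling; the matrix is made up:

```r
# Hypothetical log2(TPM + 1) values (2 genes x 2 cells); geneA/cell1 is absurdly large
all_data <- matrix(c(2000, 5, 3, 4), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

# Undo the log2(x + 1) transform, as in the line quoted above
raw_tpm <- (2^all_data) - 1
print(raw_tpm)  # geneA/cell1 is Inf because 2^2000 exceeds the double range

# The largest finite double lies just below 2^1024
stopifnot(is.finite(2^1023), is.infinite(2^1024))

# Assumed per-cell rescaling to a total of 1e6: cell1's total is Inf,
# so Inf / Inf gives NaN and 31 / Inf gives 0
scaled <- sweep(raw_tpm, 2, colSums(raw_tpm), "/") * 1e6
print(scaled)
print(min(scaled))  # a missing value (NaN), which is what breaks if (min(...) < 0)
```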

gloknar commented 1 year ago

We circumvented the issue by capping those Inf values at a reasonably high value, using the following reasoning:

The gene expression data in Xiao's datasets are normalized as TPM (transcripts per million). That means that, within each cell, the values are scaled so that they sum to 1,000,000. The genes that give Inf might account for, say, 800,000 of the 1,000,000 TPMs in a cell. You could cap those genes so that instead of computing (2^huge number) - 1 = Inf, you compute something like (2^8) - 1 = 255, a more reasonable TPM value.
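A minimal sketch of that capping idea, assuming `all_data` holds log2(TPM + 1) values as in the line quoted earlier in the thread. The cap of 8 is simply the example value from this comment, not a tuned threshold, and the final per-cell renormalisation to 1,000,000 is an optional extra step (an assumption, not part of the original suggestion) that restores the TPM property of summing to one million per cell:

```r
# Hypothetical log2(TPM + 1) values; geneA in cell1 would overflow 2^x to Inf
all_data <- matrix(c(2000, 5, 3, 4), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

log_cap <- 8  # example cap from the comment: 2^8 - 1 = 255 TPM

# Cap on the log2 scale before exponentiating so 2^x can no longer overflow
capped <- all_data
capped[capped > log_cap] <- log_cap
capped_tpm <- (2^capped) - 1

# Optional (assumption): rescale each cell so its TPMs sum to 1e6 again
tpm_norm <- sweep(capped_tpm, 2, colSums(capped_tpm), "/") * 1e6

stopifnot(all(is.finite(tpm_norm)))
print(colSums(tpm_norm))
```

Capping on the log2 scale rather than after exponentiating avoids ever creating Inf in the first place.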

You can find more info about the formula and meaning of TPM here: https://wiki.arrayserver.com/wiki/index.php?title=TPM
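For reference, the standard TPM definition divides each gene's count by its length in kilobases and then scales each cell so the length-normalised values sum to 10^6. A small R sketch with made-up counts and lengths (the function name `counts_to_tpm` is hypothetical):

```r
# Hypothetical helper: convert a genes x cells count matrix to TPM
counts_to_tpm <- function(counts, gene_length_bp) {
  # Reads per kilobase: each row is divided by its gene's length in kb
  # (a length-nrow vector is recycled down the columns, matching rows)
  rpk <- counts / (gene_length_bp / 1e3)
  # Scale each cell (column) so that its values sum to one million
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

counts <- matrix(c(10, 20, 30, 0, 5, 5), nrow = 3)  # made-up counts, 3 genes x 2 cells
gene_length_bp <- c(1000, 2000, 500)                  # made-up gene lengths in bp
tpm <- counts_to_tpm(counts, gene_length_bp)
print(tpm)
print(colSums(tpm))  # one million per cell (up to rounding)
```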

Cheers

(Reply sent by email to wannqi's comment of Thu, 9 Mar 2023, 9:22.)