inmybrain / SClineager

SClineager: a Bayesian hierarchical model that performs lineage tracing of single cells based on genetic markers
GNU General Public License v3.0
9 stars 2 forks source link

possibility bug in the code of SClineager? #3

Open MenglinC opened 3 years ago

MenglinC commented 3 years ago

Dear professors@tianshiliu,

Recently,I am very interested in your work for lineage tracing use this SClineager package.I use your test data in the data folder,which is the coverage.txt and the germline_mutations_mm10.txt. But I come across the error :

Error in apply(mutations_mat, 1, function(x) mean(x > 0.1, na.rm = T) > : dim(X) must have a positive length

So,I check you source code "SClineager_io.R",I find that the mutations_mat is not the matrix class,it just a numberic vector!Even you first define it as a matrix.

mutations_mat=matrix(0,ncol=length(unique(mutations$id)),nrow=length(unique(mutations$mutation))) class(mutations_mat) [1] "matrix" "array"

But it changes when the operation occurs I guess:

mutations_mat=mutations_mat[apply(mutations_mat,1,function(x) mean(x>0.1,na.rm=T)>cutoff),,drop=F] Error in apply(mutations_mat, 1, function(x) mean(x > 0.1, na.rm = T) > : dim(X)的值必需是正数 mutations_mat chr15 8144078 C G chr17 33838239 CT C 0.9000000 0.4285714 class(mutations_mat) [1] "numeric"

So,I want to ask how can I continue this part while do not destroy the whole code structure? In addition,I want to complain that there are so many continuous assignment to the vector mutations_mat that we can not trace the operation of code easily!

Hope for your suggestions! Xiu

MenglinC commented 3 years ago

Dear professors,

I overcome this problem by transform the numberic type to the matrix in the source code of the file "SClineager_io.R":

mutations_mat=as.matrix(mutations_mat[!apply(mutations_mat,1,function(x) all(is.na(x))),])
#add the as.matrix()

However,I also find another problems in the "remove mutations that are not estimatable in too many cells"part: mutations_mat=mutations_mat[apply(mutations_mat,1,function(x) sd(x,na.rm=T))>0,,drop=F]

mutations_mat [,1]

NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

I currently only have one cell from the data file for test!So when I perform this code,It turn out to be all error! So can you provide more test file for performing? Thanks a lot!

tianshilu commented 3 years ago

@MenglinC ,

Thanks for your interest in SClineager. SClineaager infers variant allele frequency and lineage information by borrowing information from related cells (please refer to the paper for more details https://www.cell.com/cell-reports/fulltext/S2211-1247(20)31578-3?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2211124720315783%3Fshowall%3Dtrue ]). So I recommend you to use at least 100 cells for testing.

Thanks

MenglinC commented 3 years ago

@tianshiliu Hi,Thank you for your explanation.I also have other questions about this pipeline. I want to use the CTCL patient raw data for testing as your referred as CTCL.RData in the data folder.I check the data in the GSE126310,but it only have the CTCL-ADT,CTCL-HTO sample available,becasue of the privacy.So,I want to ask which samples your used in your paper,and I want to repreat this part! Thanks a lot!

tianshilu commented 3 years ago

@MenglinC

Hi, CTCL-cDNA is used for calling variants and inferring missing data points. CTCL-TCRab is used for lineages validation.

Thanks!

SuqinYang commented 2 years ago

Hi,@TianshiLiu Sorry to trouble you. I am trying to use SClineager, but I met an error in R when I used the function “ read_sclineager” :Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions

Following are part of the somatic_mutations_hg38.txt and coverage.txt

[coverage.txt]: chr1 14523 C 1 chr1 14524 C 1 chr1 14525 C 1 chr1 14526 C 1 chr1 14527 A 1 chr1 14528 G 1 chr1 14529 C 1 chr1 14530 T 1 chr1 14531 G 1

[somatic_mutations_hg38.txt]: Chr Start End Ref Alt Caller Normal_ref Normal_alt Tumor_ref Tumor_alt Func.refGene mutation ExonicFunc.refGene AAChange.refGene SIFT_pred Polyphen2_HVAR_pred cosmic70 esp6500siv2_all ExAC_ALL X1000g2015aug_all chr1 629345 629345 G A lofreq 62 9 62 9 upstream LOC101928626 . . . . 0 0 0 0 chr1 631877 631877 T G lofreq 24 6 24 6 downstream MIR12136 . . . . 0 0 0 0 chr1 631878 631878 T G lofreq 29 6 29 6 downstream MIR12136 . . . . 0 0 0 0 chr1 631920 631920 C A lofreq 328 3 328 3 downstream MIR12136 . . . . 0 0 0 0

tianshilu commented 2 years ago

Hi @SuqinYang,

Thanks for your interest in our study. To better diagnose the problem, could you show the command line you used for running SClineager?

Thanks! Tianshi

SuqinYang commented 2 years ago

Hi,@tianshilu Following are my commands: setwd("D:/sclineager") coverage_cutoff <- 1 coverage_percentage <- 0.2 cell_percentage <- 0.2 artefact_percentage <- 0.03 library(SClineager) for (folder in c(2)){ print(folder) runinfo <- data.frame( Cell = list.files(paste("./mutations/",folder, sep = "")), Path = list.files(paste("./mutations/", folder, sep = ""), full = T), stringsAsFactors = F ) out_folder <- paste("./processed/", folder, sep = "") }

for (folder in c(2)){ preprocess_genetics <- read_sclineager(runinfo, coverage_cutoff, coverage_percentage, cell_percentage, out_folder, artefact_percentage)

}

And I found another confusing thing,I got somatic_mutations_hg38.txt and germline_mutations_hg38.txt. When I use the "germline_mutations_hg38.txt" for SClineager, it seems like there is no problem, but when I use "somatic_mutations_hg38.txt" for SClineager ,it has an error: ”Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", :incorrect number of dimensions“ . Is there some problem with my somatic_mutations_hg38.txt? Can you give an example for your somatic_mutations_hg38.txt

Following are some of the somatic_mutations_hg38.txt and germline_mutations_hg38.txt: 1.somatic_mutations_hg38.txt Chr Start End Ref Alt Caller Normal_ref Normal_alt Tumor_ref Tumor_alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene SIFT_pred Polyphen2_HVAR_pred cosmic70 esp6500siv2_all ExAC_ALL X1000g2015aug_all chr1 632142 632142 T G lofreq 30 4 30 4 downstream MIR12136 . . . . 0 0 0 0 chr1 632144 632144 T G lofreq 34 4 34 4 downstream MIR12136 . . . . 0 0 0 0 chr1 1014470 1014470 C T lofreq 15 3 15 3 exonic ISG15 nonsynonymous SNV ISG15:NM_005101:exon2:c.C490T:p.R164W D B 0 0 0.000008958 0

2.germline_mutations_hg38.txt Chr Start End Ref Alt Caller Normal_ref Normal_alt Tumor_ref Tumor_alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene SIFT_pred Polyphen2_HVAR_pred cosmic70 esp6500siv2_all ExAC_ALL X1000g2015aug_all chr1 632142 632142 T G lofreq 30 4 30 4 downstream MIR12136 . . . . 0 0 0 0 chr1 632144 632144 T G lofreq 34 4 34 4 downstream MIR12136 . . . . 0 0 0 0 chr1 634229 634229 A C lofreq 3 12 3 12 intergenic MIR12136;OR4F16 . . . . 0 0 0 0 chr1 634244 634244 T C lofreq 1 15 1 15 intergenic MIR12136;OR4F16 . . . . 0 0 0 0

tianshilu commented 2 years ago

Hi @SuqinYang,

Thank you for providing more details. This tool is based on germline mutations for lineage references. You can check the paper for more details https://www.sciencedirect.com/science/article/pii/S2211124720315783https://www.sciencedirect.com/science/article/pii/S2211124720315783. So the function will only read germline mutation files under the directory you provided in the run_info (please see the code . I hope it helps.

Thanks!

SuqinYang commented 2 years ago

Hi,@tianshilu Thank you for your reply! It helps me a lot! I saw you say in the article (Overcoming Expressional Drop-outs in Lineage Reconstruction from Single-Cell RNA Sequencing Data) that it can also be used for somatic mutation detection " Finally, these four software tools can only work on somatic mutations from tumor single-cell-sequencing data, whereas SClineager is flexible and can handle somatic and/or germline variants." But you said the function would only read germline mutation files under the directory I provided in the run_info, I want to know how I can use SClineager with my somatic mutation data.

Best wishes!

tianshilu commented 2 years ago

@SuqinYang,

Our tool can be used for somatic mutations. You will need to replace "germline_mutations" with "somatic_mutations" in this line https://github.com/inmybrain/SClineager/blob/c550f70ecc1b95dd5b2f7e1df7b80aafb4abdd63/SClineager/R/SClineager_io.R#L49.

Thanks!

SuqinYang commented 2 years ago

Hi,@tianshilu Thank you for your reply,it helps me a lot! And I have other problems. When I run the "SClineager",I will get the summary.pdf, imputation_results.pdf and mCelSeq2_exploratory.pdf, is there any difference between this pdfs ?And I see codes in your "Another toy example with simulated data" are different from codes in "A basic example of using the package ",which codes are better for me to refer to?

tianshilu commented 2 years ago

Hi @SuqinYang,

summary.pdf is a heatmap showing the actual vaf matrix under the cutoffs that you input. imputation_results.pdf shows the heatmaps of the SClineager imputation results. mCelSeq_exploratory.pdf shows the distribution of vaf, count, and standard deviation of the mutations in your input data.

I recommend using the code under the "basic example" section which has more details. You can also use the toy example to test out the method.

Tianshi Tianshi

SuqinYang commented 2 years ago

Hi, @tianshilu Sorry to trouble you again. I have run the SClineager ,and I got the cleaned.RData and results.RData.When I loaded results.RData, and I transfered the "results[["genotype_mat"]]" into a data.frame (such as:a<-as.data.frame(results[["genotype_mat"]])),I found that every cell has the same variants but different VAF, I want to know why every cell has the same variants. Will sclineager filter variants and get the final variants of every cell?

dstyaaaaaa commented 9 months ago

Hi @tianshilu , I met the same error when I ran this package on your test data(coverage.txt and germline_mutations_mm10.txt). I don't know if this is due to a bug, but I wonder if there is a way to get the sample data to work without modifying your source code? stding