Closed Hemantcnaik closed 6 months ago
Hi @Hemantcnaik ,
Thanks for using SClineager. After installing the package, you can handle dropout alleles by using the following commands in an R environment:
read_sclineager(runinfo, coverage_cutoff, coverage_percentage, cell_percentage, out_folder, artefact_percentage)
run_sclineager(file_in = file_in0, folder = file_out, categories = categories, max_iter = max_iter, keep_genes = mask_genes, vaf_offset = vaf_offset, dfreedom = dfreedom, skip_common = skip_common, psi = NULL, control = control, save = FALSE)
You can refer to "Read data" and "Run SClineager" sections of https://github.com/inmybrain/SClineager/blob/master/README.md for more details.
Thanks,
Hi @tianshilu
Thanks for the quick reply.
I have successfully installed SClineager in R.
I am confused about the input file format. Regarding the test data file mentioned above, could you please tell me whether I need to modify it? I do not know what type of input file I have to provide. When I tried the example data, I got an error. I followed the README, created all the directories it mentions, and copied the input file into place.
coverage_cutoff <- 3
coverage_percentage <- 0.2
cell_percentage <- 0.2
artefact_percentage <- 0.03

for (folder in c(301)) {
  print(folder)
  runinfo <- data.frame(
    Cell = list.files(paste("./mutations/", folder, sep = "")),
    Path = list.files(paste("./mutations/", folder, sep = ""), full = T),
    stringsAsFactors = F
  )
  out_folder <- paste("./processed/", folder, sep = "")
  preprocess_genetics <- read_sclineager(runinfo, coverage_cutoff, coverage_percentage, cell_percentage, out_folder, artefact_percentage)
}

The error I got:
Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions
Thank you
@Hemantcnaik Sorry for the confusion. For each cell of each sample, you will need two input files: a coverage file and a mutation file. The coverage file has four tab-separated columns, "chromosome", "position", "nucleic acid", and "coverage", without column names (shown below):
chr10 3191043 T 1
chr10 3191044 C 1
chr10 3191045 T 1
chr10 3191046 C 1
chr10 3191047 A 1
chr10 3191048 A 1
chr10 3191049 G 1
chr10 3191050 G 1
chr10 3191051 C 1
chr10 3191052 T 1
The mutation file contains 14 columns, with the column names and information shown below:
Chr Start End Ref Alt Caller Normal_ref Normal_alt Tumor_ref Tumor_alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene
chr10 111486741 111486741 T C strelka_germline 2 4 2 4 exonic Nap1l1 synonymous SNV Nap1l1:NM_015781:exon4:c.T201C:p.I67I
chr11 6478113 6478113 G GAA strelka_germline 4 6 4 6 intergenic Purb;Myo1g . .
chr11 20685238 20685238 G GT strelka_germline 4 6 4 6 UTR3 Aftph . .
chr11 24041505 24041505 A C strelka_germline 0 6 0 6 intergenic Papolg;Bcl11a . .
chr11 24041506 24041506 A G strelka_germline 0 6 0 6 intergenic Papolg;Bcl11a . .
chr11 52122587 52122587 T G strelka_germline 10 4 10 4 UTR3 Ppp2ca . .
chr11 52389130 52389130 T C strelka_germline 4 4 4 4 UTR3 Vdac1 . .
chr11 55454900 55454900 T C strelka_germline 10 4 10 4 exonic Atox1 synonymous SNV Atox1:NM_009720:exon2:c.A63G:p.R21R
chr11 63444278 63444278 A C strelka_germline 0 16 0 16 intergenic Pmp22;Hs3st3b1 . .
chr11 69396182 69396182 T G strelka_germline 4 6 4 6 exonic Naa38 synonymous SNV Naa38:NM_030083:exon2:c.T87G:p.S29S
chr11 109011936 109011936 T C strelka_germline 0 6 0 6 intergenic Axin2;E030025P04Rik . .
chr11 109011942 109011942 T C strelka_germline 0 6 0 6 intergenic Axin2;E030025P04Rik . .
chr11 109011999 109011999 A G strelka_germline 0 6 0 6 intergenic Axin2;E030025P04Rik . .
chr11 109012019 109012019 A G strelka_germline 0 6 0 6 intergenic Axin2;E030025P04Rik . .
chr11 114668590 114668590 T C strelka_germline 6 6 6 6 UTR5 Rpl38 . .
chr12 72938859 72938859 A T strelka_germline 0 6 0 6 intergenic 4930447C04Rik;Six6 . .
chr12 72938864 72938864 T C strelka_germline 0 8 0 8 intergenic 4930447C04Rik;Six6 . .
chr12 104475105 104475105 T G strelka_germline 0 6 0 6 intergenic Gsc;Dicer1 . .
chr12 104475106 104475106 C A strelka_germline 0 6 0 6 intergenic Gsc;Dicer1 . .
You can either arrange your results into the format described above, or use the QBRC mutation calling pipeline https://github.com/tianshilu/QBRC-Somatic-Pipeline to get the coverage file and mutation file directly.
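A quick way to catch format problems before calling read_sclineager is to check both files in base R. The following is only a sketch: the helper names check_coverage_file and check_mutation_file are hypothetical and not part of SClineager, and it assumes both files are tab-separated (the thread states this only for the coverage file).

```r
# Hypothetical helpers (NOT part of SClineager) for checking input files.
# Assumption: both files are tab-separated.
expected_mutation_cols <- c(
  "Chr", "Start", "End", "Ref", "Alt", "Caller",
  "Normal_ref", "Normal_alt", "Tumor_ref", "Tumor_alt",
  "Func.refGene", "Gene.refGene", "ExonicFunc.refGene", "AAChange.refGene"
)

check_coverage_file <- function(path) {
  # Coverage file: no header, four columns, the last one numeric.
  cov <- read.table(path, header = FALSE, sep = "\t",
                    stringsAsFactors = FALSE)
  ncol(cov) == 4 && is.numeric(cov[[4]])
}

check_mutation_file <- function(path) {
  # Mutation file: header row with exactly the 14 expected column names.
  mut <- read.table(path, header = TRUE, sep = "\t", quote = "",
                    stringsAsFactors = FALSE, check.names = FALSE)
  identical(colnames(mut), expected_mutation_cols)
}

# Usage on a small temporary coverage file built from the rows shown above.
cov_path <- tempfile(fileext = ".txt")
writeLines(c("chr10\t3191043\tT\t1", "chr10\t3191044\tC\t1"), cov_path)
print(check_coverage_file(cov_path))
```

A coverage file with only three columns makes check_coverage_file return FALSE, which is easier to interpret than an error raised later inside the package.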
Tianshi
@tianshilu
My main objective is to handle allelic dropout. As you mention in the paper, your method can handle dropout events. Could you please suggest how I can use your method on my data? It would be very helpful. The data is from mouse.
Suggestions and tips would be helpful. Thank you
@Hemantcnaik You can format your input files according to the example files shown here: https://github.com/inmybrain/SClineager/blob/master/data/germline_mutations_mm10.txt; https://github.com/inmybrain/SClineager/blob/master/data/coverage.txt
Then, you can directly apply SClineager to coverage.txt and germline_mutations_mm10.txt.
Thanks!
@tianshilu
I have created the files in the format you described, but I am still getting the error. I also tried your example data and got the same error. For your convenience, I am providing my data and the code I tried. Could you please let me know what the problem is and why I am getting this error?
Link to the folders, set up as described in the README, and the code I tried
The error I got:
[1] "cells"
[1] "./mutations/cells/cell1"
[1] "./mutations/cells/cell2"
Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions
With your data:
[1] "cells"
[1] "./mutations/cells/cell1"
Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions
Thank you
@Hemantcnaik Could you please double-check that your working directory is correct and that the files are under the directory "./mutations/cells/cell1"?
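One way to run this check from inside R is sketched below. The helper describe_inputs is hypothetical (not part of SClineager). It assumes, based on the paths printed in this thread, that each cell is a sub-folder of the sample folder and should hold that cell's coverage and mutation files.

```r
# Hypothetical helper (NOT part of SClineager): list each cell folder under a
# sample folder and count the files inside it.
describe_inputs <- function(sample_dir) {
  cells <- list.files(sample_dir)
  paths <- list.files(sample_dir, full.names = TRUE)
  data.frame(
    Cell = cells,
    Path = paths,
    # Number of files found inside each cell folder.
    n_files = vapply(paths, function(p) length(list.files(p)),
                     integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Relative paths such as "./mutations/cells" are resolved against this folder.
print(getwd())
# FALSE here means R cannot see the sample folder from the working directory.
print(dir.exists("./mutations/cells"))
# A cell with n_files of 0 has no input files where R is looking.
print(describe_inputs("./mutations/cells"))
```

If the data frame has zero rows, or a cell shows no files, the relative paths are most likely being resolved from a different working directory than intended.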
@tianshilu Yes, I have checked. You can verify this with the link I provided: all the files are in the folder, and I have also included the script I used for the run.
I also tried your example file, which gives the same error. I cannot figure it out. Could you please help me?
Thank you
Hello @Hemantcnaik ,
Your coverage file has three columns rather than the four required by the tool. The four columns should be chromosome, position, nucleic acid, and coverage. The example file works fine on my end. Could you reformat your input files and try again?
Tianshi
hello @tianshilu
Somehow I am not able to resolve my issue; please help me. I have reformatted my data and checked it, but I still get the same error. I used the same folder structure as described in the README, and I copied the coverage and mutation files you provide in the data folder into the respective folders. I still get the same error with your files.
Please correct my code and folder structure.
The coverage and mutation files are the ones you provided.
I am attaching a link to the test data.
Thank you
@tianshilu
Could you please help by pointing out what I am doing wrong in the comment above?
Thank you
@Hemantcnaik Sorry for the late response. I checked your command and input files. You set the coverage cutoff to 3, but the coverage for all positions is 1. SClineager filters out positions with coverage below the coverage cutoff, so no position is kept after filtering. Do all positions in your samples have coverage 1? I recommend getting the genome coverage information with tools such as GATK or bedtools.
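The effect of the cutoff can be checked before running the package. The sketch below uses a hypothetical helper, count_covered_positions (not part of SClineager), that counts how many positions in a four-column coverage file have coverage at or above a cutoff. The column names it assigns are labels chosen for this example only.

```r
# Hypothetical helper (NOT part of SClineager): count positions that would
# survive a coverage cutoff, keeping positions with coverage >= cutoff.
count_covered_positions <- function(path, coverage_cutoff) {
  cov <- read.table(path, header = FALSE, sep = "\t",
                    col.names = c("chr", "pos", "base", "coverage"),
                    stringsAsFactors = FALSE)
  c(total = nrow(cov), kept = sum(cov$coverage >= coverage_cutoff))
}

# Example with a few of the coverage rows shown earlier in this thread.
cov_path <- tempfile(fileext = ".txt")
writeLines(c("chr10\t3191043\tT\t1",
             "chr10\t3191044\tC\t1",
             "chr10\t3191045\tT\t1"), cov_path)
print(count_covered_positions(cov_path, coverage_cutoff = 3))
```

With every position at coverage 1 and a cutoff of 3, the kept count is 0, which matches the situation described above: nothing is left for the later steps to work on.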
Tianshi
Hi @Hemantcnaik, did you solve the problem? I had the same problem, and I hope you can help me with it.
Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions
Hi @tianshilu, did you use somatic.pl to get the coverage and mutation files?
@ysq1770368148
For your first question, it looks like a problem with the somatic mutation calling pipeline. Oddly, we have not run into any problems with the pipeline on our end, but we did update it recently. Could you please re-run your data through it? https://github.com/tianshilu/QBRC-Somatic-Pipeline. Let me know if you still have problems, and we will take a closer look.
For your second question, yes, we did use somatic.pl to get the coverage and mutation files.
Hello,
Thanks for the method. I have mouse single-cell RNA-seq data. How can I apply SClineager to mouse datasets? I have an allele-specific read count file for each SNP as test data (link below), and I have similar datasets for other single cells.
Test data file
Could you please explain how you handle allelic dropout, or how I can apply SClineager to my datasets?
Thank you