inmybrain / SClineager

SClineager: a Bayesian hierarchical model that performs lineage tracing of single cells based on genetic markers
GNU General Public License v3.0

Hello I have a mouse dataset how can I use SClineager #2

Closed Hemantcnaik closed 6 months ago

Hemantcnaik commented 3 years ago

Hello,

Thanks for the method. I have mouse single-cell RNA-seq data. How can I apply SClineager to mouse datasets? I have an allele-specific read count file for each SNP as test data (linked below), and I have similar datasets for other single cells.

Test data file

Can you please tell me how you handle allelic dropout, and how I can apply SClineager to my datasets?

Thank you

tianshilu commented 3 years ago

Hi @Hemantcnaik ,

Thanks for using SClineager. After installing the package, you can handle allelic dropout with the following commands in an R environment.

read_sclineager(runinfo, coverage_cutoff, coverage_percentage, cell_percentage, out_folder, artefact_percentage)
run_sclineager(file_in = file_in0, folder = file_out, categories = categories, max_iter = max_iter, keep_genes = mask_genes, vaf_offset = vaf_offset, dfreedom = dfreedom, skip_common = skip_common, psi = NULL, control = control, save = FALSE)

You can refer to "Read data" and "Run SClineager" sections of https://github.com/inmybrain/SClineager/blob/master/README.md for more details.

Thanks,

Hemantcnaik commented 3 years ago

Hi @tianshilu

Thanks for the quick reply.

I have successfully installed SClineager in R.

I am confused about the input file format. Does the test data file mentioned above need any modification? I am not sure what type of input file I have to provide. When I tried the example data, I got an error. I followed the README: I created all the directories it mentions and copied the input file into place.

coverage_cutoff <- 3
coverage_percentage <- 0.2
cell_percentage <- 0.2
artefact_percentage <- 0.03

for (folder in c(301)) {
  print(folder)
  runinfo <- data.frame(
    Cell = list.files(paste("./mutations/", folder, sep = "")),
    Path = list.files(paste("./mutations/", folder, sep = ""), full = T),
    stringsAsFactors = F
  )
  out_folder <- paste("./processed/", folder, sep = "")
  preprocess_genetics <- read_sclineager(runinfo, coverage_cutoff, coverage_percentage, cell_percentage, out_folder, artefact_percentage)
}

The error I got:

Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions

Thank you

tianshilu commented 3 years ago

@Hemantcnaik Sorry for the confusion. For each cell of each sample, you will need two input files: a coverage file and a mutation file. The coverage file has four tab-separated columns, "chromosome", "position", "nucleic acid", and "coverage", with no column names (shown below):

chr10	3191043	T	1
chr10	3191044	C	1
chr10	3191045	T	1
chr10	3191046	C	1
chr10	3191047	A	1
chr10	3191048	A	1
chr10	3191049	G	1
chr10	3191050	G	1
chr10	3191051	C	1
chr10	3191052	T	1

The mutation file contains 14 columns, with the column names and information shown below:

Chr	Start	End	Ref	Alt	Caller	Normal_ref	Normal_alt	Tumor_ref	Tumor_alt	Func.refGene	Gene.refGene	ExonicFunc.refGene	AAChange.refGene
chr10	111486741	111486741	T	C	strelka_germline	2	4	2	4	exonic	Nap1l1	synonymous SNV	Nap1l1:NM_015781:exon4:c.T201C:p.I67I
chr11	6478113	6478113	G	GAA	strelka_germline	4	6	4	6	intergenic	Purb;Myo1g	.	.
chr11	20685238	20685238	G	GT	strelka_germline	4	6	4	6	UTR3	Aftph	.	.
chr11	24041505	24041505	A	C	strelka_germline	0	6	0	6	intergenic	Papolg;Bcl11a	.	.
chr11	24041506	24041506	A	G	strelka_germline	0	6	0	6	intergenic	Papolg;Bcl11a	.	.
chr11	52122587	52122587	T	G	strelka_germline	10	4	10	4	UTR3	Ppp2ca	.	.
chr11	52389130	52389130	T	C	strelka_germline	4	4	4	4	UTR3	Vdac1	.	.
chr11	55454900	55454900	T	C	strelka_germline	10	4	10	4	exonic	Atox1	synonymous SNV	Atox1:NM_009720:exon2:c.A63G:p.R21R
chr11	63444278	63444278	A	C	strelka_germline	0	16	0	16	intergenic	Pmp22;Hs3st3b1	.	.
chr11	69396182	69396182	T	G	strelka_germline	4	6	4	6	exonic	Naa38	synonymous SNV	Naa38:NM_030083:exon2:c.T87G:p.S29S
chr11	109011936	109011936	T	C	strelka_germline	0	6	0	6	intergenic	Axin2;E030025P04Rik	.	.
chr11	109011942	109011942	T	C	strelka_germline	0	6	0	6	intergenic	Axin2;E030025P04Rik	.	.
chr11	109011999	109011999	A	G	strelka_germline	0	6	0	6	intergenic	Axin2;E030025P04Rik	.	.
chr11	109012019	109012019	A	G	strelka_germline	0	6	0	6	intergenic	Axin2;E030025P04Rik	.	.
chr11	114668590	114668590	T	C	strelka_germline	6	6	6	6	UTR5	Rpl38	.	.
chr12	72938859	72938859	A	T	strelka_germline	0	6	0	6	intergenic	4930447C04Rik;Six6	.	.
chr12	72938864	72938864	T	C	strelka_germline	0	8	0	8	intergenic	4930447C04Rik;Six6	.	.
chr12	104475105	104475105	T	G	strelka_germline	0	6	0	6	intergenic	Gsc;Dicer1	.	.
chr12	104475106	104475106	C	A	strelka_germline	0	6	0	6	intergenic	Gsc;Dicer1	.	.

You can either arrange your results into the format described above or use the QBRC mutation calling pipeline (https://github.com/tianshilu/QBRC-Somatic-Pipeline) to get the coverage file and mutation file directly.
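Before running SClineager on hand-formatted files, it may help to confirm that a mutation file carries the 14 column names listed above. The following is a minimal sketch in base R, not part of SClineager: `missing_mutation_cols` is a hypothetical helper, and it assumes the mutation file is tab-separated with a header row.

```r
# Column names copied from the example header in this thread.
expected_cols <- c(
  "Chr", "Start", "End", "Ref", "Alt", "Caller",
  "Normal_ref", "Normal_alt", "Tumor_ref", "Tumor_alt",
  "Func.refGene", "Gene.refGene", "ExonicFunc.refGene", "AAChange.refGene"
)

# Hypothetical helper (not part of SClineager): returns the expected
# column names that are absent from a mutation file.
missing_mutation_cols <- function(path) {
  # Assumption: tab-separated file with a header row.
  # colClasses = "character" stops R from reading the base "T" as TRUE.
  mut <- read.delim(path, header = TRUE, check.names = FALSE,
                    colClasses = "character")
  setdiff(expected_cols, colnames(mut))
}

# Self-contained demo: a temporary file built from one example row.
demo_path <- tempfile(fileext = ".txt")
demo_row <- c("chr10", "111486741", "111486741", "T", "C", "strelka_germline",
              "2", "4", "2", "4", "exonic", "Nap1l1", "synonymous SNV",
              "Nap1l1:NM_015781:exon4:c.T201C:p.I67I")
writeLines(c(paste(expected_cols, collapse = "\t"),
             paste(demo_row, collapse = "\t")), demo_path)

missing_cols <- missing_mutation_cols(demo_path)
print(missing_cols)  # an empty character vector means no column is missing
```

If the returned vector is not empty, the names it lists are the columns to add or rename before retrying.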

Tianshi

Hemantcnaik commented 3 years ago

@tianshilu

My main objective is to handle allelic dropout. As you mention in the paper, your method can handle dropout events. Can you please suggest how I can use your method on my data? It would be very helpful. The data is from mouse.

Any suggestions and tips would be helpful. Thank you

tianshilu commented 3 years ago

@Hemantcnaik You can format your input files according to the example files shown here: https://github.com/inmybrain/SClineager/blob/master/data/germline_mutations_mm10.txt; https://github.com/inmybrain/SClineager/blob/master/data/coverage.txt

Then, you can directly apply SClineager to coverage.txt and germline_mutations_mm10.txt.

Thanks!

Hemantcnaik commented 3 years ago

@tianshilu

I have created the files in the format you described, but I am still getting an error. I also tried your example data and got the same error. For your convenience, I am providing my data and the code I tried. Can you please let me know what the problem is and why I am getting this error?

Link with folders as you mentioned in Readme and code which I have tried

data files

The error I got:
[1] "cells"
[1] "./mutations/cells/cell1"
[1] "./mutations/cells/cell2"
Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions

With your data:
[1] "cells"
[1] "./mutations/cells/cell1"
Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions

Thank you

tianshilu commented 3 years ago

@Hemantcnaik Could you please double-check your working directory and confirm that the files are under the directory "./mutations/cells/cell1"?

Hemantcnaik commented 3 years ago

@tianshilu Yes, I have checked. You can verify with the link I provided: all the files are in the folder, along with the script I used for the run.

I also tried your example file, which gives the same error. I am not able to figure it out. Can you please help me?

Thank you

tianshilu commented 3 years ago

Hello @Hemantcnaik ,

Your coverage file has three columns rather than the four the tool requires. The four columns should be chromosome, position, nucleic acid, and coverage. The example file works fine on my end. Could you reformat your input files and try again?
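A quick way to catch a three-column coverage file before running the tool is to count the fields on each line. This is only a sketch in base R under the assumption that the coverage file is tab-separated with no header, as described earlier in the thread; `coverage_field_counts` is a hypothetical helper, not an SClineager function.

```r
# Hypothetical helper (not part of SClineager): number of tab-separated
# fields on each line of a headerless coverage file.
coverage_field_counts <- function(path) {
  count.fields(path, sep = "\t")
}

# Self-contained demo with two temporary files.
good <- tempfile(fileext = ".txt")
bad <- tempfile(fileext = ".txt")
# Four columns: chromosome, position, nucleic acid, coverage.
writeLines(c("chr10\t3191043\tT\t1", "chr10\t3191044\tC\t1"), good)
# Three columns: the nucleic acid column is missing.
writeLines(c("chr10\t3191043\t1", "chr10\t3191044\t1"), bad)

good_ok <- all(coverage_field_counts(good) == 4)
bad_ok <- all(coverage_field_counts(bad) == 4)
print(c(good = good_ok, bad = bad_ok))
```

Running the same check on each real coverage file shows whether every line has exactly four fields.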

Tianshi

Hemantcnaik commented 3 years ago

hello @tianshilu

Somehow I am not able to resolve my issue, please help me. I have reformatted my data and checked it, but I still get the same error. I used the same folder structure as described in the README and copied the coverage and mutation files you provide in the data folder into the respective folders. I still get the same error with your files.

Please correct my code and folder structure if they are wrong.

The coverage and mutation files are the ones you provided.

I am attaching a link to the test data: test data

Thank you

Hemantcnaik commented 3 years ago

@tianshilu

Can you please help by pointing out what I am doing wrong in the comment above?

Thank you

tianshilu commented 3 years ago

@Hemantcnaik Sorry for the late response. I checked your command and input files. You set the coverage cutoff to 3, but the coverage for all positions is 1. SClineager filters out positions with coverage below the cutoff, so no position is kept after filtering. Do all positions in your samples really have coverage 1? I recommend obtaining genome coverage information with tools such as GATK or bedtools.
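To see how many positions would survive a given cutoff before calling `read_sclineager`, one can compute the fraction of positions at or above it. The sketch below uses base R only and does not reproduce SClineager's internal filter: `fraction_kept` is a hypothetical helper, the four-column tab-separated layout is taken from this thread, and treating positions exactly at the cutoff as kept is an assumption.

```r
# Hypothetical helper (not part of SClineager): fraction of positions
# whose coverage is at least the cutoff.
fraction_kept <- function(path, coverage_cutoff) {
  # Assumption: headerless, tab-separated, four columns as in the thread.
  cov <- read.table(
    path, sep = "\t", header = FALSE,
    col.names = c("chr", "pos", "base", "coverage"),
    colClasses = c("character", "integer", "character", "integer")
  )
  # Assumption: positions exactly at the cutoff are kept.
  mean(cov$coverage >= coverage_cutoff)
}

# Self-contained demo: every position has coverage 1, as in the reported case.
demo <- tempfile(fileext = ".txt")
writeLines(c("chr10\t3191043\tT\t1",
             "chr10\t3191044\tC\t1",
             "chr10\t3191045\tT\t1"), demo)

kept_at_3 <- fraction_kept(demo, 3)  # cutoff used in the thread
kept_at_1 <- fraction_kept(demo, 1)
print(c(cutoff_3 = kept_at_3, cutoff_1 = kept_at_1))
```

A fraction of zero at the chosen cutoff means nothing is left for the downstream steps, which matches the situation described above.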

Tianshi

ysq1770368148 commented 2 years ago

Hi @Hemantcnaik, did you solve the problem? I have the same problem and hope you can help me with it.

Error in mutations[, c("Func.refGene", "mutation", "ExonicFunc.refGene", : incorrect number of dimensions

ysq1770368148 commented 2 years ago

Hi @tianshilu, did you use somatic.pl to get the coverage and mutation files?

wtwt5237 commented 2 years ago

@ysq1770368148

For your first question, it looks like a problem with the somatic mutation calling pipeline. Oddly, we haven't run into any problem with the pipeline on our end, but we did update it recently. Could you please re-run your data through it? https://github.com/tianshilu/QBRC-Somatic-Pipeline. Let me know if you still have problems, and we will take a closer look.

For your second question, yes, we did use somatic.pl to get the coverage and mutation files