Open HenriettaHolze opened 1 month ago
So, I'm trying to run the whole nextflow pipeline on my data.
I've adjusted the config.xml to
umi_length 10
fileWithAllPossibleTenXbarcodes 737k-august-2016.txt
And I've modified the nextflow pipeline to accept parameters for 5' sequencing in step 1 and 3
--noPolyARequired
and --fivePbc
The pipeline completes on a small subset of my data and 04b.matrices/molecules.tags.GE.bam
contains 88,481 reads, which is reasonable.
However, I cannot manage to run the SNV calling. It tells me that "nothing has been detected".
Below are some BAM records from 04b.matrices/molecules.tags.GE.bam
. Could you please confirm that this is the correct input format for the SNV calling? Maybe I'm still not using the pipeline correctly for 5' data?
Thanks!
TGAGGGAAGTGCCAGA-AGAGCTCTGG-1 2048 chr17 30636139 60 31H20M6I1M1I2M1I5M133705N2M1I10M4D1M3D9M1I10M1D3M1D50M6331N106M182H * 0 0 AATACTTACACTCTTTCCCTACACGACGCTCTTCCTATCTTGAGGGAAGTGCCAGAAGAGCTCTGGTTTCTTATATGGGGGAAAGACATTTGTTGCACAAATGACAGTATTTGATAAAAACAGGCGCTTACAGCTTTTAGATGGGGAATATGAAGTAGCCATGCAGGAAATGGAAGAATGTCCAATAAGCAAGAGAGAAGCAACATGGGAGACTATTCTTGATGGGAAG $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ s1:i:130 s2:i:0 U8:Z:AGAGCTCTGG SA:Z:chr17,31976566,+,105S304M11600D33S,60,12; BC:Z:TGAGGGAAGTGCCAGA GE:Z:LRRC37BP1,SUZ12P1 XF:Z:UTR NM:i:35 RN:i:1 AS:i:87 GS:Z:+,+ de:f:0.1096 rl:i:0 cm:i:45 nn:i:0 tp:A:P ms:i:143 ts:A:+
AGGTCATTCCTCCTAG-AGAGTTTCAC-1 256 chr17 30731643 0 128S64M1I34M1I37M1D189M12I85M21I71M2773N47M89N65M8203N69M16121N28M1I21M6963N66M2D2M1D3M1D11M3283N27M1D52M1D13M6331N70M1D14M164S * 0 0 * * s1:i:681 U8:Z:AGAGTTTCAC BC:Z:AGGTCATTCCTCCTAG GE:Z:SUZ12P1 XF:Z:UTR NM:i:72 RN:i:1 AS:i:718 GS:Z:+ de:f:0.0408 rl:i:15 cm:i:184 nn:i:0 tp:A:S ms:i:841 ts:A:+
CGGACGTGTTACGACT-CCATGCCTCA-1 256 chr17 30731652 0 62S51M1D2M1I97M1D11M1I3M1D2M2D113M8D24M12I80M22I76M2773N47M89N65M24393N28M1I21M6963N50M1D1M4I34M2245N78M1I141M1I12M807N47M1I47M6331N70M1D31M179S * 0 0 ** s1:i:906 U8:Z:CCATGCCTCA BC:Z:CGGACGTGTTACGACT GE:Z:SUZ12P1 XF:Z:UTR NM:i:75 RN:i:1 AS:i:894 GS:Z:+ de:f:0.0279 rl:i:30 cm:i:243 nn:i:0 tp:A:S ms:i:1025 ts:A:+
AAGTCTGCACCGGAAA-GTAAACGCAT-2 256 chr17 30731657 0 110S61M1D145M1D51M1D7M1D3M1I29M3D6M2D85M21I19M1D51M2773N47M89N65M8203N69M16121N28M1I21M6963N86M2245N219M1I12M807N94M6331N79M5I27M182S * 0 0 * * s1:i:940 U8:Z:GTAAACGCAT BC:Z:AAGTCTGCACCGGAAA GE:Z:SUZ12P1 XF:Z:UTR NM:i:65 RN:i:2 AS:i:951 GS:Z:+ de:f:0.0312 rl:i:0 cm:i:269 nn:i:0 tp:A:S ms:i:1083 ts:A:+
CGATTGAAGTATCTCG-TACGGCTACG-1 256 chr17 30731658 0 118S19M1D29M2I34M1I14M2D21M1D60M1D10M1I12M1D5M1D10M2D1M1I28M1D57M12I80M12I5M12I24M1D5M1D40M2773N47M89N65M8203N37M1D2M1D28M16121N28M1I21M6963N86M2245N219M1I12M807N94M6331N48M* 0 0 * * s1:i:799 U8:Z:TACGGCTACG BC:Z:CGATTGAAGTATCTCG GE:Z:SUZ12P1 XF:Z:UTR NM:i:85 RN:i:1 AS:i:846 GS:Z:+ de:f:0.0422 rl:i:30 cm:i:214 nn:i:0 tp:A:S ms:i:984 ts:A:+
Hi. I have the same problem running de sicelore pipeline from polyA data. Have you found something?
it seems read number per UMI is always 1 or 2, for SNV calling, RN default is 3 i guess, you have to change it to integrate the molecules having 1 or 2 reads. Please be sure to have some signal on the SNVs you are calling, visualiing the bam fil ein IGV prior to calling. The format of the .csv SNV file is also important but it shoud works.
I'm very sorry. I was mixing chromosome names as you suggested also for @HenriettaHolze . I did write the wrong chromosomes for the SNPs file. Now it's working.
Besides, is the tool able to genotype small indels?
Yes I was also able to resolve the issue, I specified the wrong strand in the csv.
I got the consensus sequence from sicelore but had to use cellsnv-lite for SNV calling because it was too resource intensive with sicelore.
@cartof I'm also looking for a small indel caller, please let me know if you've found something :)
Hi,
I have run the epi2me-labs/wf-single-cell workflow on my nanopore data but would like to create consensus sequences for reads tagged with the same cell barcode, UMI and gene and then do SNV calling.
Is it possible to use sicelore to create the consensus sequence from the BAM file, re-align and call SNVs?
The SAM flags differ and I am a bit confused what parameters I would have to set to make the pipeline take the epi2me BAM file.
It is single-cell 5' 10X data with PCR amplification (no polyA expected).
Below an example BAM record. If I subset the reads to my genes of interest, I have 3,928,283 reads and 248,167 UMI.
If it is not possible to concatenate the 2 workflows, I would run sicelore start-to-finish with the nextflow pipeline, and then Step5 SNV calling.
What parameters do I have to set in the nextflow pipeline for PCR amplified 5' 10X data?
Would it make sense to specify the filtered cell barcode list of non-empty droplets from cellranger from matched short-read scRNA-seq data?
Thanks a lot for your help!
Best, Henrietta