Closed wentgithub closed 5 years ago
Q1: Can this tool be applied to germline and somatic amplicon and hybrid data? If so, how does it decide whether to mark duplicates in the data? In your article, it says
Yes it can. For vanilla PCR sequencing data, you don't need any pre-processing, as Octopus identifies and removes PCR and optical duplicates internally, based on read position. However, if you have samples that have undergone specific library preparation procedures to improve duplicate identification (e.g. UMI), then you probably want to disable this feature and have Octopus only remove duplicates marked in the input alignments (see the --allow-octopus-duplicates option).
For such high-depth data, you will probably need to adjust some other parameters (e.g. downsampling limits). Have a look at the UMI config for more ideas.
Q2: For tumor-only mode, is there any resource to filter false positives, like the GATK germline resource?
No. I'm sceptical of this approach. There's nothing stopping you doing this downstream if you want though.
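For readers who do want to try this downstream, here is a minimal sketch of what such a filter could look like. It is not part of Octopus and is not an endorsed workflow. The function names (`variant_keys`, `filter_known_germline`) and the toy records are hypothetical, and it assumes both files are plain-text VCF with variants represented identically (same normalisation and left-alignment).

```python
def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF data lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def filter_known_germline(call_lines, resource_lines):
    """Drop calls whose ALT alleles are all present in the germline resource.

    Assumption: exact matching on (chrom, pos, ref, alt), so both inputs
    must use the same variant representation.
    """
    known = set(variant_keys(resource_lines))
    kept = []
    for line in call_lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            kept.append(line)  # keep header lines unchanged
            continue
        keys = list(variant_keys([line]))
        if not all(key in known for key in keys):
            kept.append(line)
    return kept


# Toy data, invented purely for illustration.
resource = [
    "##fileformat=VCFv4.3",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
]
calls = [
    "##fileformat=VCFv4.3",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t60\tPASS\t.",
]
filtered = filter_known_germline(calls, resource)
```

In this toy example the chr1:100 A>G call is dropped because it appears in the resource, while the chr1:200 C>T call is kept. A real somatic variant that happens to sit at a known germline site would be lost in the same way.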
Q3: How does the tool detect complex variants? Can I see the source code? Thanks a lot
Most 'complex' variants will be discovered by the local de novo assembly candidate generator. The source code for this is located here.
@dancooke thanks a lot. For Q1, I am still a little puzzled, because I do not know what vanilla PCR sequencing data is. Is this amplicon data? For amplicon data, we usually do not de-duplicate; for hybrid data, we do. How does this tool distinguish between them? Can you describe this more clearly? Thanks a lot
I also have another question: does this caller report all variants on the first strand? Or, to put it another way, for a variant in the VCF, how can I know whether it belongs to the first strand or the second? Thanks a lot
For Q1, I am still a little puzzled, because I do not know what vanilla PCR sequencing data is. Is this amplicon data? For amplicon data, we usually do not de-duplicate; for hybrid data, we do. How does this tool distinguish between them? Can you describe this more clearly? Thanks a lot
By 'vanilla PCR' I meant any experimental design including amplification where reads originating from duplicate fragments can be removed computationally (e.g. WGS/WES). Amplicon sequencing would not fall into this category, as there's no way to computationally identify duplicate fragments. It sounds like you're already doing the right thing. Octopus does not distinguish what type of library preparation you have done; it simply applies a naive de-duplication algorithm (by default) to all input reads. Its intention is the same as GATK MarkDuplicates. You must disable this functionality if your data is not appropriate for this type of de-duplication.
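To make the idea of position-based de-duplication concrete, here is a minimal sketch. It is not Octopus's actual implementation. The `Read` record, the choice of key (chromosome, start, strand, mate start) and the keep-the-highest-quality rule are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified read record (not an Octopus or htslib type).
Read = namedtuple("Read", "name chrom pos strand mate_pos mean_qual")


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing (chrom, pos, strand, mate_pos) are assumed to come from
    the same original fragment; only the highest-quality one is kept.
    """
    best = {}           # position key -> best read seen so far
    duplicates = set()
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.mate_pos)
        kept = best.get(key)
        if kept is None:
            best[key] = read
        elif read.mean_qual > kept.mean_qual:
            duplicates.add(kept.name)   # new read replaces the old best
            best[key] = read
        else:
            duplicates.add(read.name)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", 350, 30.0),
    Read("r2", "chr1", 100, "+", 350, 35.0),  # same key as r1, higher quality
    Read("r3", "chr1", 100, "-", 350, 32.0),  # different strand
    Read("r4", "chr1", 101, "+", 350, 20.0),  # different start position
]
dups = mark_duplicates(reads)

# Amplicon-like data: every read starts and ends at the primer positions.
amplicon = [Read(f"a{i}", "chr1", 500, "+", 650, 30.0) for i in range(5)]
amplicon_dups = mark_duplicates(amplicon)
```

Here only `r1` is flagged in the first set, whereas four of the five amplicon reads are flagged even though they may come from distinct molecules. This is why de-duplication by position is inappropriate for amplicon data.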
I also have another question: does this caller report all variants on the first strand? Or, to put it another way, for a variant in the VCF, how can I know whether it belongs to the first strand or the second? Thanks a lot
All variants in VCF are w.r.t. the forward strand. Please try reading the VCF specification if you have any other questions regarding VCF - this is a well known & used format.
thanks a lot
Hello, thanks for supplying such a powerful tool. I have several other questions here. Q1: Can this tool be applied to germline and somatic amplicon and hybrid data? If so, how does it decide whether to mark duplicates in the data? In your article, it says
Q2: For tumor-only mode, is there any resource to filter false positives, like the GATK germline resource?
Q3: How does the tool detect complex variants? Can I see the source code? Thanks a lot