Closed georgiiprovisor closed 1 month ago
By default, NEAT will still produce some SNPs as background mutations and errors outside of targeted regions, but those should also be reflected in the BAM. I'll have to dig in deeper to see what might be going on. Are you using the master branch or a tagged version of NEAT for this?
I used the master branch.
There are too many SNPs outside golden bam, here is the statistic:
TotalGoldenVariants,1262106
FilteredGoldenVariants,0
MergedGoldenVariants,158
RedundantGoldenVariants,77
TotalWorkflowVariants,144001
FilteredWorkflowVariants,0
MergedWorkflowVariants,29
RedundantWorkflowVariants,0
PerfectMatches,143686
FN_variants,1118420
FP_variants,375
EquivalentVariants,1005
Seems like 10x more variants than are relly here.
Okay, let me see if I can replicate.
If it helps, I filtred "golden" vcf by capture kit. Leave only SNPs in the targeted area. And now it looks more realistic:
Total Golden Variants: 68287 [ 0 filtered, 9 merged, 2 redundant ]
Total Workflow Variants: 63435 [ 0 filtered, 10 merged, 0 redundant ]
Perfect Matches: 63342 (92.76%)
FN variants: 4945 (7.24%)
FP variants: 99
Number of equivalent variants denoted differently between the two vcfs: 520
Okay. I'll see if there is a bug in the code causing this, or if it was a 'feature' as sometimes happens.
Describe the bug I generate reads with the following command:
In the golden VCF, I see a lot of SNPs that are not present in golden BAM (these regions are not covered at all) nor in the vcf that was used for producing reads.
Is it a bug or I'm using the tool wrong?