ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

Missing information from "golden" vcf output #71

Closed (sfragkoul closed 4 months ago)

sfragkoul commented 1 year ago

Hey there,

First of all, thank you for this amazing tool; it has really helped me in my research!

I have noticed that in the "golden" .vcf output file there is some information missing.

Specifically, the INFO column contains only the value for the ploidy indicator (WP), although the meta-data section declares other fields as well (e.g. DP, AF, DUP).
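To make the mismatch easier to check, here is a minimal sketch (not part of NEAT) that lists the INFO IDs declared in a VCF header but never used in any record. The function name `unused_info_fields`, the sample lines, the chromosome name, and the Description strings are all hypothetical; only the field names WP, DP, AF and DUP come from the report above.

```python
import re


def unused_info_fields(vcf_lines):
    """Return INFO IDs declared in the header but absent from every record."""
    declared = []  # keeps header order
    used = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO="):
            # Header line such as ##INFO=<ID=DP,Number=1,...>
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                declared.append(match.group(1))
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) < 8:
                continue  # malformed record: no INFO column
            info = fields[7]  # INFO is the 8th tab-separated column
            if info == ".":
                continue  # "." means the INFO column is empty
            for entry in info.split(";"):
                # Flags have no "=", key/value pairs do
                used.add(entry.split("=", 1)[0])
    return [field for field in declared if field not in used]


# Hypothetical sample mimicking the situation described in this issue.
sample_vcf = [
    '##fileformat=VCFv4.1',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="hypothetical">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="hypothetical">',
    '##INFO=<ID=DUP,Number=0,Type=Flag,Description="hypothetical">',
    '##INFO=<ID=WP,Number=A,Type=Integer,Description="hypothetical">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    'chr17\t100\t.\tA\tG\t.\tPASS\tWP=1',
]

print(unused_info_fields(sample_vcf))  # ['DP', 'AF', 'DUP']
```

To run it against a real file, the same function could be called on an open file handle, e.g. `unused_info_fields(open(path))`, where `path` points to the "golden" VCF.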

This is the command that I ran to generate my data:

python gen_reads.py -r testing/TP53/TP53.fasta -R 150 -c 5000 -M 0.1 -o testing/TP53/TP53 --pe 300 30 --bam --vcf

I am also attaching a screenshot of the "golden" file. [image]

Thank you in advance, Stella

joshfactorial commented 1 year ago

I suspect that this was a case where the original programmers had ambitions to add that sort of thing, but never actually did. The only code in the original script was for that WP field. This is something we can look to add in an upcoming release though.

joshfactorial commented 1 year ago

It's also been pointed out that NEAT omits this info from an input VCF as well. That's something I will remedy in version 4.0, which is currently undergoing some refinement but will be ready soon.

sfragkoul commented 1 year ago

Hey @joshfactorial, thanks for your response! Looking forward to using the new version when it's ready :)

joshfactorial commented 4 months ago

This should be fixed now.