ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

Specification of variant allele frequency for inserted mutation? #66

Closed yvanw closed 4 months ago

yvanw commented 1 year ago

Hi,

Thanks a lot for creating these tools; they look very useful!

I am not sure whether this would be a new feature or whether it already exists, but so far I could not find it: is it possible to specify variant allele frequencies for injected variants? For example, I would like to be able to specify that the variant 1:100000 A>T will be present with an allelic fraction of 9%.

Is this already possible? If so, via which option? So far I have tried setting AF=0.09 in the INFO field of the VCF containing the injected variants (option -v), but it looks like this value is not taken into account. I also tried specifying it in the fourth field ("meta_data") of the BED file provided via the -tr option, but that appears to be ignored as well.

Best, Yvan

P.S. Does the project have a user group? It was not very straightforward for me to find out where, or to whom, my question should be addressed.
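For reference, the input described in the question can be sketched as a single VCF data line carrying AF=0.09 in its INFO column. The Python snippet below is only an illustration of that record and of how the AF value would be read from it; the line itself, the FORMAT/sample columns, and the helper names `parse_info` and `requested_af` are assumptions made for this example and are not part of NEAT.

```python
# Hypothetical VCF data line for the variant 1:100000 A>T from the question.
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
VCF_LINE = "1\t100000\t.\tA\tT\t.\tPASS\tAF=0.09\tGT\t0/1"


def parse_info(info):
    """Turn a VCF INFO string such as 'AF=0.09;DP=100' into a dict.

    Flag entries without '=' (for example 'DB') are stored as True.
    """
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def requested_af(vcf_line):
    """Return the first AF value of a VCF data line as a float, or None."""
    columns = vcf_line.rstrip("\n").split("\t")
    af = parse_info(columns[7]).get("AF")  # INFO is the 8th column
    if not isinstance(af, str):
        return None
    return float(af.split(",")[0])  # AF may list one value per ALT allele


print(requested_af(VCF_LINE))
```

This only shows that the requested fraction is present in the input file; it says nothing about whether a given simulator acts on it.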

joshfactorial commented 1 year ago

Currently, yes, those fields are ignored in an input VCF. The only thing NEAT considers at the moment is the genotype of the variant. We're working on expanding NEAT's capabilities, and this is a feature we can consider adding in a future release. In the next version, we'll at least try to preserve those fields.
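To illustrate what "only the genotype is considered" implies for allele fractions, here is a minimal Python sketch. It is not NEAT's code: it assumes that reads are drawn evenly from every haplotype of the sample, so the expected fraction of reads carrying the variant is simply the share of non-reference alleles in the GT field. The function name `expected_alt_fraction` is made up for this example.

```python
def expected_alt_fraction(genotype):
    """Share of non-reference alleles in a GT string such as '0/1' or '1|1'.

    Assumes every haplotype contributes reads equally (an assumption of
    this sketch, not a statement about NEAT's internals).
    Returns None when no allele is called (for example './.').
    """
    alleles = genotype.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]
    if not called:
        return None
    alt_count = sum(1 for a in called if a != "0")
    return alt_count / len(called)


for gt in ("0/0", "0/1", "1/1"):
    print(gt, expected_alt_fraction(gt))
```

Under that assumption a diploid genotype can only yield expected fractions of 0, 0.5, or 1, which is why a target such as 9% cannot be expressed through the genotype alone.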

joshfactorial commented 1 year ago

This is the best place to ask questions for now.

joshfactorial commented 4 months ago

In the newest version, 4.2, which will be ready soon, we preserve the data from the input VCF and write it to the output as well. The simulated reads still will not reflect that information, though, only the genotype.

joshfactorial commented 4 months ago

I added a backlog item to investigate this.