zstephens / neat-genreads

NEAT read simulation tools

Multiple records with the same position in the golden VCF file #81

Open mohammedkhalfan opened 3 years ago

mohammedkhalfan commented 3 years ago

Describe the bug Depending on the mutation rate, NEAT produces a golden VCF that contains multiple variants at the same position. If you then pass this VCF back to NEAT with the -v option, NEAT skips these variants and reports "X variants skipped due to multiple variants found per position". Should NEAT be producing a golden VCF with multiple variants per position? This doesn't happen when the mutation rate is very low, but it does happen when M = 4.5%, for example. Example:

SARS-CoV2       1443    .       T       A       .       PASS    WP=1
SARS-CoV2       1443    .       T       C       .       PASS    WP=1

To Reproduce Run NEAT genReads.py with a mutation model (-m) and set the mutation rate (-M) to 0.045.

Expected behavior I don't expect there to be multiple variants at the same position in the golden VCF.

Screenshots Here is the message from NEAT if you provide the VCF as input:

reading input VCF...
found 1278 valid variants in input vcf.
 * 0 variants skipped: (qual filtered / ref genotypes / invalid syntax)
 * 12 variants skipped due to multiple variants found per position
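
For anyone who wants to locate the affected records before passing the golden VCF back with -v, here is a minimal sketch that groups records by chromosome and position and reports every site that appears more than once. It is not part of NEAT: the function name `find_duplicate_positions` is made up for illustration, the third sample record is hypothetical, and the script assumes the plain column layout shown in the example above (CHROM, POS, ID, REF, ALT, ...).

```python
from collections import defaultdict


def find_duplicate_positions(vcf_lines):
    """Return {(chrom, pos): [(ref, alt), ...]} for sites with more than one record."""
    by_site = defaultdict(list)
    for line in vcf_lines:
        line = line.strip()
        # Skip blank lines and VCF header lines ("##meta" and "#CHROM").
        if not line or line.startswith("#"):
            continue
        # Split on any whitespace; only the first five columns are used here.
        fields = line.split()
        if len(fields) < 5:
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        by_site[(chrom, int(pos))].append((ref, alt))
    # Keep only the sites that carry two or more records.
    return {site: alleles for site, alleles in by_site.items() if len(alleles) > 1}


# The first two records come from the report above; the third is hypothetical.
records = [
    "SARS-CoV2\t1443\t.\tT\tA\t.\tPASS\tWP=1",
    "SARS-CoV2\t1443\t.\tT\tC\t.\tPASS\tWP=1",
    "SARS-CoV2\t2000\t.\tG\tA\t.\tPASS\tWP=1",
]

duplicates = find_duplicate_positions(records)
for (chrom, pos), alleles in sorted(duplicates.items()):
    print(f"{chrom}:{pos} has {len(alleles)} records: {alleles}")
```

To run it on a real file, pass an open file handle instead of the list, for example `find_duplicate_positions(open("golden.vcf"))`, where the file name is a placeholder. Whether the number of sites it reports matches the "variants skipped" count depends on how NEAT counts them, which this thread does not specify.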

Desktop (please complete the following information): NEAT-genReads V2.0

jdonzallaz commented 2 years ago

Hi, I have the same problem using the default mutation rate and model. Any update?