solgenomics / sgn

The code behind the Sol Genomics Network, Cassavabase and other Breedbase websites
https://solgenomics.net
MIT License
66 stars, 35 forks

fix reading VCF with no header #5206

Open ClayBirkett opened 1 week ago

ClayBirkett commented 1 week ago

When the VCF has no header, the first few accessions are skipped while loading. This modification works on the transposed file.

5203

lukasmueller commented 3 days ago

Instead of starting at the first line that contains a number/number pattern (a genotype call such as 0/1), shouldn't the parsing simply start at the line after the one starting with #CHROM?
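
For illustration, a minimal sketch of that suggestion, written in Python and not taken from the repository. The function name `data_lines_after_chrom` is hypothetical, and the sketch assumes a standard (untransposed) VCF in which the `#CHROM` column-header line immediately precedes the data records.

```python
def data_lines_after_chrom(lines):
    """Return the lines that follow the '#CHROM' header line.

    Hypothetical helper: assumes an untransposed VCF given as a list of lines.
    """
    for i, line in enumerate(lines):
        if line.startswith("#CHROM"):
            # Everything after the column-header line is a data record.
            return lines[i + 1:]
    # No header line at all, which is the case this pull request is about.
    raise ValueError("no #CHROM header line found")


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tacc1\tacc2",
    "1\t100\tm1\tA\tG\t.\t.\t.\tGT\t0/1\t1/1",
]
records = data_lines_after_chrom(vcf)
```

This approach depends on the `#CHROM` line being present, so it raises an error rather than guessing when the line is missing.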

ClayBirkett commented 3 days ago

The error occurs when reading the transposed VCF, so there is no #CHROM line. It might be that the problem is with the creation of the transposed file, not with reading it.

lukasmueller commented 2 days ago

In the code it actually skips 8 entries, which is correct for the untransposed file, but probably incorrect for the transposed file?

ClayBirkett commented 2 days ago

I have it working now. The problem was that the parse_with_plugin() function was reading a fixed number of comment lines, and that assumption never holds. I changed it to read all lines starting with "##", the comment lines. The next_genotype() function then skips the chrom, pos, id, ref, alt correctly.
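
The idea of consuming every "##" line instead of a fixed count can be sketched as follows. This is an illustrative Python sketch, not the code changed in this pull request; `skip_meta_lines` is a hypothetical name, and the input rows are made-up placeholders rather than the real layout of the transposed file.

```python
import io


def skip_meta_lines(handle):
    """Consume every leading '##' line from an open text handle.

    Hypothetical helper. Returns (number_of_lines_skipped, first_other_line),
    where first_other_line is None if the input ends before any other line.
    """
    skipped = 0
    for line in handle:
        if not line.startswith("##"):
            # First line that is not a comment: hand it back to the caller.
            return skipped, line
        skipped += 1
    return skipped, None


# Works the same whether the file has many comment lines or none at all.
with_comments = io.StringIO("##meta1\n##meta2\nrow1\nrow2\n")
without_comments = io.StringIO("row1\nrow2\n")
result_with = skip_meta_lines(with_comments)
result_without = skip_meta_lines(without_comments)
```

Because the loop stops at the first line that does not start with "##", a file with no comment lines loses no rows, which is the failure a fixed skip count produces.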