quinlan-lab / vcf2db

create a gemini-compatible database from a VCF
MIT License

Issue querying db's made with latest cyvcf2 #69

Open matthdsm opened 2 years ago

matthdsm commented 2 years ago

Hi @brentp,

I've come across a weird issue. After updating one of our bcbio installations, we ended up with an environment containing the latest vcf2db and cyvcf2=0.30.14. When I query one of the databases generated with this setup through the gemini Python API, I get a garbled result when fetching genotypes.

When I print the gts field from a table row (gemini.GeminiQuery.GeminiRow), I get the following numpy array:

["T" "" "" "T" "" "" "C" "" "" "/" "" "" "T" "" "" "T" "" "" "C" "" "" "T"
 "" "" "/" "" "" "T" "" "" "T" "" "" "C" "" "" "" "" "" "" "" "" "" "" "T"
 "" "" "/" "" "" "T" "" "" "T" "" "" "C" "" "" "" "" "" "" "" "" "" "" ""]

whereas it should show:

["TTC/TTC","T/TTC","T/TTC"]

i.e. the genotypes for three individuals. That is what our older installs running cyvcf2=0.20.9 return.
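For anyone who wants to check their own databases for the same symptom, here is a minimal sketch of a heuristic test. It is not part of gemini, vcf2db or cyvcf2: the helper name `gts_looks_fragmented` and its `n_samples` argument are made up for illustration, and it only assumes that a healthy `gts` value holds exactly one non-empty genotype string per sample.

```python
def gts_looks_fragmented(gts, n_samples):
    """Return True if `gts` does not look like one genotype string per sample.

    Hypothetical helper for illustration only. `gts` can be any iterable of
    str or bytes values, for example the numpy array held in a row's gts field.
    """
    # Normalise bytes and numpy string scalars to plain Python str.
    values = [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in gts]
    # A healthy array has exactly one entry per sample.
    if len(values) != n_samples:
        return True
    # Empty entries are the symptom seen in the broken output above.
    return any(v == "" for v in values)


# The expected genotypes from this report, and a short prefix of the broken output.
expected = ["TTC/TTC", "T/TTC", "T/TTC"]
fragmented = ["T", "", "", "T", "", "", "C", "", "", "/"]

print(gts_looks_fragmented(expected, 3))
print(gts_looks_fragmented(fragmented, 3))
```

In practice the input would presumably be the gts value of each row returned by a gemini query, with `n_samples` set to the number of samples in the database. The check says nothing about the cause; it only flags rows whose genotype array has the wrong shape.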

I suspect this error is related to https://github.com/brentp/cyvcf2/issues/227. After downgrading cyvcf2 to the older version and regenerating the db, everything seems to work again.

Any thoughts? M

PS: I'm not sure this is the right repo for this, so feel free to move the issue to a more relevant place.