monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Unable to create VCF record for mutations with `-` in the allele strings #72

Open ielis opened 5 months ago

ielis commented 5 months ago

Some rows of the mutation table include alleles with `-` besides IUPAC bases.

For instance: Image

The VCF specification does not allow dashes in the allele strings. Therefore, we cannot create a VCF record for such a row without consulting the reference genome sequence and fetching the preceding base, which VCF requires as an anchor in both alleles of an insertion or deletion.

Fetching the base with pysam or a similar library is not a big deal, but it would require the library user to download a FASTA file. Alternatively, we could make a REST call to fetch the base, but I am not aware of such an API (perhaps VariantValidator, starting from the HGVS c. string?).

The current code will skip creating a VCF record for these rows. The rest of the row, including the HGVS strings, tumor/normal read depths, etc. will be processed.
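
For illustration, here is a minimal sketch of how a dash allele could be turned into VCF-style alleles once the preceding base is available. It is not the oncoexporter implementation. The function name `to_vcf_alleles`, the `fetch_base` callback, and the toy sequence are hypothetical. The coordinate convention is also an assumption: positions are 1-based, a deletion row points at the first deleted base, and an insertion row points at the base immediately before the inserted sequence. The actual mutation table may use a different convention.

```python
from typing import Callable, Tuple

# Hypothetical callback type: (contig, 1-based position) -> single reference base.
BaseFetcher = Callable[[str, int], str]


def to_vcf_alleles(
    contig: str, pos: int, ref: str, alt: str, fetch_base: BaseFetcher
) -> Tuple[int, str, str]:
    """Replace `-` alleles with anchored VCF alleles.

    ASSUMPTION: `pos` is 1-based; for a deletion it is the first deleted
    base, for an insertion it is the base just before the inserted sequence.
    """
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt

    if ref and alt:
        # No dash involved, the row can be used as is.
        return pos, ref, alt
    if not ref and not alt:
        raise ValueError("Both alleles are empty")

    # Insertion: anchor is at `pos`. Deletion: anchor is the base before `pos`.
    anchor_pos = pos if not ref else pos - 1
    if anchor_pos < 1:
        raise ValueError("No preceding base available at the contig start")

    anchor = fetch_base(contig, anchor_pos)
    return anchor_pos, anchor + ref, anchor + alt


# Toy in-memory reference used only for this example.
TOY_GENOME = {"chr1": "ACGTACGT"}


def toy_fetch_base(contig: str, pos: int) -> str:
    # With pysam, this lookup could instead be
    # pysam.FastaFile(path).fetch(contig, pos - 1, pos), which uses
    # 0-based, half-open coordinates and needs a local indexed FASTA file.
    return TOY_GENOME[contig][pos - 1]


deletion = to_vcf_alleles("chr1", 4, "TA", "-", toy_fetch_base)
insertion = to_vcf_alleles("chr1", 2, "-", "TT", toy_fetch_base)
print(deletion)   # (3, 'GTA', 'G')
print(insertion)  # (2, 'C', 'CTT')
```

Passing the base lookup in as a callback keeps the allele logic independent of whether the base comes from a local FASTA file or from a remote service, so either option discussed above could be plugged in.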