griffithlab / civicpy

A python interface for the CIViC db application
MIT License

Update logic for how to annotate CIViC evidence in the CIViC VCF file #27

Closed susannasiebert closed 5 years ago

susannasiebert commented 5 years ago

The VCFWriter now outputs one entry per variant instead of one entry per evidence item, so the final VCF contains each variant only once. All per-evidence INFO fields were removed.
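As a rough illustration of the per-variant output, the sketch below groups evidence records by variant coordinates so that each variant would yield a single VCF row. The function name `group_by_variant`, the dictionary keys, and the sample records are hypothetical and are not taken from the civicpy code.

```python
from collections import defaultdict


def group_by_variant(evidence_items):
    """Collect evidence IDs under one key per variant (hypothetical helper)."""
    grouped = defaultdict(list)
    for item in evidence_items:
        # Assumption: a variant is identified by chrom, pos, ref and alt.
        key = (item["chrom"], item["pos"], item["ref"], item["alt"])
        grouped[key].append(item["evidence_id"])
    return dict(grouped)


# Made-up records: two evidence items share one variant.
evidence = [
    {"chrom": "7", "pos": 100, "ref": "A", "alt": "T", "evidence_id": 1},
    {"chrom": "7", "pos": 100, "ref": "A", "alt": "T", "evidence_id": 2},
    {"chrom": "12", "pos": 200, "ref": "C", "alt": "G", "evidence_id": 3},
]
grouped = group_by_variant(evidence)
```

With this grouping, the writer would emit two rows for the three evidence items above.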

To conform to Google BigQuery standards, CIViC data is annotated in the CSQ field (see https://github.com/googlegenomics/gcp-variant-transforms/blob/master/docs/variant_annotation.md). It was decided during one of the CIViC meetings that each evidence item and each assertion would be annotated in its own CSQ entry, with a field denoting whether the entry is for an evidence item or an assertion.
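A minimal sketch of what such CSQ entries could look like, assuming the VEP-style convention of pipe-delimited fields within an entry and comma-separated entries within the INFO value. The field names in `CSQ_FIELDS`, the helper names, and the sample records are hypothetical; the actual field list used by the VCFWriter may differ.

```python
# Hypothetical field list; "Entity_Type" marks evidence item vs. assertion.
CSQ_FIELDS = ["Allele", "Entity_Type", "Entity_ID", "Clinical_Significance"]


def csq_header(fields):
    """Build the INFO header line that declares the CSQ field order."""
    return (
        "##INFO=<ID=CSQ,Number=.,Type=String,"
        'Description="CIViC annotations. Format: {}">'.format("|".join(fields))
    )


def csq_entry(record, fields):
    """Join one record's values with '|'; missing fields become empty."""
    return "|".join(str(record.get(name, "")) for name in fields)


def csq_info(records, fields):
    """Join one CSQ entry per evidence item or assertion with ','."""
    return "CSQ=" + ",".join(csq_entry(r, fields) for r in records)


# Made-up records for a single variant.
records = [
    {"Allele": "T", "Entity_Type": "evidence", "Entity_ID": 101,
     "Clinical_Significance": "Sensitivity"},
    {"Allele": "T", "Entity_Type": "assertion", "Entity_ID": 7},
]
header = csq_header(CSQ_FIELDS)
info = csq_info(records, CSQ_FIELDS)
```

Keeping the entity type as its own field lets downstream tools filter evidence items and assertions without parsing the rest of the entry.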

This PR also fixes the variant sorting logic to account for integer vs string sorting.
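The PR text does not say which fields were affected. As an illustration of the integer vs string problem, the sketch below assumes variants are sorted by chromosome and start position, both held as strings. `variant_sort_key` is a hypothetical helper, not the function used in civicpy.

```python
def variant_sort_key(variant):
    """Sort numeric chromosomes as integers, then named ones alphabetically."""
    chrom, pos = variant
    chrom = str(chrom)
    if chrom.isdigit():
        # Numeric chromosomes come first, compared as integers.
        return (0, int(chrom), "", int(pos))
    # Non-numeric chromosomes (e.g. X, Y, MT) come after, compared as strings.
    return (1, 0, chrom, int(pos))


variants = [("10", "500"), ("2", "1000"), ("X", "5"), ("2", "200")]

naive = sorted(variants)  # plain string comparison
fixed = sorted(variants, key=variant_sort_key)
```

With plain string comparison, `"10"` sorts before `"2"` and `"1000"` before `"200"`; the key function places chromosome 2 before 10 and position 200 before 1000.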

Certain fields (variant aliases and HGVS strings) are commented out for now, since they may contain special characters such as `=`. We still need to decide how to proceed: the correct approach would be to hex-encode the special characters, but that results in very ugly annotations.
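For reference, a sketch of what hex-encoding could look like, assuming the percent-encoding scheme described in the VCF 4.3 specification for characters with special meaning (`:`, `;`, `=`, `%`, `,`, CR, LF, TAB). `encode_vcf_value` is a hypothetical helper. The `|` delimiter used inside CSQ entries is not part of that list and would need to be passed in via `special` if it has to be escaped too.

```python
# Characters the VCF 4.3 specification lists for percent-encoding.
VCF_SPECIAL = ":;=%,\r\n\t"


def encode_vcf_value(value, special=VCF_SPECIAL):
    """Replace each special character with '%' plus its two-digit hex code."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in special else ch
        for ch in value
    )


encoded_hgvs = encode_vcf_value("NM_004333.4:c.1799T>A")
encoded_synonymous = encode_vcf_value("p.V600=")
```

The first call turns the `:` into `%3A` and the second turns the trailing `=` into `%3D`, which shows why the encoded annotations are harder to read.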

Closes #3