iobio / gene.iobio

Gene.iobio vue
MIT License

Some values not being exported with CSV exports #941

Closed jdy4389 closed 1 year ago

jdy4389 commented 1 year ago

I did some testing after I saw the recent changes for fixing AF on flagged variant exports with 4.7.1a, and noticed that the PolyPhen/SIFT values are no longer populated; they are exported as empty fields. REVEL, which comes after those two columns, is still being exported. I see this with my own local VCF/CRAM data as well as when running the demo data on the main gene.iobio.io site and exporting the results to CSV.

Also, I'm not sure what the criteria are for whether the bamDepth values for proband/mother/father are supposed to be exported, but for as long as I've used the tool I've noticed issues with that. Recently the fields are sometimes filled in and sometimes not, and I haven't been able to pinpoint why. For example, when I exported the CSV with the trio demo data, all bamDepth values were empty, but when I exported a single flagged variant from a different VCF/CRAM trio dataset, where the variant was heterozygous in the proband and father, bamDepthProband was there but bamDepthFather was empty. I wondered whether the field is only populated when the bamDepth differs from the depth (the combined refcount/altcount), since I've noticed that as a pattern, but I wasn't sure.

gene-iobio-flagged-variants-test.csv

I've attached the .csv from exporting the demo trio data, which shows the blank values

tonydisera commented 1 year ago

Thank you for the informative report on these gene.iobio issues. I will look into these this week and hopefully have a fix by the end of this week or next. Best regards, Tony Di Sera

tonydisera commented 1 year ago

I'm working on this bug fix right now. Since we no longer display SIFT and Polyphen in the variant detail panel, I would vote for removing them from the .csv. Any objections?

AlistairNWard commented 1 year ago

No objections

tonydisera commented 1 year ago

In order to export bamDepth (along with bamDepthMother and bamDepthFather), preprocessing code was added to CohortModel.promiseExportVariants. This code refreshes the variants to be exported with these fields by finding the matching coverage region in the bam depth. The code that finds the matching coverage region was incorrectly looking for an exact match on the variant start position. It was fixed to look for the region that contains the variant, since the bam depth records are binned to minimize traffic and memory consumption.
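To illustrate why an exact-position match fails against binned coverage, here is a minimal sketch. It is not the code in CohortModel; the function names and the coverage shape (an array of `[binStart, depth]` pairs sorted by `binStart`) are assumptions made for the example.

```javascript
// Hypothetical helper: exact match on the bin start position.
// This only succeeds when the variant happens to sit on a bin boundary.
function findExactDepth(coverage, position) {
  const hit = coverage.find(([binStart]) => binStart === position);
  return hit ? hit[1] : null;
}

// Hypothetical helper: find the bin that contains the position.
// Assumes coverage is sorted by binStart and that each bin extends
// up to the start of the next one.
function findBinnedDepth(coverage, position) {
  let lo = 0;
  let hi = coverage.length - 1;
  let found = -1;
  // Binary search for the last bin whose start is <= position.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (coverage[mid][0] <= position) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found === -1 ? null : coverage[found][1];
}

// Made-up coverage binned every 10 bases.
const exampleCoverage = [[1000, 30], [1010, 28], [1020, 35]];
console.log(findExactDepth(exampleCoverage, 1014));  // null
console.log(findBinnedDepth(exampleCoverage, 1014)); // 28
```

A variant at position 1014 falls inside the bin starting at 1010, so the exact match returns nothing while the binned lookup returns that bin's depth. A real implementation would likely also want an upper bound so that positions far past the last bin are not assigned its depth.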

Code in model.VariantExporter had to be fixed to properly initialize the bamDepth fields for called variants. Exporting called variants in VCF format is an edge case that requires special code that re-calls the variants in order to capture the correct VCF header and records. This code also had to preserve the bamDepth fields that were obtained in the CohortModel preprocessing described above.
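The idea of preserving the depth fields across a re-call can be sketched as follows. This is an illustration only, not the VariantExporter code: the variant object shape (`chrom`, `start`, `ref`, `alt`), the key format, and both function names are assumptions.

```javascript
// The three depth fields named in this issue.
const DEPTH_FIELDS = ['bamDepth', 'bamDepthMother', 'bamDepthFather'];

// Hypothetical key identifying a variant by locus and alleles.
function variantKey(v) {
  return [v.chrom, v.start, v.ref, v.alt].join(':');
}

// Hypothetical merge: copy depth fields from the preprocessed variants
// onto the re-called variants, initializing any missing field to ''
// so that the export always has a value for the column.
function preserveDepthFields(recalled, preprocessed) {
  const byKey = new Map(preprocessed.map((v) => [variantKey(v), v]));
  return recalled.map((v) => {
    const source = byKey.get(variantKey(v));
    const merged = { ...v };
    for (const field of DEPTH_FIELDS) {
      if (source && source[field] != null) {
        merged[field] = source[field];
      } else if (merged[field] == null) {
        merged[field] = '';
      }
    }
    return merged;
  });
}
```

Keying on locus plus alleles rather than position alone keeps multi-allelic sites from overwriting each other's depth values.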

In summary, the bamDepth, bamDepthMother, and bamDepthFather fields are now exporting properly. Also, for clarity, SIFT and Polyphen columns have been removed from the exported dataset since they are no longer displayed in the variant detail panel.