stevekm / reportIT

IonTorrent variant reporting pipeline for clinical interpretation of cancer panel results
GNU General Public License v3.0

missing deletion mutations from pipeline output #25

Closed: stevekm closed this issue 7 years ago

stevekm commented 7 years ago

Some deletion variants that were previously identified manually from IonTorrent output may be missing from the reportIT pipeline output. Need to verify whether this is the case and, if so, adjust the pipeline to ensure they are included. Filtering steps added early in development may have intentionally removed deletions, and system requirements have since changed.

stevekm commented 7 years ago

This is indeed the case. The likely culprit is that the VCF-based files list the Ref/Alt values for variants differently than the ANNOVAR-based files: during conversion to ANNOVAR format, leading bases shared by Ref and Alt are stripped (e.g. Ref: C / Alt: CCAGA becomes Ref: - / Alt: CAGA). The VCF metadata is joined to the ANNOVAR output with an inner merge that uses these columns as merge keys, so any entry whose Ref/Alt values were changed by the conversion no longer matches and is silently dropped from the pipeline output. Need to come up with a solution for this, since neither the VCF-based files nor the current ANNOVAR output alone contain all necessary metadata. Possibilities:
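A minimal sketch of the failure mode, using plain Python dicts instead of the pipeline's actual tables. Everything here is hypothetical and not taken from reportIT. That includes the helper names (`strip_common_prefix`, `inner_merge`, `raw_key`, `normalized_key`), the column names, and the toy records. The sketch models only the leading-base stripping described in this comment and deliberately ignores any coordinate shift ANNOVAR may apply to deletions. It shows that merging on the raw Ref/Alt keys drops the indel row. Applying the same stripping to both sides before merging keeps that row.

```python
"""Sketch: why an inner merge on Ref/Alt drops indels, and one way to re-key.

Assumptions (hypothetical, not from reportIT): records are dicts with
'chrom', 'pos', 'ref', 'alt' keys; only leading-base stripping of Ref/Alt
is modeled, and positions are assumed identical on both sides.
"""


def strip_common_prefix(ref, alt):
    """Remove leading bases shared by ref and alt; an empty allele becomes '-'."""
    n = 0
    while n < min(len(ref), len(alt)) and ref[n] == alt[n]:
        n += 1
    return ref[n:] or "-", alt[n:] or "-"


def inner_merge(left, right, key):
    """Keep only left rows whose key also occurs on the right (inner join).

    Right-hand values win when both rows share a column name.
    """
    index = {key(r): r for r in right}
    return [{**l, **index[key(l)]} for l in left if key(l) in index]


def raw_key(row):
    # Merge key built from the alleles exactly as written in each file
    return (row["chrom"], row["pos"], row["ref"], row["alt"])


def normalized_key(row):
    # Same key, but with shared leading bases stripped on both sides
    return (row["chrom"], row["pos"], *strip_common_prefix(row["ref"], row["alt"]))


# Toy values, made up for illustration only
vcf_rows = [
    {"chrom": "chr7", "pos": 100, "ref": "C", "alt": "CCAGA", "AF": 0.31},  # indel
    {"chrom": "chr7", "pos": 200, "ref": "G", "alt": "T", "AF": 0.48},      # SNV
]
annovar_rows = [
    {"chrom": "chr7", "pos": 100, "ref": "-", "alt": "CAGA", "gene": "GENE_A"},
    {"chrom": "chr7", "pos": 200, "ref": "G", "alt": "T", "gene": "GENE_A"},
]

if __name__ == "__main__":
    # Raw keys: the indel's VCF key no longer matches, so it is silently dropped
    print(len(inner_merge(vcf_rows, annovar_rows, raw_key)))         # 1
    # Normalized keys: both rows survive the inner merge
    print(len(inner_merge(vcf_rows, annovar_rows, normalized_key)))  # 2
```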