sigven / pcgr

Personal Cancer Genome Reporter (PCGR)
https://sigven.github.io/pcgr
MIT License

What does mapping_rank mean in grch37/civic/civic.biomarkers.tsv ? #130

Closed: sunsong95 closed this issue 3 years ago

sunsong95 commented 3 years ago

Thanks for your research!

I'm very interested in how PCGR integrates the scattered annotation resources into the final results. I noticed that the files in the civic directory of the grch37 data bundle are built from the files at https://civicdb.org/releases, but I don't quite understand how the column named "mapping_rank" in civic.biomarkers.tsv is derived. What does it mean?

In addition, is the column named "alterationtype" modified from the CIViC source information? For example, I noticed that "missense variant,transcript_ Fusion" was changed to "TRANSLOCATION_FUSION_MUT".

Thanks!

sigven commented 3 years ago

Hi @sunsong95,

Very good observation! The mapping_rank variable is created during consolidation of the CIViC data. It is essentially a numeric indicator of how precisely a given biomarker is reported in the literature (and accordingly by CIViC). I believe this matter is frequently ignored by many, but it is an important aspect of variant interpretation, and of how a given variant in a given tumor can be mapped to biomarkers. A mapping_rank of 1 means that the biomarker (variant) was mapped exactly to the genome (with ref and alt alleles, an example being BRAF V600E), a mapping_rank of 2 means that the biomarker was mapped to a codon (e.g. BRAF V600), a mapping_rank of 3 is for the exon level (e.g. EGFR exon 19), 4 is at the gene level (mutations), and 5 is also at the gene level (non-mutations, i.e. expression biomarkers etc.).
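To illustrate the ranking described above, here is a minimal Python sketch that labels each mapping_rank value and keeps the most precise matches from a biomarker table. It is not PCGR's own code: only the `mapping_rank` column name and the meaning of ranks 1 to 5 come from this thread, while the `symbol` and `biomarker` columns, the sample rows, and the helper names are hypothetical and chosen for illustration.

```python
import csv
import io

# Meaning of each mapping_rank value, as described in this thread.
MAPPING_RANK_LEVELS = {
    1: "exact genomic variant (ref/alt alleles, e.g. BRAF V600E)",
    2: "codon (e.g. BRAF V600)",
    3: "exon (e.g. EGFR exon 19)",
    4: "gene (mutations)",
    5: "gene (non-mutations, e.g. expression biomarkers)",
}


def describe_rank(rank):
    """Return a human-readable label for a mapping_rank value."""
    return MAPPING_RANK_LEVELS.get(int(rank), "unknown")


def read_biomarkers(handle):
    """Read a tab-separated biomarker table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def most_precise(rows):
    """Keep only the rows with the lowest (most precise) mapping_rank."""
    if not rows:
        return []
    best = min(int(row["mapping_rank"]) for row in rows)
    return [row for row in rows if int(row["mapping_rank"]) == best]


# Hypothetical sample data: only the mapping_rank column is taken from
# the thread; the other column names and the rows are assumptions.
SAMPLE_TSV = (
    "symbol\tbiomarker\tmapping_rank\n"
    "BRAF\tV600E\t1\n"
    "BRAF\tV600\t2\n"
    "EGFR\texon 19\t3\n"
)

if __name__ == "__main__":
    rows = read_biomarkers(io.StringIO(SAMPLE_TSV))
    for row in rows:
        print(row["symbol"], row["biomarker"], describe_rank(row["mapping_rank"]))
    print("Most precise:", most_precise(rows))
```

To try this against the real file, one could pass an open handle on civic.biomarkers.tsv to `read_biomarkers`, assuming the file is tab-separated with a header row containing `mapping_rank`.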

Yes, the alteration_type is modified internally in PCGR, mainly for convenience and slight simplification, but it is still set from the CIViC data.

Hopefully this clarifies things somewhat.

kind regards, Sigve

sunsong95 commented 3 years ago

Hi @sigven

Thank you for your reply.

I would also like to know whether I can update the annotation resource files (e.g. CIViC or cancerhotspots) myself, so that the clinical annotations are always up to date. If I only update the annotation file in the corresponding folder (e.g. ~/grch37/civic/), will I get a correct result/report?

For example, I noticed that the CIViC data release (https://civicdb.org/releases) is updated almost every month, so it would be great if I could keep the resource bundle current.

Thanks!

sigven commented 3 years ago

Hi @sunsong95,

I am afraid that will not work just yet, although I realize it would be the optimal situation for ensuring that all databases are up to date. PCGR relies on a fairly large number of resources, and I have established multiple update scripts for these, but they are not yet streamlined so that users can update whenever they like. I hope to support such a strategy in the future; meanwhile, I will try to update the bundle more frequently than I have done recently. Sorry for this slight inconvenience, but it requires a fair amount of work to ensure that it works without errors for users.

kind regards, Sigve