oncokb / oncokb-annotator

Annotates variants in MAF with OncoKB annotation.
GNU Affero General Public License v3.0

`INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result.` #213

Open Teezi opened 10 months ago

Teezi commented 10 months ago

Hi,

I'm using MafAnnotator.py and encountering numerous warnings: `INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result.`

I'm wondering how I can define the cancer type for my sample or disable these warnings.

Many thanks!

jjc2718 commented 10 months ago

I don't have any affiliation with OncoKB, but I was looking into this recently for a project of mine and here's what I found:

Hope this helps! I'd be particularly curious what the answer is to my third point - I annotated a large number of MAF files without specifying a cancer type, but we're primarily interested in the oncogenic vs. neutral variant annotations. It would be good to know if I need to re-annotate them or if it won't have any effect on those calls.

zhx828 commented 10 months ago
  • nd mutation effect annotations appear

Sorry about the late reply! To your third question: the cancer type affects the Therapeutic/Diagnostic/Prognostic implications. The tumor type summary will also not be included if no cancer type is provided.

paulsalachan commented 8 months ago

Hi,

I get the same warnings when running MafAnnotator.py.

INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result

I have tried to include either CANCER_TYPE or ONCOTREE_CODE in the clinical data file provided as input using the -c option. According to the documentation, the cancer type should be assigned based on the clinical data file, as it has the highest priority, so something must be going wrong. Having the cancer type column in the input file (-i) does not help either.

However, there are no warnings when the default tumor type -t is set, but this is only possible when you have one cancer type in your dataset. I guess I could subset the data for each cancer type and run the annotation separately, but that would defeat the purpose of the -c option. Also, there does not seem to be any check on whether the cancer type specified by the -t option is valid, so I could specify some random string and it would not complain or give a warning about the cancer type not being specified.

Do you know what could be going on here? Ideally, I would like to be able to specify different levels of ONCOTREE_CODE and get output for those levels.

Thanks for your help.

zhx828 commented 8 months ago

Hi @paulsalachan, in the example script, I have a clinical file referenced in most annotator scripts, so -c should work. For the clinical file you created, do you also have a SAMPLE_ID column? I'm happy to take a look at your files if you send me a snapshot.
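For anyone who wants to check this locally first, here is a minimal sketch that inspects the header of a clinical file. It is not part of oncokb-annotator: the helper name `check_clinical_header` is hypothetical, and it assumes the clinical file is tab-delimited. The column names are the ones mentioned in this thread.

```python
import csv
import io

# Column names taken from this thread; the helper itself is hypothetical.
SAMPLE_ID_COLUMN = "SAMPLE_ID"
CANCER_TYPE_COLUMNS = ("ONCOTREE_CODE", "CANCER_TYPE")


def check_clinical_header(handle):
    """Return a list of problems found in the header of a tab-delimited clinical file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # empty file -> empty header
    problems = []
    if SAMPLE_ID_COLUMN not in header:
        problems.append("missing SAMPLE_ID column")
    if not any(col in header for col in CANCER_TYPE_COLUMNS):
        problems.append("missing ONCOTREE_CODE or CANCER_TYPE column")
    return problems


# In-memory example: the sample column uses the MAF-style name instead of SAMPLE_ID.
example = io.StringIO("Tumor_Sample_Barcode\tONCOTREE_CODE\nS1\tLUAD\n")
print(check_clinical_header(example))  # ['missing SAMPLE_ID column']
```

For a real file, something like `with open("clinical.txt", newline="") as fh: print(check_clinical_header(fh))` should work, where `clinical.txt` is a placeholder path.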

We currently do not have any checks on the cancer type, which I think would be a good thing to support. https://github.com/oncokb/oncokb-annotator/issues/214

paulsalachan commented 8 months ago

Hi @zhx828, thank you for your quick reply. That resolved it. My clinical file did have a sample ID column, but it was named 'Tumor_Sample_Barcode'. After I renamed it to 'SAMPLE_ID', the annotation runs without any warnings, so that's great! One suggestion: allow either 'SAMPLE_ID' or 'Tumor_Sample_Barcode' as the column header, so that it is consistent with the header in the MAF file. Thanks for your time and help!
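In case it helps others, the rename described above can be scripted. This is only a sketch under assumptions: the function name `rename_sample_column` is made up for illustration, and the clinical file is assumed to be tab-delimited with the header on the first line.

```python
import csv
import io


def rename_sample_column(src, dst, old="Tumor_Sample_Barcode", new="SAMPLE_ID"):
    """Copy a tab-delimited file from src to dst, renaming one header column."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    # Only the header row changes; data rows are copied through untouched.
    writer.writerow([new if name == old else name for name in header])
    writer.writerows(reader)


# In-memory example; for real files, open both with newline="".
src = io.StringIO("Tumor_Sample_Barcode\tONCOTREE_CODE\nS1\tLUAD\n")
dst = io.StringIO()
rename_sample_column(src, dst)
print(dst.getvalue())
```

The rewritten file can then be passed with the -c option as before.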

zhx828 commented 8 months ago

Oh, Tumor_Sample_Barcode is supposed to be supported, but for some reason it was not accepted for the clinical file. I made a patch to fix the issue: https://github.com/oncokb/oncokb-annotator/releases/tag/v3.4.1