`INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result.`

Teezi commented 10 months ago

Hi,

I'm using MafAnnotator.py and encountering numerous warnings: INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result.

I'm wondering how I can define the cancer type for my sample or disable these warnings.

Many thanks!

jjc2718 commented 10 months ago

I don't have any affiliation with OncoKB, but I was looking into this recently for a project of mine and here's what I found:

1. It looks like you can specify a cancer type for each sample using the ONCOTREE_CODE or CANCER_TYPE headers in your MAF file (see here in the code for annotating samples).
2. The name format comes from OncoTree. I tried some examples, and most of the top-level nodes there seem to work (e.g. "Liver Cancer", "Melanoma", "Breast Cancer"). It would be good to see some examples of this, though, since there are no cancer type columns in the example MAFs in the data directory.
3. I think specifying a cancer type only affects the "level" of therapeutic implications for each variant; the oncogenic/neutral/unknown and mutation effect annotations appear to be unchanged. I only compared a few annotated MAFs manually before and after specifying a cancer type, though, so it would be good to get confirmation from someone on the OncoKB team of exactly which part of the annotation process the cancer type influences.
4. You should be able to turn off the warnings by changing the logging level here to something higher than INFO (e.g. logging.WARN should work).
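
As a rough sketch of the last point: the `INFO:AnnotatorCore:` prefix looks like Python's default logging format (`LEVEL:logger name:message`), so the logger is presumably named `AnnotatorCore`. That name is an assumption read off the log line, not something I checked in the code. Under that assumption, raising the level on that logger drops the INFO lines while warnings still get through. The in-memory handler is only there to make the example self-contained.

```python
import io
import logging

# Capture log output in memory so the effect is visible in this example.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Assumption: the annotator logs through a logger named "AnnotatorCore",
# as the "INFO:AnnotatorCore:..." prefix suggests.
log = logging.getLogger("AnnotatorCore")
log.addHandler(handler)
log.propagate = False

# At INFO level the message from this issue is emitted.
log.setLevel(logging.INFO)
log.info("Cancer type for the sample should be defined for a more accurate result.")

# Raise the threshold: INFO messages are dropped, warnings are still shown.
log.setLevel(logging.WARNING)
log.info("this INFO line is suppressed")
log.warning("warnings are still shown")

captured = stream.getvalue().splitlines()
print(captured)
```

Note that this hides the message without changing the annotation itself; the cancer type is still missing for those samples.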

Hope this helps! I'd be particularly curious what the answer is to my third point - I annotated a large number of MAF files without specifying a cancer type, but we're primarily interested in the oncogenic vs. neutral variant annotations. It would be good to know if I need to re-annotate them or if it won't have any effect on those calls.

zhx828 commented 10 months ago

Sorry about the late reply! To your third question: the cancer type affects the therapeutic, diagnostic, and prognostic implications. The tumor type summary will also not be included if no cancer type is given.

paulsalachan commented 8 months ago

Hi,

I get the same warnings when running MafAnnotator.py.

INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result

I have tried to include either CANCER_TYPE or ONCOTREE_CODE in the clinical data file provided as input using the -c option. According to the documentation, the cancer type should be assigned based on the clinical data file, as it has the highest priority, so something must be going wrong. Having the cancer type column in the input file (-i) does not help either.

However, there are no warnings when the default tumor type (-t) is set, but this is only possible when you have one cancer type in your dataset. I guess I could subset the data for each cancer type and run the annotation separately, but that would defeat the purpose of the -c option. Also, there does not seem to be any check on whether the cancer type given with -t is valid: I could specify some random string, and it would not complain or warn about the cancer type not being specified.

Do you know what could be going on here? Ideally, I would like to be able to specify different levels of ONCOTREE_CODE and get output for those levels.

Thanks for your help.

zhx828 commented 8 months ago

Hi @paulsalachan, in the example script, the clinical file is referenced in most of the annotator calls, so -c should work. Does the clinical file you created also have a SAMPLE_ID column? I'm happy to take a look at your files if you send me a snapshot.

We currently do not have any checks on the cancer type, which I think would be a good thing to support: https://github.com/oncokb/oncokb-annotator/issues/214
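
Until such a check exists, something along these lines could be run on the cancer type values before annotating. This is only a sketch, not how the annotator works: `KNOWN_CANCER_TYPES` and `find_unknown_cancer_types` are hypothetical names, the allow-list is a tiny hand-written sample, and the case-insensitive comparison is a choice made for this example.

```python
# Hypothetical allow-list for illustration only; a real check would load the
# full set of codes and names from OncoTree rather than hard-coding a few.
KNOWN_CANCER_TYPES = {"MEL", "BRCA", "Melanoma", "Breast Cancer", "Liver Cancer"}


def find_unknown_cancer_types(values, known=KNOWN_CANCER_TYPES):
    """Return the sorted, de-duplicated values that match no known cancer type.

    Matching ignores case and surrounding whitespace (an assumption of this
    sketch, not documented annotator behavior).
    """
    known_upper = {k.upper() for k in known}
    return sorted({v for v in values if v.strip().upper() not in known_upper})


unknown = find_unknown_cancer_types(["MEL", "melanoma", "some random string"])
print(unknown)
```

Anything returned by the helper is a value worth double-checking against OncoTree before passing it via -t or a clinical file.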

paulsalachan commented 8 months ago

Hi @zhx828, thank you for your quick reply. That resolved it. My clinical file did have a sample ID column, but it was named 'Tumor_Sample_Barcode'. After I renamed it to 'SAMPLE_ID', the annotation runs without any warnings, so that's great! One suggestion: accept either 'SAMPLE_ID' or 'Tumor_Sample_Barcode' as the column header, so that it is consistent with the header in the MAF file. Thanks for your time and help!
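
For anyone hitting the same thing, here is a minimal sketch of the rename described above. It assumes the clinical file is tab-delimited with a single header row; `rename_sample_column` is a hypothetical helper, and the sample IDs and codes are made-up example data.

```python
import csv
import io


def rename_sample_column(src, dst, old="Tumor_Sample_Barcode", new="SAMPLE_ID"):
    """Copy a tab-delimited clinical file, renaming one header column."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    # Only the header row changes; data rows are copied through untouched.
    writer.writerow([new if col == old else col for col in header])
    writer.writerows(reader)


# In-memory stand-in for a clinical file (made-up sample IDs).
clinical = io.StringIO(
    "Tumor_Sample_Barcode\tONCOTREE_CODE\n"
    "sample-1\tMEL\n"
    "sample-2\tBRCA\n"
)
fixed = io.StringIO()
rename_sample_column(clinical, fixed)
print(fixed.getvalue())
```

With real files, the same function can be called with two handles opened via `open(path, newline="")`, one for reading and one for writing.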

zhx828 commented 8 months ago

Oh, Tumor_Sample_Barcode is supposed to be supported, but for some reason it was not accepted in the clinical file. I made a patch to fix the issue: https://github.com/oncokb/oncokb-annotator/releases/tag/v3.4.1

oncokb / oncokb-annotator

`INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result.` #213