phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
78 stars 30 forks source link

Problem with modeling of tumor_stage and suggested solution #188

Closed pnrobinson closed 4 years ago

pnrobinson commented 5 years ago

@mbaudis @julesjacobsen @cmungall @mellybelly

The Histologic grade of a tumor (the element tumor_grade of the biosample of a phenopacket) describes the histologic appearance of a tumor, and thus the hierarchy (tumor_grade is part of biosample) seems correct. On the other hand, tumor_stage is referring both to the tumor and to the overall disease. Although staging differs per cancer, often a stage might be something like this:

Therefore, I do not think that tumor_stage should be in the biosample element. Here is a highly simplified summary of the current structure:

phenopacket (i.e., top level) :
{
    "phenotypicFeatures": ["Hematuria", "Dysuria"],
    "biosamples": [ {
    "id":"sample 1",
    "histologicalDiagnosis": "Infiltrating Urothelial Carcinoma",
    "tumor_grade": -- list of terms representing the tumor grade
    "tumor_stage": -- list of terms representing the tumor stage
    },
    {
        "id": "sample3",
    "histologicalDiagnosis": "Negative Finding" 
    },
    {
    "id": "sample5",
    "tumorProgression": "Metastatic Neoplasm",
    "tumor_grade": [],
    "tumor_stage": [],
    }
    ]
}

My suggestion is that tumor_stage should be a subelement of Disease

// Message to indicate a disease (diagnosis) and its recorded onset.
message Disease {
    // The identifier of this disease e.g. MONDO:0007043, OMIM:101600, Orphanet:710, DOID:14705 (note these are all equivalent)
    OntologyClass term = 1;

    // Cancer findings in the TNM system that is relevant to the diagnosis of cancer.
    // See https://www.cancer.gov/about-cancer/diagnosis-staging/staging
    // e.g. Child terms of NCIT:C48232 (Cancer TNM Finding)
    repeated OntologyClass tumor_stage = 2;

    // The onset of the disease. The values of this will come from the HPO onset hierarchy
    // i.e. subclasses of HP:0003674
    // FHIR mapping: Condition.onset
    oneof onset {
        Age age_of_onset = 3;
        AgeRange age_range_of_onset = 4;
        OntologyClass class_of_onset = 5;
    }
}
cmungall commented 5 years ago

This proposed change makes sense to me

mbaudis commented 5 years ago

+1

A practical conundrum of the „stage as separate attribute“ however is the clash between ontology classes specifying a stage, and combinations of disease class w/ stage attribute. This doesn‘t change the IMO current need to capture both.

julesjacobsen commented 5 years ago

Seems a clear and logical solution to me as well.

ahwagner commented 4 years ago

While TNM is a common form of staging, I would advocate for the inclusion of a related term, staging_system to formally specify the staging system, ideally as an OntologyClass. Perhaps this isn't necessary–it could be reasonably extracted from an ontology such as NCIT–but I think that a formally indicated system might be a commonly-requested attribute.

However, even if the proposed addition of another attribute to specify this is ultimately rejected, I think that we should make the spec language clearer to indicate that other staging systems are acceptable. Currently the spec makes it seem like it must be TNM staging in the attribute definition, and the associated expository text is ambiguous as to what "or equivalent" means (another staging system? TNM staging, but defined by a different ontology?)

If the intention of the spec is to say that other staging systems beyond TNM are not permissible (I don't think this is what the spec is advocating), a compelling argument will need to crafted as to why.

pnrobinson commented 4 years ago

Hi Alex, this seems a great suggestion. Could you possibly provide us with an example of how other staging systems are used? We could at the very least add this to the documentation, but @julesjacobsen perhaps this is something we can still think about for v1? It seems very reasonable to me.

ahwagner commented 4 years ago

Sure. When I looked at this, I was thinking about an SCLC cohort that I previously reported on. In SCLC, tumors are often classified under the limited/extensive staging system, one of the conventional staging systems (VALG) for this disease.

mbaudis commented 4 years ago

While there certainly is a need to go beyond TNM, my wish wouldn't be for different attributes for system + value, but rather for a parameter w/ scoped concepts. E.g. extending EFO (better choice?) with other staging systems.

But please see this as food for thought, no specific objection to a "system + value" implementation.

(I'm not sure about the current status of mappings between staging systems, but I guess this is one of those "intractable since medical specialists being involved" problems).

julesjacobsen commented 4 years ago

I'm hesitant to add a staging_system field to be used in conjunction with the tumor_stage for two reasons. Firstly encoding this information in two places allows for inconsistencies between the two fields where this should not be allowable. Secondly the tumor_stage is already an 'OntologyClass' object which should encode the concept in its entirety, e.g. the stage and the system to which the stage belongs. The third of my two points is that the TNM system is compatible with the limited/extensive staging system and I think we should be attempting to drive adoption of standardised systems rather than allowing a proliferation of sub-standards as these become impossible to compare.

mbaudis commented 4 years ago

@julesjacobsen Generally +1 on your conclusion; only w/ the addition that we should allow OntologyClass objects beyond TNM - all conversions (either to TNM or to some stage-containing diagnostic code) will have edge cases where this mapping will be lossy.

ahwagner commented 4 years ago

I agree with your first point, @julesjacobsen; let's drop the notion of a separate attribute, and leave it to implementations to utilize the ontologies to retrieve the corresponding system. However, I have to agree with @mbaudis that the second point–creating mappings from other systems to TNM to drive adoption of a single staging system–is often lossy.

To illustrate this problem, consider if an institution currently uses a non-TNM staging system, such as the previously referenced SCLC limited/extensive disease system (note: as is currently done in practice by our thoracic oncologists here at WUSTL for SCLC patients). If I wanted to map such a case to TNM, what stage do I record for this limited-stage disease (LD) presentation?

The AJCC TNM defines LD as any T, except for T3-4, due to multiple lung nodules that do not fit in a tolerable radiation field, any N, and M0. –IASLC-AJCC TNM Staging System

Which one of these values do I select when trying to represent this case? If I make a mapping, how do I indicate the process used to select a specific TNM stage from the reported LD state? This would be important to capture in case the mapping system changes down the road. If I don't do a mapping, should I suggest that practicing clinicians alter their current practice for compatibility of their case data with Phenopackets? Either way, this artificial restriction is fraught with difficulties.

More importantly, restricting to TNM ignores a large component of both historical and contemporary practice. Keep in mind that this is only one of numerous staging systems used across different cancer types due to the relevance of those other systems to diagnosis and treatment. I believe it is beyond our remit as a data transmission standards group to enforce usage of a particular staging system outside of our domain, and that choosing to do so anyway might dissuade providers that are seeking to accurately reflect current practice from using this standard.

julesjacobsen commented 4 years ago

OK, so lets change the docs to mention other coding systems are also valid, but the model stays the same.

Given this, are there other coding systems already encoded in an ontology we could point people to?

ahwagner commented 4 years ago

Yes! I think that this is actually just a short step away from where the documentation already is. Rather than using the NCIT:C48232 (Cancer TNM Finding) as an exemplar root concept in docs, I advocate that we use the NCIT:C28108 (Disease Stage Qualifier) concept. This parent concept has stages from many staging systems as child concepts, including limited/extending staging, versioned AJCC (TNM-based) staging, and the 0-IV staging classes that @pnrobinson mentions above.

In addition, NCIT:C28108 (Disease Stage Qualifier) and its descendents are formally represented as disease qualifiers in NCIT, mirroring the qualifier intent of the tumor_stage attribute in the Phenopackets Disease model.

Notably, NCIT:C48232 (Cancer TNM Finding) is not a descendent of Disease Stage Qualifier (Code C28108)–it is a class of Findings, which is used for AJCC staging (in much the same way that phenotypes are used for disease classification). We might want to consider creating a second field that is designed to capture this distinction, e.g. tnm_finding.

I've put in a PR (#214) to illustrate how I think this change might look.

julesjacobsen commented 4 years ago

Nice suggestion @ahwagner, this makes the model more generalisable too. For example, consider reporting on a case of Parkinsonism, given this is a progressive degenerative disease it would make sense to report on the stage of the disease in the phenopacket. Your PR includes

// Cancer staging, the extent to which a cancer has developed.
// See https://www.cancer.gov/about-cancer/diagnosis-staging/staging
// e.g. Child terms of NCIT:C28108 (Disease Stage Qualifier)
repeated OntologyClass tumor_stage = 5;

// Cancer findings in the TNM system that is relevant to the diagnosis of cancer.
// See https://www.cancer.gov/about-cancer/diagnosis-staging/staging
// e.g. Child terms of NCIT:C48232 (Cancer TNM Finding)
repeated OntologyClass tnm_finding = 6;

So how about trying to generalise tumor_stage to disease_stage? The only problem with this is I've no idea what to recommend using and each disease will have their own set of criteria, which if not encoded in an ontology isn't possible to encode here. Thoughts?

// Disease stage, the extent to which the disease has progressed. In the case of cancers,
// see https://www.cancer.gov/about-cancer/diagnosis-staging/staging we recommend the 
// use of child terms of NCIT:C28108 (Disease Stage Qualifier). For other non-cancer diseases
// use the child terms of...
repeated OntologyClass disease_stage = 5;

// Cancer findings in the TNM system that is relevant to the diagnosis of cancer.
// See https://www.cancer.gov/about-cancer/diagnosis-staging/staging
// e.g. Child terms of NCIT:C48232 (Cancer TNM Finding)
repeated OntologyClass tnm_finding = 6;

Actually the NCIT does have other terms for this, such as NCIT:C9439 - Chronic Kidney Disease, Stage 5, so I think changing tumor_stage to disease_stage would be applicable to more disease types without losing the current intention and allow for other disease stages to be used if and when they become available.

@pnrobinson @mbaudis can you comment please as I'd like to release v1.0.0 asap.

ahwagner commented 4 years ago

👍for tumor_stage to disease_stage change. No suggestions for related terms from other ontologies.

pnrobinson commented 4 years ago

@ahwagner @julesjacobsen The one issue I have with is is whether tnm_finding is still too specific. If we are going to make things very generalizable then maybe we can recommend that the TNM (and any other findings) just go into the phenotypic_features field?

mellybelly commented 4 years ago

finally catching up on this thread - i agree, there must be disease type specific staging systems utilized in the model. The coordination of these with their parent disease type in NCIt or Mondo is a different question, but the in longer term there should be consistency checks for this, eg. you cannot use a staging system designed for a different branch of the disease hiearchy (from which it has not been assigned or included).

mbaudis commented 4 years ago

@pnrobinson It works if it is scoped for a biosample... Or other coordination w/ branch of the disease hiearchy (I was typing while @mellybelly posted this ...).

julesjacobsen commented 4 years ago

Is there any reason for not putting the TNM and other findings in the same list under Disease.disease_stage? The drawback is it's not that well scoped, but it keeps the model simple.

julesjacobsen commented 4 years ago

The one issue I have with is is whether tnm_finding is still too specific. If we are going to make things very generalizable then maybe we can recommend that the TNM (and any other findings) just go into the phenotypic_features field?

@pnrobinson the TNM and other findings are based on the classification of phenotypic features, so I'm not sure adding them to the phenotypic_features field would be the right place for them, plus originally tumor_stage was moved out of the biosample message as it pertained to the disease progression as opposed to being a quality of the biosample itself.

ahwagner commented 4 years ago

I agree with @julesjacobsen; tnm_finding should be kept with the disease, as should disease_stage. This is because the finding itself is necessarily made in the context of the disease (e.g., the AJCC Lip and Oral Cavity Staging Form contains site-specific anatomical descriptions for the TNM classification for lip and oral cavity carcinomas):

image

I'd also like to further clarify the important semantic distinction between TNM staging findings and the disease stage qualifiers. TNM staging is a classification system for anatomical presentation of a cancer, but is often applied to clinical reports in conjunction with other clinical phenotypic factors as they pertain to a specific disease. In the American Joint Committee on Cancer (AJCC; the body that created and maintains the TNM system) cancer staging manual (7th edition), the relationship between TNM staging and stage groups is described:

For the purposes of tabulation and analysis of the care of patients with a similar prognosis, T, N, and M are grouped into so-called anatomic stage/prognostic groups, commonly referred to as stage groups. Groups are classified by Roman numerals from I to IV with increasing severity of disease. Stage I generally denotes cancers that are smaller or less deeply invasive with negative nodes; Stage II and III define cases with increasing tumor or nodal extent, and Stage IV identifies those who present with distant metastases (M1) at diagnosis.

When considering disease progression as it relates to a clinical outcome, it is often the stage group that is used to classify the disease (e.g. Stage IIIA SCLC), due to its role in prognosis and treatment (though the AJCC clearly states that TNM classifications should be preserved in the medical record). Limited/Extensive staging for SCLCs (e.g. Limited Stage SCLC) is of this same "group" classification that is tied to prognosis and treatment, and Limited Stage (Code C28065) could easily reside as a sibling element to Stage IIIA (Code C27977) in the Phenopackets Disease.disease_stage field. And while model simplicity is nice, it is not as clear that T1 Stage Finding (Code C48720) is an equivalent, sibling concept to these other two. I think it is appropriate for these to remain separated, given the important distinction between "anatomic stage/prognostic groups" and "anatomic stage".

Sorry about the wall of text, there's just a lot to say here! 😄

mauraakush commented 4 years ago

coming late to the party - agree with @julesjacobsen wrt having checks within the hierarchy of stage+staging_system values. for reference, we are aggregating data from many different types of pediatric cancer with staging systems including INSS, Evans, FAB, AJCC, FIGO, INRG, etc. frequently, more than one staging system is used when aggregating data from different trials so having a way to differentiate between the systems is vital. i don't necessarily have a problem with having both a tnm_finding field and a disease_stage field if these are different concepts... from what i have come across wrt Ann Arbor staging, it seems it the AA criteria has roughly the same function as TNM staging in solid tumors.. -- although the reference from wiki so, if they are, in fact, different concepts, we don't have an issue with separating them. we currently having all staging values in one field, but it wouldn't be hard to change this. any consensus that happens here will be helpful. probably moot at this point but i think mapping collected values to the TNM stage would 1) lose granularity that we want to retain in our database and 2) not be relevant for Hodgkin's lymphoma, AML or ALL. we don't want to misrepresent our collected data and would benefit a great deal from having each staging value and the associated staging_system. that being said, i think it would be fine if someone wanted to do create one field where staging is harmonized in their analysis and describe the methods used.

ahwagner commented 4 years ago

All (and in particular, @pnrobinson and @julesjacobsen), I've updated the PR (commits 962a52a and 4343cf5) to include generalization of cancer_stage to disease_stage. For reasons stated above [1][2], I have left tnm_finding as-is. Are there additional changes you would like to see made to the PR to get this in before 1.0.0?

pnrobinson commented 4 years ago

This sounds reasonable to me. We will need to update the documentation, which I could do next Tuesday or you could add to the PR if you have time. thanks!

ahwagner commented 4 years ago

Great, thanks @pnrobinson. The PR has some documentation changes already, but it's all localized to the disease.rst page. Searching around it looks like other places to update are:

Let me know if I'm missing anything else.

My most recent commit seems to have failed testing; could I have a maintainer rebuild that, and if it's still failing, tell me what went wrong? It was a one-line change in 4343cf5 that broke things, but I can't tell why that would cause tests to fail, and the logs aren't very informative.

pnrobinson commented 4 years ago

There were two errors. I am not sure I understand the first but it might be some network slowness. The second seems to be that we need to rename the method addTumorStage to addDiseaseStage in this test class.

/home/travis/build/phenopackets/phenopacket-schema/src/test/java/org/phenopackets/schema/v1/examples/UrothelialCarcinomaExample.java:[143,17] cannot find symbol
  symbol:   method addTumorStage(org.phenopackets.schema.v1.core.OntologyClass)
  location: class org.phenopackets.schema.v1.core.Disease.Builder

I cannot do much more today but maybe @julesjacobsen can take a look at it next week? Possibly just changing this name will fix the problem.

julesjacobsen commented 4 years ago

It looks like we've reached a consensus? I'm going to merge in @ahwagner's PR and I'll fix any of the unit tests. Good thing we changed this now as otherwise this would have been a breaking change! I'll release a new RC and create a 'ready to release?' issue where all stakeholders will have a week to thumbs-up or voice a reason for not being OK to release. Assuming all is OK I'll push the release button next Tuesday. Note, we don't actually have a formal voting and release procedure in place yet, so this is a first stab at getting something relatively formal in place.

julesjacobsen commented 4 years ago

Fix for broken tests: 90fdec1fc954d30e2e3059d5ad1a7e9a15309037

julesjacobsen commented 4 years ago

Cutting out a lot of the extraneous normal samples and metadata the example now stands as follows: I think that given sample5 is described as a Metastatic Neoplasm the disease TNM finding should also have an M1 Stage Finding NCIT:C48700. In addition we can now specify the diseaseStage which ought to be Stage IV Bladder Urothelial Carcinoma AJCC v7 NCIT:C8941. @ahwagner / @pnrobinson / @mbaudis Can someone confirm this please?

{
  "id": "example case",
  "subject": {
    "id": "patient1",
    "dateOfBirth": "1964-03-15T00:00:00Z",
    "sex": "MALE"
  },
  "phenotypicFeatures": [{
    "type": {
      "id": "HP:0000790",
      "label": "Hematuria"
    }
  }, {
    "type": {
      "id": "HP:0100518",
      "label": "Dysuria"
    },
    "severity": {
      "id": "HP:0012828",
      "label": "Severe"
    }
  }],
  "biosamples": [{
    "id": "sample1",
    "individualId": "patient1",
    "sampledTissue": {
      "id": "UBERON_0001256",
      "label": "wall of urinary bladder"
    },
    "ageOfIndividualAtCollection": {
      "age": "P52Y2M"
    },
    "histologicalDiagnosis": {
      "id": "NCIT:C39853",
      "label": "Infiltrating Urothelial Carcinoma"
    },
    "tumorProgression": {
      "id": "NCIT:C84509",
      "label": "Primary Malignant Neoplasm"
    },
    "procedure": {
      "code": {
        "id": "NCIT:C5189",
        "label": "Radical Cystoprostatectomy"
      }
    }
  }, {
    "id": "sample2",
    "individualId": "patient1",
    "sampledTissue": {
      "id": "UBERON:0002367",
      "label": "prostate gland"
    },
    "ageOfIndividualAtCollection": {
      "age": "P52Y2M"
    },
    "histologicalDiagnosis": {
      "id": "NCIT:C5596",
      "label": "Prostate Acinar Adenocarcinoma"
    },
    "tumorProgression": {
      "id": "NCIT:C95606",
      "label": "Second Primary Malignant Neoplasm"
    },
    "tumorGrade": {
      "id": "NCIT:C28091",
      "label": "Gleason Score 7"
    },
    "procedure": {
      "code": {
        "id": "NCIT:C15189",
        "label": "Biopsy"
      }
    }
  }, {
    "id": "sample5",
    "individualId": "patient1",
    "sampledTissue": {
      "id": "UBERON:0015876",
      "label": "pelvic lymph node"
    },
    "ageOfIndividualAtCollection": {
      "age": "P52Y2M"
    },
    "tumorProgression": {
      "id": "NCIT:C3261",
      "label": "Metastatic Neoplasm"
    },
    "procedure": {
      "code": {
        "id": "NCIT:C15189",
        "label": "Biopsy"
      }
    }
  }],
  "diseases": [{
    "term": {
      "id": "NCIT:C39853",
      "label": "Infiltrating Urothelial Carcinoma"
    },
    "diseaseStage": [{
      "id": "NCIT:C8941",
      "label": "Stage IV Bladder Urothelial Carcinoma AJCC v7"
    }],
    "tnmFinding": [{
      "id": "NCIT:C48766",
      "label": "pT2b Stage Finding"
    }, {
      "id": "NCIT:C48750",
      "label": "pN2 Stage Finding"
    }, {
      "id": "NCIT:C48700",
      "label": "M1 Stage Finding"
    }]
  }],
}
julesjacobsen commented 4 years ago

One more comment - shouldn't disease_stage be singular and tnm_finding be a repeated value?

"diseases": [{
    "term": {
      "id": "NCIT:C39853",
      "label": "Infiltrating Urothelial Carcinoma"
    },
    "diseaseStage": {
      "id": "NCIT:C8941",
      "label": "Stage IV Bladder Urothelial Carcinoma AJCC v7"
    },
    "tnmFindings": [{
      "id": "NCIT:C48766",
      "label": "pT2b Stage Finding"
    }, {
      "id": "NCIT:C48750",
      "label": "pN2 Stage Finding"
    }, {
      "id": "NCIT:C48700",
      "label": "M1 Stage Finding"
    }]
  }],
ahwagner commented 4 years ago

disease_stage and tnm_finding should both be repeated. This is because a disease can be characterized under multiple staging systems.

julesjacobsen commented 4 years ago

OK will leave it then - sounds like a recipe for confusion!

@ahwagner can you comment on whether the diseaseStage example is actually correct in the example above?

ahwagner commented 4 years ago

It isn't what I was thinking of. It's a little weird because the NCIT:C8941 (Stage IV Bladder Urothelial Carcinoma AJCC v7) concept is a disease, that is also a stage. The corresponding stage is NCIT:C27971 (Stage IV). We should provide guidance on this point, or just clearly delineate disease from stage in phenopackets.

julesjacobsen commented 4 years ago

Ahh, yes, my mistake - I didn't follow the actual guidelines described in the documentation! The diseaseStage ought to be a property of the disease, not a subclass of the disease itself.

"diseases": [{
    "term": {
      "id": "NCIT:C39853",
      "label": "Infiltrating Urothelial Carcinoma"
    },
    "diseaseStage": [{
      "id": "NCIT:C27971",
      "label": "Stage IV"
    }],
    "tnmFinding": [{
      "id": "NCIT:C48766",
      "label": "pT2b Stage Finding"
    }, {
      "id": "NCIT:C48750",
      "label": "pN2 Stage Finding"
    }, {
      "id": "NCIT:C48700",
      "label": "M1 Stage Finding"
    }]
  }]
julesjacobsen commented 4 years ago

Changes are in commit: f39864b3d1d1b7f44a14005efc3c8211c4690398

I'm going to close this and release!

mbaudis commented 4 years ago

Closed but still want to re-emphasize @ahwagner 's point:

disease_stage and tnm_finding should both be repeated. This is because a disease can be characterized under multiple staging systems.

Also, for ref, this is the way we started out in 2016 - representing biocharacteristics (phenotypes, diseases...) in the GA4GH schema as bags of individual "characteristics" items (i.e. ontology classes + addtl. data). Earlier docs here.

pnrobinson commented 4 years ago

They are! For reference, this is the code for v1:

// Message to indicate a disease (diagnosis) and its recorded onset.
message Disease {
    // The identifier of this disease e.g. MONDO:0007043, OMIM:101600, Orphanet:710, DOID:14705 (note these are all equivalent)
    OntologyClass term = 1;

    // The onset of the disease. The values of this will come from the HPO onset hierarchy
    // i.e. subclasses of HP:0003674
    // FHIR mapping: Condition.onset
    oneof onset {
        Age age_of_onset = 2;
        AgeRange age_range_of_onset = 3;
        OntologyClass class_of_onset = 4;
    }

    // Disease staging, the extent to which a disease has developed.
    // For cancers, see https://www.cancer.gov/about-cancer/diagnosis-staging/staging
    // Valid values include child terms of NCIT:C28108 (Disease Stage Qualifier)
    repeated OntologyClass disease_stage = 5;

    // Cancer findings in the TNM system that is relevant to the diagnosis of cancer.
    // See https://www.cancer.gov/about-cancer/diagnosis-staging/staging
    // Valid values include child terms of NCIT:C48232 (Cancer TNM Finding)
    repeated OntologyClass tnm_finding = 6;

}
mbaudis commented 4 years ago

Sure, I know - just wanted to have the strong support for the general principle being documented, in case of future "let's discuss this - shall we" attempts :-)

ianfore commented 4 years ago

Haven't had a chance to work through this thread in any detail, but I don't see any recognition of the distinction between clinical stage and pathologic stage (see here or here for the distinction ). These are not different staging systems, both are accommodated within the TNM system. They can be different in a given case and in some models are treated as different attributes.

To some extent it's covered by the concept code within tnm finding. In the examples C48766 and C48750 are pathologic findings. C48700 (M1) could be either. There is actually a concept code for pM1 (pathologic M1) which is C48741, and given that the T and N are pathologic it's likely that the M would be pathologic also.