phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Primary, Met, Recurrence #60

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago

Check NCIT terms and document

julesjacobsen commented 5 years ago

https://github.com/phenopackets/phenopacket-schema/blob/069faedba9c154c237e3999a7e1ab9d84ade9e45/src/main/proto/org/phenopackets/schema/v1/core/base.proto#L252-L254

Not sure how many terms we need to add:

    // Is the specimen tissue from the primary tumor, a metastasis or a recurrence?
    // Most likely a child term of NCIT:C7062 (Neoplasm by Special Category)
    // NCIT:C3677 (Benign Neoplasm)
    // NCIT:C84509 (Primary Malignant Neoplasm)
    // NCIT:C95606 (Second Primary Malignant Neoplasm)
    // NCIT:C3261 (Metastatic Neoplasm)
    // NCIT:C4813 (Recurrent Malignant Neoplasm)
    OntologyClass tumor_progression = 18;

pnrobinson commented 5 years ago

We will recommend the NCIT terms for primary, metastasis, and recurrence: is the specimen tissue from the primary tumor, a metastasis, or a recurrence? For now, we are using PDXNet entities, but we should use the NCIT terms for these items. This would allow users to enter a more specific NCIT term such as Distant metastasis (C18206), which is a child of Metastasis (C19151).
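
As a rough illustration of why the hierarchy matters, the sketch below checks whether a more specific term rolls up to a recommended one. The `PARENTS` map and the `is_a` helper are hypothetical and hard-code only the single parent-child relationship mentioned above; a real check would load the NCIT ontology rather than a hand-written table.

```python
# Hypothetical, hand-written fragment of the NCIT hierarchy.
# Only the relationship stated in this thread is included.
PARENTS = {
    "NCIT:C18206": "NCIT:C19151",  # Distant metastasis -> Metastasis
}


def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or descends from it."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        # Walk one step up; terms without a recorded parent end the loop.
        current = PARENTS.get(current)
    return False


# A specific term is accepted wherever its parent is recommended.
print(is_a("NCIT:C18206", "NCIT:C19151"))
```

With a check like this, a consumer could accept any descendant of a recommended term without enumerating every child in advance.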

cmungall commented 5 years ago

What if we want to record both recurrent and metastatic? The tumor_progression field is single valued.

Is the intent to only use a restricted subset of these classes, and not to combine the tumor type with the progression? E.g. "recurrent breast carcinoma" http://purl.obolibrary.org/obo/NCIT_C7771 would be prohibited, instead this would be composed from "recurrent neoplasm" in the progression field plus "breast carcinoma" in the disease or phenotype fields.

I recommend formally defining value sets for each of these fields (not sure if there is a way to specify this at the protobuf level, other than reifying as an enum in the schema).
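
One way to picture a value set enforced outside the protobuf schema is a small validation step, sketched here under assumptions. The `TUMOR_PROGRESSION_VALUE_SET` name and the `validate_tumor_progression` function are hypothetical; the allowed IDs are simply the terms listed in the proto comment quoted earlier in this thread, not an agreed value set.

```python
# Hypothetical value set, copied from the terms in the proto comment.
# This is an illustration, not an agreed list.
TUMOR_PROGRESSION_VALUE_SET = {
    "NCIT:C3677": "Benign Neoplasm",
    "NCIT:C84509": "Primary Malignant Neoplasm",
    "NCIT:C95606": "Second Primary Malignant Neoplasm",
    "NCIT:C3261": "Metastatic Neoplasm",
    "NCIT:C4813": "Recurrent Malignant Neoplasm",
}


def validate_tumor_progression(term_id):
    """Return the label for an allowed term, or raise ValueError."""
    if term_id not in TUMOR_PROGRESSION_VALUE_SET:
        raise ValueError(
            f"{term_id} is not in the tumor_progression value set"
        )
    return TUMOR_PROGRESSION_VALUE_SET[term_id]


# An allowed progression term passes.
print(validate_tumor_progression("NCIT:C4813"))

# A precomposed term such as "recurrent breast carcinoma" is rejected,
# so the tumor type has to be recorded in a separate field.
try:
    validate_tumor_progression("NCIT:C7771")
except ValueError as err:
    print(err)
```

A flat lookup like this rejects child terms as well, so it would need to be combined with a hierarchy check if more specific terms are meant to be allowed.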

It seems there are a lot of ways to capture the actual cancer, which could potentially be confusing.

This gets more complex as we consider different use cases, e.g. a disease of hereditary polyposis syndrome with a phenotype of colorectal cancer that is both recurrent and has spread to lymph nodes.

I think it's a fundamental law of file formats or data models that if you give people more than one way to say the same thing, they WILL find that way and use it, even if to the designer it seems "obvious" not to do it that way.

pnrobinson commented 5 years ago

This was a recommendation by the PDX oncology teams and I think the field is used in a predefined way. I will ask Steve to help us either define this better or figure out how we should improve the item.

julesjacobsen commented 5 years ago

@pnrobinson is this issue resolved? It's not clear if it is or not. Most cancer folks appear to be positive about what we have and I think you've been more specific about the required terms/child terms in the [docs](https://phenopackets-schema.readthedocs.io/en/latest/biosample.html#histological-diagnosis).