NTR: AJCC tumor grade value specification

obi-ontology / obi

The Ontology for Biomedical Investigations

http://obi-ontology.org

Creative Commons Attribution 4.0 International

75 stars 25 forks source link

NTR: AJCC tumor grade value specification #856

Closed cstoeckert closed 6 years ago

cstoeckert commented 7 years ago

The Ontology for Biobanking has requests from the NCI BBRB for tumor grades and stagings. This term request is for a class of tumor grade and the instances covered by the class. It is meant to be consistent with the discussion in https://github.com/obi-ontology/obi/issues/818 and a figure illustrating the class and instances was added there by Mark Miller. The parent class can be updated if new appropriate subclasses of categorical value specification are introduced (ordinal value specification, composite value specification).

class: histologic grade according to AJCC 7th edition parent class: categorical value specification definition: A categorical value specification that is a histologic grade assigned to a tumor slide specimen according to the American Joint Committee on Cancer (AJCC) 7th Edition grading system.

instance: G1: Well differentiated type of: histologic grade according to AJCC 7th edition definition: A histologic grade according to AJCC 7th edition indicating that the tumor cells and the organization of the tumor tissue appear close to normal. definition source: https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet

instance: G2: Moderately differentiated type of: histologic grade according to AJCC 7th edition definition: A histologic grade according to AJCC 7th edition indicating that the tumor cells are moderately differentiated and reflect an intermediate grade. definition source: https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet

instance: G3: Poorly differentiated type of: histologic grade according to AJCC 7th edition definition: A histologic grade according to AJCC 7th edition indicating that the tumor cells are poorly differentiated and do not look like normal cells and tissue. definition source: https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet

instance: G4: Undifferentiated type of: histologic grade according to AJCC 7th edition definition: A histologic grade according to AJCC 7th edition indicating that the tumor cells are undifferentiated and do not look like normal cells and tissue. definition source: https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet

instance: GX: Cannot be assessed type of: histologic grade according to AJCC 7th edition definition: A histologic grade according to AJCC 7th edition indicating that the grade cannot be assessed and is undetermined. definition source: https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet

Public-Health-Bioinformatics commented 7 years ago

The above schema allow for an instance "GX:Undifferentiated" etc. to be a 'type of' other versions as they come up, like "histologic grade according to AJCC 8th edition", right? In this light, perhaps "definition" can be tweaked to just say when a categorical value was introduced instead, i.e. "A histologic grade introduced in AJCC 7th edition indicating that the grade cannot be assessed and is undetermined."

cstoeckert commented 7 years ago

There may be an instance "G4: Undifferentiated" in "histologic grade according to AJCC 8th edition" but I think it would be a different instance from "G4: Undifferentiated" in "histologic grade according to AJCC 7th edition" and would be distinguishable by a different IRI despite the same label. As a result they are different data values but could be about the same thing. I don't actually know when the values were introduced but suspect it was before the 7th edition so would not want to say that.

Public-Health-Bioinformatics commented 7 years ago

So to compare retrospectively datasets that have been ontologized as different versions of standards have been introduced would require an equivalency mapping table of instances across those standards. The benefit is that there is no ambiguity, but the work involved is establishing the equivalency, and creating a simplified view on the data that can run queries. Anyone tried this kind of approach out, visa vis ontology or traditional RDBMS? It does sound burdensome, and I suspect most people would need some kind of software intermediary in order to cope with analysis of datasets coded in this way, and of course trained ontologist resources would need to be available for standards mapping.

Another vision is that there is only one "G4:Undifferentiated" that can be a member of different versions of a standard, indicating constancy of semantics over time. The benefit in this approach is that it enables querying across datasets immediately as they all reference the same individual; and new categorical values can be added cumulatively across standards (with some complexity visa vis ordinal rankings as noted separately). The drawback is in complexity around deprecating a categorical value that exists in one version but is done away with in a subsequent standard's version. One would have to introduce one other piece of metadata - the date on which a certain categorical value was deprecated, and not use OWL's deprecated flagging mechanism.) In this scenario one could mark a dataset as a whole as subscribing to a certain set of standards' versions, rather than each individual observation/datum, and have the version of a datum be inferred.

Public-Health-Bioinformatics commented 7 years ago

Ah, I made a slight mistake above - if each categorical value has explicit membership in one or more of a standard's versions, then no deprecation timestamp is needed. However, there is a third vision where each categorical value is a member of an overall standard, and the value has a date or version of standard when it was introduced, and a possible date of deprecation. Separately, a list of standards versions exists, each with a date or version of introduction, and replacement. Here no link need be made explicitly between a categorical value and a particular subclass or instance of a standard. By having a dataset marked as subscribing to a given standard it should be inferable which categorical values pertain to it (and string and numeric ones for that matter).

cstoeckert commented 6 years ago

This was discussed on Aug 21 call and revisited on Sept. 11 call. Result was that it is not an ordinal value specification as it include GX: Cannot be assessed. Will add as a categorical value specification as originally proposed. With this pattern, I have additional classes and instances to add. See: BBRB value specifications https://docs.google.com/spreadsheets/d/1mu2815JvvjPUstvXDyhN9_aG5T7kU1YZer7Zwsb7u-w/edit?usp=sharing

Public-Health-Bioinformatics commented 6 years ago

There is the measure itself, and then there are metadata states that pertain to the collection of the data. For any likert scale question leaving answers blank would be communicated by a "blank" metadata state. So is the "GX: Cannot be assessed" meant to be a metadata state? In which case from an informatics perspective it should be excluded from the other choices which would remain ordinal.

Aqua1ung commented 6 years ago

I, for one, (and I sense that I may not be the only one) am rather uncomfortable with capturing values such as "GX" ("Cannot be assessed") as results of a measurement process/experiment. If anything, arriving at the GX "value" signifies precisely the fact that, for whatever reason, the measurement process/experiment has been unsuccessful (histological specimen was not good enough, microscope was broken/badly calibrated, whatever), hence there was no measurement process in the first place--but merely an attempt at a measurement process. I agree, however, that one may need to make a note of this GX "value," though that should not happen as a value of some physical magnitude/quality, but as a "property" of the measurement attempt itself. I thus propose to design a class "assay attempt" around the assay class, thus having "assay/measurement process" as a proper subclass. The "assay" class would thus encompass only successful assays: Only those deserve to be deemed (genuine) measurement processes. And if we are hell bent on using the string "GX" itself in the ontology (as it can always be cooked up in the code by whoever designs the application that uses the ontology), simply make a subclass of "assay attempt" that does not overlap the "assay" class, and label that "GX" for "unsuccessful histological assay." My $.02.

Public-Health-Bioinformatics commented 6 years ago

I agree that treating "GX: cannot be assessed" as an instance of the "Histological Grade" class is awkward and is not a "histological grade", but rather a fact about the measurement attempt. So top-of-page approach may work for some folks but won't engender automated statistical analysis and isn't a pure solution. I like idea of having a generic "assay attempt" class, and of having a "unsuccessful assay" class to reference. This metadata state could pave way for other possibilities too that at moment I don't know how to deal with, e.g. NCBI Biosample submissions require stating "missing" or "not collected" (see bottom of MIXS) in many cases where free text or categorical input data is not available.

Public-Health-Bioinformatics commented 6 years ago

Here's a diagram of the lifespan of these categorical/ordinal variable specification items according to top-of-page scheme "Replace by edition", and something I had in mind "Replace or create on demand" as datasets are encoded over time. In 2nd schema, G1, G2, G3 etc are members of a general "AJCC histologic grade code" class; and particular annotations indicate which versions they are created or discontinued for. Comments?

variable specification

bpeters42 commented 6 years ago

Sorry for being AWOL, have been grant writing. This subject of 'meta-values' came up before. We have seen different types (I am copying this from an old thread):

"'not done', 'not performed', 'not determined' - information isn't here because a measurement wasn't made 'unknown'; 'not available' - information isn't here because the person entering it doesn't have it 'not applicable' - information for this instance is not meaningful e.g. capturing a gender column for different samples and filling it out for organism samples, but entering 'not applicable' for environmental samples. 'none of the above' - there is a correct value, but it does not fit in the value specification scheme.

These types of list values are essentially 'meta-values'. I believe we should represent them as such, and state that a database column holds either a 'meta-value specification' or a 'value specification', the later being what we are trying to capture in measurement data and those value specifications. A 'meta-value specification' is not the output of an experiment, but an ICE stating why that is not available and something else is put there."

It sounds like there is support for this notion?

On Wed, Sep 27, 2017 at 4:29 AM, Damion Dooley notifications@github.com wrote:

I agree that treating "GX: cannot be assessed" as an instance of the "Histological Grade" class is awkward and is not a "histological grade", but rather a fact about the measurement attempt. So top-of-page approach may work for some folks but won't engender automated statistical analysis and isn't a pure solution. I like idea of having a generic "assay attempt" class, and of having a "unsuccessful assay" class to reference. This metadata state could pave way for other possibilities too that at moment I don't know how to deal with, e.g. NCBI Biosample submissions require stating "missing" or "not collected" (see bottom of MIXS http://gensc.org/mixs/ in many cases where free text or categorical input data is not available.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/obi-ontology/obi/issues/856#issuecomment-332491746, or mute the thread https://github.com/notifications/unsubscribe-auth/ANN9IhdWg91Fmlfcr08TgxrPbhelTklyks5smjGAgaJpZM4O76qr .

-- Bjoern Peters Associate Professor La Jolla Institute for Allergy and Immunology 9420 Athena Circle La Jolla, CA 92037, USA Tel: 858/752-6914 Fax: 858/752-6987 http://www.liai.org/pages/faculty-peters

Aqua1ung commented 6 years ago

Adopting this "meta-value specification" fix appears to me to be more of a cop-out than an actual attempt to analyze and capture the various issues (of various natures) that happen under this umbrella. Surely, if appeal to "meta-value" is solely meant to constitute a temporary fix for the sake of expediency, I'm all for it--that is, as long as we are committed to return to this issue in the near future and give it a permanent and final solution (however ominous that may sound). Otherwise, if not in a mad rush, I think it would be worth spending a bit of time to analyze things in detail: in this respect, capturing the distinction between assays and assay attempts looks to me to be a good start toward resolving "not performed"/"not determined"/"not available" and related issues--which, as you mentioned, are encountered very often in the world of assays.

bpeters42 commented 6 years ago

I fail to see how this is a cop-out? I was very much trying to go for a 'more final solution', and primarily wanted to point out that there are analogous scenarios other than 'assay attempts' that lead to reports of 'not applicable' or 'none of the above'. In all those cases they reported side-by-side to actual value specifications, and in one way or another indicate why not a normal value specification is given. In that sense they are 'meta value specifications'. Happy to change the label to something else. I think each of these cases can be modeled more carefully, and it is interesting to do so.

On Wed, Sep 27, 2017 at 8:51 AM, Cristian Cocos notifications@github.com wrote:

Adopting this "meta-value specification" fix appears to me to be more of a cop-out than an actual attempt to analyze and capture the various issues (of various natures) that happen under this umbrella. Surely, if appeal to "meta-value" is solely meant to constitute a temporary fix for the sake of expediency, I'm all for it--that is, as long as we are committed to return to this issue in the near future and give it a permanent and final solution (however ominous that may sound). Otherwise, if not in a mad rush, I think it would be worth spending a bit of time to analyze things in detail: in this respect, capturing the distinction between assays and assay attempts looks to me to be a good start toward resolving "not performed"/"not determined"/"not available" and related issues--which, as you mentioned, are encountered very often in the world of assays.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/obi-ontology/obi/issues/856#issuecomment-332567418, or mute the thread https://github.com/notifications/unsubscribe-auth/ANN9IvDjYY646YUs7YFxrN1J0y3w8tUVks5smm74gaJpZM4O76qr .

Aqua1ung commented 6 years ago

Yes, there may be scenarios where the "assay attempt" fix does not apply, which is why I said that it constitutes merely a (good) start. I do not think we should even aim at capturing all these scenarios in one fell swoop. Coining yet another informational entity for this purpose seems to me to be just another way of avoiding to tackle things at the reality level, where things happen--after all, it seems plausible that one could (re)build OBI exclusively out of informational entities without ever having the need to represent the things that these informational entities are about. While attempting to solve this by appeal to more and more informational entities may be tempting from the point of view of expediency (which is certainly not something that one should scoff at), I think getting one's hands dirty with the things-that-informational-entities-are-(purportedly)-about will yield more benefits in the long run, however messy it may look at this point.

bpeters42 commented 6 years ago

I understand your point, respectfully disagree, and think it should be possible to compromise. In our modeling of entries in the IEDB, and in what Chris is needing, we will need to deal with more than assay attempts being reported. These reports are perfectly real entities. Creating a bucket for such information content entities would mean that we will be able to fully map our database entries to OBI values, and that would have immediate practical benefit, even if most of those OBI entries would not be fully satisfactorily defined in terms of relation to non-information content entities. That can be noted in the curation status of those terms, and they could be revised as we model them one-by-one.

Starting with assay-attempts is a good idea. Here is a start.

Possible scenarios how "GX: Cannot be assessed" can arise 1) an assay was executed in full according to protocol, and control readouts indicated that the results can't be trusted (e.g. PHA stimulated samples did not result in >20SFC per million input cells in an IFN-g ELISPOT assay). I would say that this is a planned process, but not an assay, as it did not generate the desired data item. . 2) an assay was attempted to be performed, but could not be completed according to protocol (e.g. the building burned down). One can argue that this is a process that has a part that was planned but overall failed. 3) an assay should have been performed, but wasn't. This is tricky, and I think the most common scenario. If the surgeon does not provide the pathologist with a tumor sample, the pathologist can't grade it. Much harder to say what this is. I guess there is the larger planned process (study design, or treatment plan) which if executed in full should have included the assay but did not. I would argue that 'lack of process' is not a process. But rather that whatever the evaluant was did not participate in an assay.

So in all cases "GX: Cannot be assessed"

is an ICE
is about a physical tumor
is about the plan that the tumor should have been assessed
is about the failure of that plan to be executed properly

In the case of 1), it is the output of a 'assay attempt with controlled failure' (a planned process). In the case of 2) or 3) it is the output of someone making a determination that a deviation from the original plan occurred.

Stopping here, having too much fun while I should be grant writing.

On Thu, Sep 28, 2017 at 7:25 AM, Cristian Cocos notifications@github.com wrote:

Yes, there may be scenarios where the "assay attempt" fix does not apply, which is why I said that it constitutes merely a (good) start. I do not think we should even aim at capturing all these scenarios in one fell swoop. Coining yet another informational entity for this purpose seems to me to be just another way of avoiding to tackle things at the reality level, where things happen--after all, it seems plausible that you could (re)build OBI exclusively out of informational entities without ever having the need to represent the things that these informational entities are about. While attempting to solve this by appeal to more and more informational entities may be tempting from the point of view of expediency (which is certainly not something that one should scoff at), I think getting one's hands dirty with the things-that-informational-entities-are-(purportedly)-about will yield more benefits in the long run, however messy it may look at this point.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/obi-ontology/obi/issues/856#issuecomment-332852928, or mute the thread https://github.com/notifications/unsubscribe-auth/ANN9Ijr8zmzdSgLrFl5npFcVWtTwUY23ks5sm6xAgaJpZM4O76qr .

cstoeckert commented 6 years ago

Thanks for the interesting feedback. I agree that GX (and pTX, pNX, etc.) are not tumor stages/ grades in the same sense as G1, G2, etc. Therefore, to move ahead, I will not add these as instances to the stage and grade classes. However, as Bjoern correctly points out, they are needed and we need to figure out how to best capture them. And as Cristian points out, I think it useful to clarify what exactly GX is about. So I asked a pathologist! Assigning GX is rare and happens when there isn't enough tissue to make a call. The other "cannot be assessed" values (TX, NX, MX) are more common and occur when the timing isn't right to make a call because the tumor size isn't provided or other needed things (e.g., lymph nodes checked) haven't been done. He agreed that theses values are not about the tumor but about the process of grading the tumor and reflect that an attempt was made to make a call but could not be properly done at the time with what was provided. Time was spent making that call so it was more than just a plan. And the plan includes clearly defined instructions for not making a call. Therefore: "GX: Cannot be assessed"

is an ICE
is about the attempt to assess a tumor (i.e. is about a planned process performed at time T)
is about the inability to output a grade for the tumor from the process with the given inputs. So I'm thinking along the same lines as Bjoern but focusing on the planned process rather than the plan.

bpeters42 commented 6 years ago

This is excellent Chris. So there is an 'assessment process', in which the pathologist first determines if there is sufficient material and information at hand. If so, he continues and performs a full tumor grading assay. If not, then he terminates the process and records that given his inputs, he can't give an accurate assessment and records "GX: Cannot be assessed". This is a planned process, and part of the study design / treatment plan. Kind of a wrapper around the actual assay process. Similar to the 'control failure' example I had.

I do believe we can create a superclass: 'reason for lack of data item' = def: An information content entity that provides an explanation why a data item is not provided. Example: "cannot be assessed", "not applicable", "unknown".

In which 'cannot be assessed' is the output of a planned process of 'determination if assay will provide reliable results".

I think we can expand this to other similar types of 'reason for lack of data item' and write a paper about it. I would like title it 'defining known unknowns'. Would love Christian Cocos to be heavily involved in this if he is up for it :)

Bjoern

On Thu, Sep 28, 2017 at 2:47 PM, cstoeckert notifications@github.com wrote:

Thanks for the interesting feedback. I agree that GX (and pTX, pNX, etc.) are not tumor stages/ grades in the same sense as G1, G2, etc. Therefore, to move ahead, I will not add these as instances to the stage and grade classes. However, as Bjoern correctly points out, they are needed and we need to figure out how to best capture them. And as Cristian points out, I think it useful to clarify what exactly GX is about. So I asked a pathologist! Assigning GX is rare and happens when there isn't enough tissue to make a call. The other "cannot be assessed" values (TX, NX, MX) are more common and occur when the timing isn't right to make a call because the tumor size isn't provided or other needed things (e.g., lymph nodes checked) haven't been done. He agreed that theses values are not about the tumor but about the process of grading the tumor and reflect that an attempt was made to make a call but could not be properly done at the time with what was provided. Time was spent making that call so it was more than just a plan. And the plan includes clearly defined instructions for not making a call. Therefore: "GX: Cannot be assessed"

is an ICE

is about the attempt to assess a tumor (i.e. is about a planned process performed at time T)

is about the inability to output a grade for the tumor from the process with the given inputs. So I'm thinking along the same lines as Bjoern but focusing on the planned process rather than the plan.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/obi-ontology/obi/issues/856#issuecomment-332973193, or mute the thread https://github.com/notifications/unsubscribe-auth/ANN9IhGrEzgf4ACubaaoryA13hqIE1mHks5snBPjgaJpZM4O76qr .

Aqua1ung commented 6 years ago

[This is a reply to Bjoern's post three posts above. Apologies for not having refreshed my browser.] My main concern is the hyperinflation/proliferation/deluge of ICEs (Information Content Entities?) to the detriment of non-ICEs. (I am also not a big fan of "about" and "aboutness," as one can make a convincing case that anything can be about everything, and that aboutness is awfully subjective. But I digress.) In my experience, modeling ICEs (aka "representational entities," that are purportedly about things) tends to monopolize ontology development efforts, such that the entities that the ICEs are purportedly about never (or seldomly) get modeled. Instead of being the driving force of ontology development, non-ICEs have been relegated to a mere afterthought; instead of modeling the ding an sich, ontologists end up focusing on the modeling of representations thereof. Non-representational reality thus takes a back seat to representations of reality (quite ironically, in an ontology purporting to primarily model the world of the noumenon). The convenient thing about this procedure is that in the world of ICEs anything goes (pretty much)--again, given that anything (and everything) can be claimed to be about anything (and everything). As such, no one could claim that positing yet another ICE is wrong. I certainly cannot bring any knock down arguments against the framework you just proposed. All I am asking at this point is that you construe my intervention as a plea to restore the natural order of things, namely attempt to represent non-ICEs before indulging in producing a rhapsody of ICEs. My feeling is that once we develop the world of non-ICEs to a satisfactory extent, there will be no need for ICEs, or, at least, the amount of ICEs required will be considerably smaller.

Aqua1ung commented 6 years ago

Thank you both, Chris and Bjoern, for the feedback. I will attempt to gather my thoughts and compare the frameworks you proposed. I will get back to you with my musings, as always.

Aqua1ung commented 6 years ago

Chris and Bjoern, here are two diagrams (https://goo.gl/Qt5kPK and https://goo.gl/CEvh4b) I whipped up in an attempt to illustrate our positions. None of them contains, at this point, any ICEs, although I maintain that they are both capable to capture the issue of tumor grading with room to spare. Both are generalizable to situations involving "known unknowns," to use Bjoern's term. Red arrows are rdf:subClassOf or rdf:subPropertyOf relations. "reveals" and "hasOutcome" are akin to what you know as "has_specified_output" (without the ICE requirement, that is). Indexing qualities with time looks to me to be necessary, though from what I remember, BFO does not do that; as such, I saw fit to represent the four tumor stages G1-G4 as (pairwise disjoint) classes. Should the requirement that qualities be time dependent be dropped, the four tumor stages can very well be captured as instances of the TumorStage class. The main difference, as you will notice, is the inclusion of an "AssessmentProcess" class in the second diagram (TumorStageAssay2), per your suggestion. As such, the major difference is that in TumorStageAssay2 the GX TGAA class emerges as a defined class (namely the class of all attempts whose assessment process outputs "GX"), whereas in TumorStageAssay1 it is a primitive class. That seems to be the root of the difference between my initial proposal, and yours. In other words, there is not much of a difference between the two "positions." Either way should be fine with me, although I find simply accepting a GX TGAA class as a primitive to be the simpler solution, and more easily generalizable to other cases of "known unknowns." You will also notice that in the second diagram I captured the GX "ICE" as a simple string, as I see no reason to complicate the picture with yet another ICE when a simple string does the job very well. Surely enough, should the hasOutcome relation be replaced with has_specified_output, then the GX string will have to turn into an ICE.

Cheers,

P.S. Needles to say, the diagrams should not be regarded as exhaustive, especially with respect to the Assay class. Surely, there are many other relations this and all the other classes are involved in, though they have not been illustrated in the diagrams.

Aqua1ung commented 6 years ago

This (https://goo.gl/gvEV4L) is the diagram that generalizes the issue of Tumor Stage Assay, as presented in today's OBI meeting. As discussed, adopting an "UnsuccessfulAssayAttempt" class has the advantage that it is non-committal vis-a-vis the reason the Assay Attempt failed, thus it is more likely to satisfy Bjoern's requirement that we strive to accommodate various scenarios: there are various reasons why assays can lead to "Cannot be assessed" (or related) results, hence it is more convenient to stay silent about these reasons instead of positing a reason that might not apply universally. Surely enough, assays that rely on histological samples can fail due to the bad quality of the sample, which is where Chris and Bjoern's proposal about separating an "assessment process" sub-process comes into the picture. This, again, happens for assays that rely on input samples, whereas I have striven to come up with a more general framework. With this being said, if the general solution happens to be to OBI members' liking, and should you guys happen to think that this is something that should be shared with the rest of the biomedical ontologies community, I can, of course, volunteer to start writing a paper (as suggested by Bjoern) to present both the general Unsuccessful/AssayAttempt mechanism, as well as the special case of assays-that-rely-on-samples and the associated Chris&Bjoern "assessment process." Just let me know, and I can get started anytime.

Otherwise, the second major issue regarding the above diagram that was discussed today was the issue of tying the Assay class to the Quality that the assay measures. As far as I can tell, a connection to this effect is missing from OBI at this point in time--though I think it was James (Overton) who mentioned similar attempts (in other ontologies?), attempts that I very much applaud. This, in my opinion, is an oversight that needs to be corrected in OBI at the highest level (namely Assay). In this respect, my proposal is to define a relation between the Assay class, and the Quality class, tentatively dubbed "reveals" or "exposes" (though I certainly am not a stickler for any of these labels).

The third major issue that made the object of discussion in relation to the diagram above was the issue of ICEs, though I will take a breather for now, as I think I have abused y'all's patience.

Aqua1ung commented 6 years ago

As promised, here are a few words about ICEs meant to clarify my intervention this past Monday. ICEs are entities characterized by the fact that they are "about" (or "represent," or "stand for") something. The paradigm case in philosophy is that of "mental representations" (aka "intensions," "senses" (Frege's "sinn,") etc.), and, indeed, all ICEs can be reduced one way or another to mental representations (or combinations thereof, LOT sentences etc.). As such, they are, indeed, part of reality--no question about that. What is less pleasant about these referential entities is that the mechanism that confers them "aboutness" is still heavily debated in contemporary philosophy; representing ICEs requires accounting for the "about" relation as well. The outcome, for our purposes, is that "about" is, at this point in time, a very subjective term: it is quite rare that two or more human subjects agree with respect to what most "representations" represent. Given that, ontologists who happen to be aware of the perils of the "aboutness" issue (i.e. mostly those with a philosophy background) have acquired a definite preference for avoiding referential entities unless absolutely necessary. Fortunately, in realist ontologies this preference is at a definite advantage due to the fact that realist ontologies tend to shun fiction and fictive entities, while referential entities that actually refer (to something real) can be dispensed with in favor of the (real) entities that they stand for, thus making appeal to ICEs unnecessary. Be that as it may, there are, indeed, situations in which referential entities, be they fictional or non-fictional, need to be accounted for. As I see it, there are two situations where including ICEs in an ontology is unavoidable:

ICEs that do not actually refer, such as results of measurements that can only be deemed as approximate (hence don't stand for anything real);
ICEs that need to be represented due to design requirements, such as needing to actually refer to some document or form (such as a CRF) that bears an identifier, has a date and an author, or needing to refer to some particular field in some form, also identified using an identifier.

Other than that, I am personally having trouble perceiving the inevitability of ICEs, to say nothing about accepting a deluge of ICEs (... which is where things seem to be heading to in OBI). Again, given that all ICEs that successfully refer can be dispensed with in favor of their referent, there should be no excuse to prefer an ICE over whatever it is that it refers to.

cstoeckert commented 6 years ago

Back to the issue of Xs (as in GX but the tumor grades also include pTX and and pNX). The consensus is that these are not stages or grades which are about the tumor. Instead these are about the process of assessing whether a stage or grade can be determined. Building on Bjoern's proposal above: 'reason for lack of data item' = def: An information content entity that provides an explanation why a data item is not provided. Example of usage: "cannot be assessed", "not applicable", "unknown". 'cannot be assessed' = def: A reason for lack of data item' that is the negative output of 'determination if assay will provide reliable results'. 'determination if assay will provide reliable results' = A planned process that is used to assess whether an assay will provide reliable results based on the conditions or qualities of the inputs, devices, and other participants of the assay.

For the tumor pathology value specifications: GX = def: A cannot be assessed histologic determination for tumor grade. Example of usage: AJCC 7th edition GX: cannot be assessed. pTX = def: A cannot be assessed pathologic determination for primary tumor staging. Example of usage: AJCC 7th edition pTX: cannot be assessed. pNX = def: A cannot be assessed pathologic determination of lymph nodes. Example of usage: AJCC 7th edition pNX: cannot be assessed.

Public-Health-Bioinformatics commented 6 years ago

A case where an assay process is generating data but some encapsulating system knows it is erroneous came to mind, so I offer mainly the last point below. If I was an engineer asked to design a system for generating an assay, it would be a generic process class having:

a 'has output' or 'producesData' component that points to (has value) a single datum or multiplex assay array of data, the observation itself.
a 'has participant' or 'has input' component that includes the class or instance of thing being measured.
a 'has input' or 'has setting' for any experimental level(s) or settings of the assay machinery, if any.
a 'hasContextualData' component that points to datum(s) describing other pertinent components of reality (time, location etc.). This data (produced by other measurement systems external to the assay machinery itself) is coincident with the assay data. (Some contextual datums might instead be treated as assay inputs).
a 'hasMetadata' component that provides a more general indication of success or failure that can be further refined (subclassed). This component would contain datums that arise from sensors internal to the assay system, or the intervention of human judgement calls.

A "dumb" assay instance would always have a 'hasMetadata' some 'successful assay process' with no possibility of 'cannot be assessed'; slightly smarter assay could report 'hasMetadata' some 'failed assay process'; A real-time instance might report 'hasMetadata' some 'assay process underway'; and finally we get into subclasses of the above, e.g. 'hasMetadata' some "cannot be assessed", "not applicable", "unknown", 'system malfunction', 'system malfunction - power outage', etc. subclasses. This anticipates modelling assay machinery IF desired. Data analysts don't need that level of modelling obviously; but equipment manufacturers and lab quality control and maintenance people could really benefit from this in the ontology-driven Internet of Things IoT vision. I can draw up a diagram if desired.

Aqua1ung commented 6 years ago

Ideally, GX, pNX, and pTX should be represented as strings instead of entities. While reasons why we'd want them captured as entities can always be cooked up if one is sufficiently imaginative, I'd be inclined to argue that none of them are essential. Even better, the set {"GX", "pNX", "pTX"} should be represented as a user-defined datatype. And if it absolutely must be an ICE, we can make this datatype a subclass of ICE. Should this proposal fail to elicit much enthusiasm, I'd argue that GX, pNX, and pTX should, at the very least, be captured as individuals as opposed to classes.
I suggest that the proposed 'determination if assay will provide reliable results' process be made part/subprocess of the assay itself.
My guess is that 'reason for lack of data item' is projected to be a subclass of ICE. What is not clear to me is what sort of critters "cannot be assessed", "not applicable", and "unknown" might be. Again, I, for one, would like to see 'reason for lack of data item' represented as a datatype: {"cannot be assessed", "not applicable", "unknown"}.

Public-Health-Bioinformatics commented 6 years ago

Its tough - whether one is creating a logical structure that can be repurposed or expanded, which requires a bit of a general systems anticipatory ("imaginative") guessing game in design, vs. the need of particular projects now, and not making guesses about usage that are at risk of being wrong.

I think a fundamental issue in this situation and others is how to use ontology to model process encapsulation in which underlying processes successfully generate their outputs, or fail for whatever reason, and need an overall process wrapper to capture this general behaviour (especially to accommodate the further design and refinement of underlying processes). A simple approach where some inputs and outputs can be defined with data types as @Aqua1ung suggests above allows states to be specified, and as a datatype, act as a categorical variable, and reused elsewhere I presume. But no ability to annotate - define - the given states individually, and they could be criticized on the multilingual front.

Personally I'm looking for a solution in which output states of component part processes can be elevated and generalized to output states of the encapsulating process.

cstoeckert commented 6 years ago

Thanks for raising these issues but I think they are of a more general nature (when to use ICEs, how to encapsulate assays, etc.) and can be part of an ongoing discussion. I will push ahead with adding the specific requested terms that are needed. Chris