Open kshefchek opened 8 years ago
Thanks for bringing this up again. In addition to formally documenting this, I wonder if it would be better to simply distinguish the number of distinct relationships from the number of distinct subjects and objects. For instance, like so:
Wherein the (i) provides formal documentation--if necessary explaining closure.
I don't feel strongly about having the inclusion of subtypes be something that is configurable. It seems silly to not want that. We could alternatively just say "including subclasses".
Thoughts welcome.
We should functionally distinguish between 'entities' and 'terms'. Even though these may both be modeled as ontology classes, there is an expected difference in behavior. Entities typically form a disjoint set without any primary classification axis. Terms form a subsumption lattice or similar. So it's meaningful to count entities (e.g. number of genes in the "abnormality of CNS" gene set). It's difficult to meaningfully count terms (e.g. number of phenotypes for "SHH") due to redundancy.
This somewhat breaks down for entities like genotypes, which subsume one another in their partonomy, but it is still useful. It also breaks down with diseases: restricted to OMIM, there is (more or less) a disjoint set, but once there is a hierarchy, with associations from more generic disease classes, questions of the form "how many diseases have ..." become a bit more problematic.
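The redundancy problem with counting terms can be illustrated with a small sketch. It assumes one reading of "redundancy": an entity annotated both to a term and to an ancestor of that term. The term names, the `SUBCLASS_OF` map, and the `most_specific` helper are all hypothetical and are not taken from the Monarch codebase.

```python
# Hypothetical subclass edges (child -> parents); names are made up.
SUBCLASS_OF = {
    "specific_term": ["general_term"],
    "general_term": [],
    "unrelated_term": [],
}

# Hypothetical set of terms annotated to a single gene.
ANNOTATED = {"specific_term", "general_term", "unrelated_term"}


def ancestors(term):
    """Return the strict ancestors of a term (the term itself is excluded)."""
    found = set()
    stack = list(SUBCLASS_OF.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return found


def most_specific(terms):
    """Drop any term that is already implied by a more specific term in the set."""
    implied = set()
    for term in terms:
        implied |= ancestors(term)
    return {term for term in terms if term not in implied}


raw_count = len(ANNOTATED)                     # counts the redundant ancestor too
nonredundant_count = len(most_specific(ANNOTATED))
print(raw_count, nonredundant_count)
```

Neither number is obviously "the" count of terms, which is the difficulty described above; counting entities does not have this problem when the entities are disjoint.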
Every tab should be conceived of as a relation, with the page being either subject or object. Reasoning should always be used. So for a phenotype page P, the query is "has-phenotype some P". In this case the reasoning is trivial, so there is no immediate need for an explanation. In other cases we will need to be more explicit about the reasoning (note that the reasoning task is distributed, with some taking place during the query that populates golr, see https://github.com/monarch-initiative/dipper/issues/324, and some taking place using the closure indices). For complete explanations, a graph view is probably best.
Broadly, there are two separate responses to the "R some X" query. One is the set of things that satisfy the query; the other is that set of things plus the immediate assertions about those things that lead to the query being satisfied. For example, for the genes tab on the phenotype page P, the set of entities is the genes, and the set of assertions is the set of associations to P or some subclass of P. Procedurally it can be easiest to think in terms of the closure fields, but thinking in terms of reasoning and explanations is more powerful.
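As a minimal sketch of the two responses, assuming a toy ontology and toy gene-to-phenotype associations (all identifiers below are invented for illustration, and the in-memory `closure` function only stands in for what the closure fields precompute):

```python
# Toy subclass edges (child -> parents); identifiers are made up.
SUBCLASS_OF = {
    "P": [],
    "P1": ["P"],
    "P2": ["P1"],
    "Q": [],
}

# Toy gene-to-phenotype associations: (gene, phenotype).
ASSOCIATIONS = [
    ("geneA", "P1"),
    ("geneA", "P2"),
    ("geneB", "P"),
    ("geneC", "Q"),
]


def closure(term):
    """Return the term plus all of its ancestors (reflexive, transitive)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def query_has_phenotype_some(page_term):
    """Answer "has-phenotype some page_term" in both forms.

    Returns (entities, supporting): the distinct genes satisfying the query,
    and the individual associations that cause them to satisfy it.
    """
    supporting = [(g, p) for g, p in ASSOCIATIONS if page_term in closure(p)]
    entities = {g for g, _ in supporting}
    return entities, supporting


entities, supporting = query_has_phenotype_some("P")
print(len(entities), "genes;", len(supporting), "associations")
```

Here `geneA` satisfies the query through two different associations, so the entity count and the association count differ even though both answer the same query.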
This framework can be used for everything, and we can be creative about how we display this. Kent has some ideas about a graphical display. But for the basic table oriented display, some key points:
We attempt to provide a way to switch between these views in amigo.
E.g. by default we show associations (http://tomodachi.berkeleybop.org/amigo/term/GO:0007417), but there is a link to get the entities.
We have some ideas on how to improve this but haven't had the time. We're kind of exposing the solr denormalization a bit too much. Ideally you would not require the user to switch but you would see something that combined both.
Adding @qjwang2001 as a watcher
Any updates on this? I'm still seeing different numbers displayed on the tabs vs in the data table. See pic (421 phenotypes listed vs 906 in reality):
We're abandoning the bbop tables in favor of a new widget @putmantime is working on. We should make sure we address this.
@lwinfree when viewing a disease group, the association count will not always match the number of distinct phenotypes, for example when two disease subclasses are annotated to the same phenotype.
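A small sketch of why the two numbers diverge, assuming a hypothetical disease group with two subclasses (the disease and phenotype names and the `counts_for_group` helper are illustrative only):

```python
from collections import Counter

# Hypothetical disease group and its subclasses.
DISEASE_SUBCLASSES = {"group": ["subtype1", "subtype2"]}

# Hypothetical disease-to-phenotype associations: (disease, phenotype).
DISEASE_PHENOTYPE = [
    ("subtype1", "seizures"),
    ("subtype2", "seizures"),
    ("subtype2", "ataxia"),
]


def counts_for_group(group):
    """Return (association count, distinct phenotype count, per-phenotype tally)."""
    members = {group, *DISEASE_SUBCLASSES.get(group, [])}
    rows = [(d, p) for d, p in DISEASE_PHENOTYPE if d in members]
    per_phenotype = Counter(p for _, p in rows)
    return len(rows), len(per_phenotype), per_phenotype


association_count, phenotype_count, per_phenotype = counts_for_group("group")
print(association_count, phenotype_count)
```

Both subclasses are annotated to `seizures`, so the group has three associations but only two distinct phenotypes.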
Proposal for comment:
When interacting directly with solr this is somewhat challenging. We can't page or leverage faceting abilities when operating on the distinct list. However, proxying through biolink may solve some of this.
Future comments on this should really refer to how things are being represented in alpha, e.g. https://alpha.monarchinitiative.org/disease/MONDO:0016033#gene
Any given data table has three types of counts
For example, on a phenotype page: Abnormality of the central nervous system, on the gene table we have:
Right now 2 appears in the tab, and 1 appears in the table view. Without documentation, this is confusing. How can we better display these counts? cc @jmcmurry
Issue reported by @cindyJax @sbello, cc @mellybelly