Open pnrobinson opened 5 years ago
@TomConlin perhaps you and Peter could work together on improving gene expression in anatomy in Monarch
Be glad to. I suspect some of this uber blossoming will turn out to be post-dipper, since dipper only outputs 2,290 instances of UBERON_0004819 associated with Ensembl genes in bgee (which is still a lot of genes to be usefully declared as expressed in a single tissue).
The sidebar is misleading. It's 19k gene associations. A gene count is more informative; but still a bit meaningless in a multi-species context. We should at least be clear about what the count is.
Many genes are associated to the structure via separate annotations to subclasses:
Given that this is multi-species, and that the associations are to child terms, this is consistent with dipper counts.
The numbers may not be surprising if we include housekeeping genes, but is this informative? What exactly are the semantics of what is included and excluded here? This needs to be more transparent.
Note that in #147 we decided to include the ranked expression from bgee, but we don't see the rank, which makes it less informative. There is an open ticket, #406, about ranks.
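To make the difference between an association count and a gene count concrete, here is a minimal Python sketch. The subclass map, term labels, and gene identifiers are all made up for illustration, and this is not a description of how Monarch or dipper actually computes the sidebar number.

```python
from collections import defaultdict

def descendants(term, children):
    """Return the term plus all of its transitive subclasses."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen

def count_associations_and_genes(term, children, annotations):
    """annotations: iterable of (species, gene, anatomy_term) tuples.

    Returns (number of associations, {species: number of distinct genes})
    over the term and every subclass of it.
    """
    closure = descendants(term, children)
    hits = [(species, gene) for species, gene, anatomy in annotations
            if anatomy in closure]
    genes_per_species = defaultdict(set)
    for species, gene in hits:
        genes_per_species[species].add(gene)
    return len(hits), {sp: len(genes) for sp, genes in genes_per_species.items()}

# Hypothetical toy data: one parent structure with two subclasses.
children = {"parent": ["child_a", "child_b"]}
annotations = [
    ("9606", "geneA", "child_a"),
    ("9606", "geneA", "child_b"),   # same gene, second subclass
    ("9606", "geneB", "parent"),
    ("10090", "geneC", "child_a"),
]
n_assoc, n_genes = count_associations_and_genes("parent", children, annotations)
```

In this toy case there are four associations but only two distinct human genes and one mouse gene, which is the kind of gap a single pooled number hides.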
@TomConlin I'm thinking more big picture here: how can we use this data in some interesting cross-species analysis, and how is it useful for various types of users (clinical, basic research)? I have only done minor stints of expression analysis in my last position, but I would not understand what to do with these ranks (which are apparently floats). I am hoping that this will inform the ingest and then the eventual UI display.
As a user, I might want to know about genes that are tissue specific for some structure, and whether they are specific in multiple species. I could imagine a widget that shows me a list of the top x tissue specific genes showing a bar chart with the degree of tissue specificity in various species, and perhaps highlighting species that diverge from a target species such as human.
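One way to put a number on "degree of tissue specificity" is the tau index, a commonly used measure that is 0 for a gene expressed evenly everywhere and 1 for a gene expressed in a single tissue. The sketch below is only an illustration of that idea: it assumes non-negative expression levels per tissue (not Bgee ranks, where lower means more expressed), and the species profiles are invented. Nothing here is something Monarch or Bgee is known to compute.

```python
def tau(expression):
    """Tau tissue-specificity index for one gene in one species.

    expression: non-negative expression levels, one value per tissue.
    """
    n = len(expression)
    x_max = max(expression)
    if n < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and some expression")
    return sum(1 - x / x_max for x in expression) / (n - 1)

# Hypothetical profiles for one gene across four tissues.
profiles = {
    "human": [10.0, 0.0, 0.0, 0.0],   # expressed in a single tissue
    "mouse": [5.0, 5.0, 5.0, 5.0],    # expressed evenly
}
tau_by_species = {species: tau(levels) for species, levels in profiles.items()}

# Divergence from a target species (human here) could drive the highlighting.
divergence = {sp: abs(t - tau_by_species["human"])
              for sp, t in tau_by_species.items()}
```

A bar chart of `tau_by_species` per gene, with bars flagged where `divergence` is large, would be one possible shape for the widget.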
@pnrobinson how would you feel about us dumping our current approach of keeping the twenty best "rank/score" buckets in each group (species/gene/tissue/stage) and adopting the work Bgee has recently done to partition their expression into what they call gold/silver/bronze levels?
from https://bgee.org/?page=doc&action=call_files#single_expr
Quality associated to the call in column Expression (column 7) is this summary quality and is calculated using following rules:
gold quality: 2 or more high quality calls.
silver quality: 1 high quality call or 2 low quality calls
bronze quality: 1 low quality call (for internal use only. Not present in this file).
I would be happy to see us try just the gold and see if we feel it is lacking
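For reference, the quoted rules can be written down as a small function. This is a sketch of the rules as quoted above, not Bgee's code; treating more than two low quality calls (with no high quality call) as silver is my assumption, and the record layout is hypothetical.

```python
def summary_quality(high_calls, low_calls):
    """Map counts of high/low quality calls to a summary quality level."""
    if high_calls >= 2:
        return "gold"
    # Assumption: "2 low quality calls" is read as "2 or more".
    if high_calls == 1 or low_calls >= 2:
        return "silver"
    if low_calls == 1:
        return "bronze"
    return None  # no supporting calls at all

# Hypothetical records: (gene, number of high calls, number of low calls).
records = [("geneA", 2, 0), ("geneB", 1, 3), ("geneC", 0, 1), ("geneD", 0, 2)]
gold_only = [gene for gene, high, low in records
             if summary_quality(high, low) == "gold"]
```

Keeping just gold is then a one-line filter, and adding silver back in later is a matter of widening the accepted set.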
Hi Tom, I would sort of like us to think harder about how to provide more analysis options. I do not think that showing the top 20 is the way to go, because (my guess) many tissues will have way more than 20 gold quality genes? Also, if a user wants to explore this, then why should they go to our website rather than Bgee? We need to provide some added value. The widget I proposed above is one of many possible ideas. In the future, I think we can integrate this with GO analysis on the Monarch UI; then users could use some sort of slider to choose genes that are expressed in a tissue, and possibly intersect a list from tissue A with a list from tissue B, and then perform GO analysis. But I think we need to be cooler and slicker!!
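The slider-plus-intersection idea is cheap to prototype. A minimal sketch, assuming each tissue comes with a per-gene score where higher means more confidently expressed (the scores, gene names, and threshold are invented):

```python
def select_genes(scores, min_score):
    """Return the genes whose score is at least min_score (the slider value)."""
    return {gene for gene, score in scores.items() if score >= min_score}

# Hypothetical per-gene scores for two tissues.
tissue_a = {"geneA": 95.0, "geneB": 60.0, "geneC": 88.0}
tissue_b = {"geneA": 91.0, "geneC": 40.0, "geneD": 99.0}

# Genes passing the slider in both tissues; this set would feed the GO analysis.
shared = select_genes(tissue_a, 80.0) & select_genes(tissue_b, 80.0)
```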
Looking at the number of distinct genes per species with a given quality level for your original example, I find:
genes | anatomy | txid | quality
7135 | UBERON:0004819 | 9606 | GOLD
3 | UBERON:0004819 | 10090 | GOLD
2373 | UBERON:0004819 | 9606 | SILVER
3 | UBERON:0004819 | 10090 | SILVER
This agrees with your surmise that there would be more high confidence results than we promote.
We can get a bit finer granularity on the counts by including the stage, which we currently do not.
genes | anatomy | txid | quality | stage
7135 | UBERON:0004819 | 9606 | GOLD | UBERON:0000104
1 | UBERON:0004819 | 10090 | SILVER | UBERON:0000112
2 | UBERON:0004819 | 10090 | SILVER | MmusDv:0000029
2373 | UBERON:0004819 | 9606 | SILVER | UBERON:0000104
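Both tallies are the same operation with a different grouping key: count distinct genes per group. A sketch of that in plain Python follows; the row layout and gene identifiers are hypothetical, and the toy numbers are not the real counts.

```python
from collections import defaultdict

def distinct_gene_counts(rows, keys):
    """Count distinct genes per combination of the given key columns."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[k] for k in keys)].add(row["gene"])
    return {group: len(genes) for group, genes in groups.items()}

def make_row(gene, txid, quality, stage):
    """Build one toy call for the anatomy term discussed in this thread."""
    return {"gene": gene, "anatomy": "UBERON:0004819", "txid": txid,
            "quality": quality, "stage": stage}

# Hypothetical calls; g1 appears twice to show that duplicates are not recounted.
rows = [
    make_row("g1", "9606", "GOLD", "UBERON:0000104"),
    make_row("g2", "9606", "GOLD", "UBERON:0000104"),
    make_row("g1", "9606", "GOLD", "UBERON:0000104"),
    make_row("g3", "10090", "SILVER", "UBERON:0000112"),
    make_row("g4", "10090", "SILVER", "MmusDv:0000029"),
    make_row("g5", "10090", "SILVER", "MmusDv:0000029"),
]

by_quality = distinct_gene_counts(rows, ("anatomy", "txid", "quality"))
by_stage = distinct_gene_counts(rows, ("anatomy", "txid", "quality", "stage"))
```

Adding the stage to the key splits the mouse silver group into one group per stage, the same pattern as in the second table.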
@pnrobinson I have done some exploring and can find genes only reported in a single anatomy item during a stage per species. The results however did not resonate because I want to differentiate primary observations from propagated/inferred reports as the latter inflates counts. I have been in contact with BGEE and they believe they will produce a release including the data allowing me to make this distinction before the end of the year.
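Roughly, the exploration looks like the sketch below. The row layout is hypothetical, and the `direct` flag stands in for the primary-versus-propagated distinction that I do not yet have in the data; it is an assumption, not a field in any current file.

```python
from collections import defaultdict

def single_anatomy_genes(rows, direct_only=False):
    """Find genes reported in exactly one anatomy term per (species, stage).

    Returns {(txid, stage, gene): anatomy}. With direct_only=True, rows
    without a truthy hypothetical "direct" flag are ignored.
    """
    anatomies = defaultdict(set)
    for row in rows:
        if direct_only and not row.get("direct", False):
            continue
        anatomies[(row["txid"], row["stage"], row["gene"])].add(row["anatomy"])
    return {key: next(iter(terms))
            for key, terms in anatomies.items() if len(terms) == 1}

# Hypothetical calls: g1 is observed in tissue_x and only propagated to tissue_y.
rows = [
    {"gene": "g1", "anatomy": "tissue_x", "txid": "9606", "stage": "s1", "direct": True},
    {"gene": "g1", "anatomy": "tissue_y", "txid": "9606", "stage": "s1", "direct": False},
    {"gene": "g2", "anatomy": "tissue_x", "txid": "9606", "stage": "s1", "direct": True},
]
all_calls = single_anatomy_genes(rows)
direct_calls = single_anatomy_genes(rows, direct_only=True)
```

In the toy data g1 only counts as single-tissue once the propagated call is dropped, which is why the distinction matters for this kind of list.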
BGEE did produce a new release including additional information on whether a quality indicator (silver|gold or "rank" or 1::100 score) is influenced by other (propagated) up/down stream observations. I do not believe we will be able to disentangle direct observation quality indications in the general case. And perhaps I should not be trying.
The absence of any records which are purely propagated alleviates my strongest concerns. (they are likely the absent bronze quality).
https://beta.monarchinitiative.org/anatomy/UBERON:0004819
According to this page, kidney epithelium is linked to 19182 genes. This is meaningless -- where is this coming from?