thechiselgroup / biomixer

BioMixer
http://bio-mixer.appspot.com/

Percentage and Absolute Ontology Mapping Node Areas #466

Open everbeek opened 9 years ago

everbeek commented 9 years ago

When Peter and Cydney were here, I realized that showing the outer circle as total ontology size makes big ontologies look more prominent, when what we probably want to highlight is the ratio of mapped nodes. That's a possible comparison for the study: percentage area vs. the way we have it now. With percentage area, every ontology would be drawn at the same size, and the circle within it would still correspond to the portion that maps to the central ontology. Maybe the user should be able to toggle between those two modes!

Yes. Make a toggle so that node area represents either relative size between ontologies or relative size within ontologies. Find the correct wording for this feature.
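A minimal sketch of what the two modes could compute, assuming each node knows its class count and its mapped count. All names here (`AreaMode`, `nodeRadii`, `areaPerClass`, `fixedArea`) are hypothetical and not taken from the BioMixer code.

```typescript
// Hypothetical sketch; none of these names come from BioMixer itself.
type AreaMode = "absolute" | "percentage";

interface OntologyCounts {
  classCount: number;  // total classes in the ontology
  mappedCount: number; // classes mapped to the central ontology
}

interface NodeRadii {
  outer: number; // radius of the outer circle
  inner: number; // radius of the inner (mapped) circle
}

// Radius of a circle that has the given area.
function radiusForArea(area: number): number {
  return Math.sqrt(area / Math.PI);
}

function nodeRadii(
  counts: OntologyCounts,
  mode: AreaMode,
  areaPerClass: number, // assumed scale factor for "absolute" mode
  fixedArea: number     // assumed shared outer area for "percentage" mode
): NodeRadii {
  // Guard against a zero class count so the inner radius is never NaN.
  const fraction =
    counts.classCount > 0 ? counts.mappedCount / counts.classCount : 0;
  // "absolute": outer area grows with ontology size.
  // "percentage": every ontology gets the same outer area.
  const outerArea =
    mode === "absolute" ? counts.classCount * areaPerClass : fixedArea;
  return {
    outer: radiusForArea(outerArea),
    inner: radiusForArea(outerArea * fraction),
  };
}
```

In both modes the inner circle's area is the mapped fraction of the outer circle's area, so only the outer size changes when the user toggles.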

everbeek commented 9 years ago

I have been having issues with making a percentile sort, which should be simple. Then I realized that the absolute mapping sort works because it is based on arcs, which are themselves based on the simple mapping call at the beginning of the viz load. That call is the first REST call, and the only mandatory one. Getting percentiles requires getting all of the ontologies with their size data. I thought this took a while, but after timing it, it is actually fairly fast as long as I am not in the debugger. So there is an instability in sorting by percentile, since we can only get the true result once all of the ontology callbacks have been processed. We should be re-sorting, not on every callback returned, but eventually...hmmm...

Given possible latencies in getting data back, and the retrying fetcher trading failures for greater latency, I am not sure it is friendly to the user to have a percentile sort retriggering as data comes back. It would change a view they are probably already investigating. Contemplating...perhaps we don't want to provide the percentile sort?

everbeek commented 9 years ago

Was having huge problems getting the sorting to work, and it finally dawned on me that I had broken the sort function API: I was not returning 1, 0, or -1 as expected, but only 0 or 1.
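For illustration, a comparator that honors the three-way contract might look like the sketch below. The `OntologyNode` shape and the `mappedFraction` field are assumptions for the example, not the actual BioMixer types.

```typescript
// Hypothetical node shape for illustration only.
interface OntologyNode {
  id: string;
  mappedFraction: number; // mapped classes divided by total classes
}

// Sorts descending by mapped fraction.
// A comparator must return a negative number when a sorts before b,
// a positive number when a sorts after b, and 0 when they tie.
// Returning only 0 or 1 never says "a comes first", so the order breaks.
function byMappedFractionDesc(a: OntologyNode, b: OntologyNode): number {
  if (a.mappedFraction > b.mappedFraction) return -1;
  if (a.mappedFraction < b.mappedFraction) return 1;
  return 0;
}

const sortedIds = [
  { id: "A", mappedFraction: 0.2 },
  { id: "B", mappedFraction: 0.9 },
  { id: "C", mappedFraction: 0.5 },
]
  .sort(byMappedFractionDesc)
  .map((node) => node.id);
```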

everbeek commented 9 years ago

I have to deal with the common situation where there are more mappings than there are concepts in an ontology.

Some of the ontologies have no classes listed but have many mappings. For example, the ontology CCONT has no classes listed, but has 1708 mappings.

In that case, it actually appears to be a problem of missing metrics. This means I should be relying on a different data source for class counts. Finding a REST call with that information...
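One way to keep such ontologies from producing nonsense ratios is sketched below. The function name, the `null` return for missing metrics, and the clamp to 1 are all assumptions made for illustration.

```typescript
// Hypothetical helper; not part of BioMixer.
// Returns null when the class count is missing or zero, so callers can
// tell "unknown" apart from "nothing mapped".
function mappedFraction(classCount: number, mappingCount: number): number | null {
  if (classCount <= 0) {
    return null; // metrics missing, e.g. no classes listed but 1708 mappings
  }
  // Assumption: if mappings exceed classes, cap the ratio at 100%.
  return Math.min(mappingCount / classCount, 1);
}
```

Returning `null` rather than 0 or 1 leaves the decision about where to place these ontologies to the sort or the renderer.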

I think that they need to make sure that metrics are computed for each ontology, rather than Biomixer using additional REST calls as a contingency to count classes. I could make calls from each root to the descendants call, then count the number of pages of descendants, and multiply that by the number of classes in the first descendants page. The metric should simply be computed for each ontology though, so this is not a Biomixer problem.

Checking to see what remains to do for this issue.

everbeek commented 9 years ago

I think I have figured out the problem. I used to sort the arcs to place the nodes. Now I am sorting the nodes themselves. Here's the thing...I don't set the mapping counts on the nodes until I have the node data to parse, whereas when I sorted by arc, I already had that count. Seeing if this will be simple to resolve.
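A sketch of one possible fix: copy the counts from the arcs onto a per-node lookup before sorting, so the node sort does not wait on node data. The `MappingArc` shape and the function name are hypothetical.

```typescript
// Hypothetical arc shape: one arc per ontology pair, carrying its mapping count.
interface MappingArc {
  sourceId: string;
  targetId: string;
  mappingCount: number;
}

// Builds ontologyId -> mapping count relative to the central ontology,
// using only the arcs from the initial mapping call.
function mappingCountsFromArcs(
  arcs: MappingArc[],
  centralId: string
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const arc of arcs) {
    // The non-central end of the arc is the ontology being counted.
    const otherId = arc.sourceId === centralId ? arc.targetId : arc.sourceId;
    counts[otherId] = (counts[otherId] || 0) + arc.mappingCount;
  }
  return counts;
}
```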

everbeek commented 9 years ago

I have things sorted (pun?). Now I have to decide how to cope with the problem that percentile sorting only actually works well when all data has been fetched. Currently, I only fetch what is visible, but when I sort by percentile, I could fetch it all, then complete the sort. Inspecting to see how bad this would be for performance and user experience.
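A minimal sketch of "fetch it all, then complete the sort", assuming every ontology fetch ends in exactly one callback (success or final failure). `makeSortGate` is a hypothetical helper, not existing BioMixer code.

```typescript
// Hypothetical helper: runs the sort exactly once, after the last
// expected callback has been processed.
function makeSortGate(
  expectedCallbacks: number,
  sortOnce: () => void
): () => void {
  let pending = expectedCallbacks;
  return function callbackProcessed(): void {
    if (pending <= 0) {
      return; // already sorted; ignore stray callbacks
    }
    pending -= 1;
    if (pending === 0) {
      sortOnce();
    }
  };
}

// Usage: three fetches outstanding, one sort at the end.
const sortLog: string[] = [];
const onOntologyLoaded = makeSortGate(3, () => sortLog.push("sorted"));
onOntologyLoaded();
onOntologyLoaded();
onOntologyLoaded(); // the sort runs here
```

This avoids re-sorting on every callback, at the cost of showing no percentile order until the slowest fetch returns.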

everbeek commented 9 years ago

I think that having the zero-class, many-mapping ontologies early in the sort is really unhelpful. I really wish the metrics were done for them. Perhaps I should estimate the number of concepts in those ontologies...again, do a descendants call and multiply the page count by the number of classes per page. Do this for all roots...thinking...yes, I should make a call for the roots, then one call for each root, then one descendants call per root, and that's where I get the page count and multiply by the (constant?) number of descendants per page. This assumes that no roots have descendants in common.
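The estimate described above could be sketched as follows. The `RootDescendantsInfo` shape is hypothetical and stands in for whatever the first page of a descendants response actually reports.

```typescript
// Hypothetical summary of the first descendants page for one root.
interface RootDescendantsInfo {
  pageCount: number;          // number of descendant pages for this root
  classesOnFirstPage: number; // classes returned on the first page
}

// Estimates total classes as the sum over roots of pageCount * page size.
// This overestimates when the last page is only partly full, and it
// double-counts any descendants shared between roots.
function estimateClassCount(roots: RootDescendantsInfo[]): number {
  let total = 0;
  for (const root of roots) {
    total += root.pageCount * root.classesOnFirstPage;
  }
  return total;
}
```

For example, two roots with 3 pages and 1 page of 50 classes each would give an estimate of 200.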

I really hate this idea. It already takes long enough to get through the full fetch. Just now, it took 40 seconds to make 893 REST calls (about 45 ms per call) for the full load of metrics for each node. Adding a bunch of other calls like that will be very noticeable to the user. I really do want to make sure that NCBO knows that performance and quality for this view depend on metrics being up to date.

everbeek commented 9 years ago

Contacted Paul to see about metrics being updated. Leaving this open until I know more.