Add the total number of hits per taxon

jmcmurry commented 8 years ago

Spoiler alert: potentially terrible idea... It bugs me a little that, in the 3-species view, it isn't very obvious that there's more data for a given species than we present. I'm wondering if it is possible to have a scroll bar for each species as well as the one recently implemented for all species?

yuanzhou commented 8 years ago

I see your point. Right now we are only configured to display 10 targets (genes/diseases) per species in the cross comparison mode, and the vertical scrollbar is available if we have more sources (targets) than the default number. I'd say it's a lot harder to add a horizontal scrollbar for each species in the cross comparison mode, because all the targets are rendered using one target axis group, the same as in single species mode. Also, displaying three horizontal scroll bars would require us to get rid of the mini map. @harryhoch any insights?

harryhoch commented 8 years ago

yes, I agree. scrolling individually per species could be a pain.

Alternative suggestion - can we change labels for the species to indicate how many models are available?

jmcmurry commented 8 years ago

@harryhoch that's a reasonable possibility, provided it is actionable. E.g., a button that says "see all 28 models"

yuanzhou commented 8 years ago

@harryhoch @jmcmurry after playing with this, I feel adding the total number of available models in the single species mode makes more sense. In single species mode, when we show that number under the species name, users tend to scroll to the end to explore more models. In cross comparison mode, if we show that number under each species name but there's no corresponding horizontal scrollbar, it may be confusing.

Another idea is to also show the beginning number and end number of the currently visible models/columns in the single species mode. This might be helpful for tracking purposes.
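The begin/end indicator could be computed from the scroll position with a small helper like the one below. This is a hypothetical sketch, not phenogrid code. The name `visibleRangeLabel` and its parameters are assumptions: the 0-based index of the first visible column, the number of visible columns, and the total number of columns.

```typescript
// Hypothetical helper: builds a "first-last of total" indicator for the
// columns currently visible in single species mode.
function visibleRangeLabel(firstVisibleIndex: number, visibleCount: number, total: number): string {
  if (total <= 0) {
    return "0 of 0";
  }
  // Clamp the 0-based start so the window never begins past the last column.
  const start = Math.max(0, Math.min(firstVisibleIndex, total - 1));
  // Exclusive end of the window, never beyond the total.
  const end = Math.min(start + Math.max(1, visibleCount), total);
  // Report 1-based positions, the way users count columns.
  return `${start + 1}-${end} of ${total}`;
}
```

With 10 visible columns out of 28, scrolling to the second page would read "11-20 of 28", and the last page would read "21-28 of 28".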

frdougal commented 7 years ago

@harryhoch @jmcmurry @yuanzhou I'd like to weigh in on this. I think we should limit the users to the top 100 results at the most. Currently when we retrieve simsearch data, we limit it to the top 100 matches in the target species. Most disease searches will return far more than 100 hits per taxon (I think there may be 54k hits in most cases). How reliable are the results outside of the top 100? Do we run the risk of giving users unreliable data by presenting them with results 101-1000? This is more of a question for the simsearch folks.
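To make the "top 100 per taxon" cap concrete, here is an illustrative sketch of grouping hits by taxon and keeping only the best-scoring ones. In phenogrid the cap is applied when the simsearch data is retrieved, so this sketch assumes, hypothetically, that the full hit list is available. The `Hit` fields, `topHitsPerTaxon`, and the descending-score ordering are all assumptions.

```typescript
// Hypothetical shape of one similarity-search hit; field names are assumptions.
interface Hit {
  id: string;
  taxon: string;
  score: number;
}

interface TaxonGroup {
  taxon: string;
  hits: Hit[];       // top-scoring hits kept for display
  totalHits: number; // how many hits the taxon had before the cap
}

// Group hits by taxon, sort each group by descending score, and keep at most
// `limit` hits per taxon while remembering the original count.
function topHitsPerTaxon(hits: Hit[], limit = 100): TaxonGroup[] {
  const byTaxon = new Map<string, Hit[]>();
  hits.forEach((hit) => {
    const group = byTaxon.get(hit.taxon);
    if (group) {
      group.push(hit);
    } else {
      byTaxon.set(hit.taxon, [hit]);
    }
  });

  const result: TaxonGroup[] = [];
  // Map preserves insertion order, so taxa appear in first-seen order.
  byTaxon.forEach((group, taxon) => {
    const sorted = group.slice().sort((a, b) => b.score - a.score);
    result.push({ taxon, hits: sorted.slice(0, limit), totalHits: group.length });
  });
  return result;
}
```

Keeping `totalHits` alongside the capped list is what would let a header tell an exact count apart from a truncated one.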

I have mocked up a visualization that quickly allows users to "zoom into" a single taxon using the (+) sign:

When the plus sign is clicked, it mimics the user selecting one organism from the checkbox list:

Would this be an acceptable compromise?

harryhoch commented 7 years ago

@cborromeo, I like the plus sign. Nice addition.

As far as going beyond the top 100, I will defer to the others. @jmcmurry, what do you think?

jmcmurry commented 7 years ago

I don't think there's a huge benefit to including hits 101-1000 provided that we can operate on the results more easily. For instance, there may be a set of 12 phenotypes and two of these are rare or hallmark. You might want to limit hits to just those that have some kind of match on those two of interest and those might be further down the list. Or maybe we would have to re-run owlsim.

jmcmurry commented 7 years ago

I like the addition of the + very much, but think we need a corresponding - to get back to the multispecies view.

frdougal commented 7 years ago

I agree with the (-) suggestion. I will add that.
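The (+) is described as mimicking the selection of one organism from the checkbox list, and the (-) as the way back. One hypothetical way to model that is to remember the selection that was active before zooming in. This assumes the view is driven by a list of selected species. The `ViewState` shape and the function names are illustrative, not phenogrid's API.

```typescript
// Hypothetical view state: which species are checked in the organism list,
// plus the selection to restore when the user clicks (-).
interface ViewState {
  selectedSpecies: string[];
  previousSelection: string[] | null;
}

// (+): behave as if only this species were checked.
function zoomIntoSpecies(state: ViewState, species: string): ViewState {
  return {
    selectedSpecies: [species],
    // Keep the original multi-species selection if we are already zoomed in.
    previousSelection:
      state.previousSelection !== null ? state.previousSelection : state.selectedSpecies.slice(),
  };
}

// (-): restore the multi-species selection that was active before zooming in.
function zoomOut(state: ViewState): ViewState {
  if (state.previousSelection === null) {
    return state; // not zoomed in: nothing to restore
  }
  return { selectedSpecies: state.previousSelection.slice(), previousSelection: null };
}
```

Restoring the previous selection, rather than re-checking every species, keeps the (-) faithful to whatever subset the user had chosen before clicking (+).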

Can you provide more information regarding rare or hallmark phenotypes? Is this information returned by owlsim? I don't think I've seen this information before.

jmcmurry commented 7 years ago

Sorry to be so opaque. Don't worry about the hallmark features thing. We don't yet have the annotations structured in a way that can get us to that point. Nor do we have OwlSim3 wired in to enable the behavior I'm talking about. For future reference I was talking about something like this.

yuanzhou commented 7 years ago

Analyze Phenotypes section allows users to specify the limit of columns. We should also make sure the (+) sign works in this case.

Another thing that came to mind: when we add the (+) at the end of each group label, we also need to make sure the sign shows up correctly when we reduce the number of columns per group, because when the group label is wider than the actual group grid region, the label gets truncated.
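One way to keep the (+) visible is to truncate only the label text and never the control. A hypothetical sketch follows. `fitGroupLabel` and its parameters are assumptions. The width measurement is injected as a callback so the logic can be tested outside a browser; in an SVG renderer it might wrap `SVGTextContentElement.getComputedTextLength()`.

```typescript
// Truncates a group label so that "label suffix" fits within maxWidth pixels.
// `measure` returns the rendered width of a string in pixels.
function fitGroupLabel(
  label: string,
  suffix: string,
  maxWidth: number,
  measure: (text: string) => number
): string {
  const full = `${label} ${suffix}`;
  if (measure(full) <= maxWidth) {
    return full;
  }
  const ellipsis = "...";
  // Drop characters from the end of the label, never from the suffix.
  for (let n = label.length - 1; n > 0; n--) {
    const candidate = `${label.slice(0, n)}${ellipsis} ${suffix}`;
    if (measure(candidate) <= maxWidth) {
      return candidate;
    }
  }
  // Not even one character of the label fits: show the control alone so it stays clickable.
  return suffix;
}
```

For example, with a rough measure of 7 pixels per character, "Danio rerio" in a 70-pixel group would render as "Dan... (+)".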

harryhoch commented 7 years ago

@cborromeo, can you check on the analyze phenotypes page and these edge cases?

frdougal commented 7 years ago

@yuanzhou can you give me a screenshot of a good test case? Thanks, Chuck

frdougal commented 7 years ago

@yuanzhou @harryhoch I tried the analyze page (see below). I used about 20 genes (these become columns) and 3 phenotypes (rows). This view does not display the taxon headers. Is there a way to show the taxon headers? If not, I don't think my changes are affected by this case. [screen shot 2016-07-21 at 10 59 46 am]

harryhoch commented 7 years ago

I think @yuanzhou has to tackle this question.

yuanzhou commented 7 years ago

@harryhoch @cborromeo more precisely we should call it "group names" instead of "taxon headers". And this behavior is totally expected because those group names are defined in the gridskeletondata. As far as I can remember, only the defined group names will display. In the Analyze Phenotypes use case, we can't define a group name for a random gene ID list, thus there's no taxon header.

harryhoch commented 7 years ago

thanks, @yuanzhou

jmcmurry commented 7 years ago

Because in this use case there may be different species in consecutive single columns, it may be very difficult to have taxon headers rendered in the same way as we have done for the standard query. However, we should continue to think about it. In the meantime, species on hover should be doable. Right now it shows up as "compare" which is rather bewildering.

frdougal commented 7 years ago

@jmcmurry good point, I've created a new bug for that issue #253

jmcmurry commented 7 years ago

Great! Just looked at the +/- feature. I love it! Nice work. I would just make one request: when hovering over the (+) or the (-), the text should get bigger and bolder to signal to the user that it is clickable. A tooltip would not be out of place either ("Click to show just the matches for this species"), ("Go back to multi-species view").

Having said this, I'm wondering why we are not showing the number of hits instead of the (+)? I think a number would still be more informative.

jmcmurry commented 7 years ago

Also, in case there was any question, we can clearly bury the multiple sliders idea as this present approach is far better.

frdougal commented 7 years ago

@jmcmurry I'm glad you like it. I'll look into adding the hover and tooltip.

Regarding displaying counts, I would like @harryhoch to weigh in on this. The /compare model has a hard limit of 100 matches per species. This is mostly done for performance reasons. Most of the diseases will return headers that read Homo sapiens (100), Mus musculus (100), Danio rerio (100), Caenorhabditis elegans (X), where C. elegans typically has the fewest matches. We can add the counts, but I think they will be 100 for most of the species returned in the compare data.

jmcmurry commented 7 years ago

This is true, however, I still think I prefer the numbers. Don't jump on it though; I'm willing to be talked out of it if others disagree.

J

harryhoch commented 7 years ago

we could put in counts or just >100 if we have cut things off.

jmcmurry commented 7 years ago

I like that idea, thanks Harry.

frdougal commented 7 years ago

@jmcmurry I have not added the counts yet, but how does this look regarding the tooltips? [screen shot 2016-07-27 at 11 59 59 am]

[screen shot 2016-07-27 at 12 00 16 pm]

jmcmurry commented 7 years ago

Great! thanks Chuck.

frdougal commented 7 years ago

@jmcmurry @harryhoch now with counts: [screen shot 2016-07-27 at 4 24 56 pm]

jmcmurry commented 7 years ago

Looks great, but let's use Harry's suggestion and go with the text "(>100)" instead of "(100)", as this is more true to the actual data. I'm also suddenly second-guessing the actual text you chose. I'm wondering if introducing the word "targets" makes people think too hard about what we mean by it? Perhaps just "Show only Danio rerio results" and "Show results for all species". @harryhoch thoughts?
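Combining the ">100" suggestion with the proposed tooltip wording, the header and tooltip text could come from two small functions like these. This is a hypothetical sketch: `groupHeader`, `zoomTooltip`, and `COMPARE_LIMIT` are illustrative names, and the value 100 is the per-species /compare limit mentioned above. A count equal to the cap cannot be told apart from a truncated list, so the sketch treats it as cut off.

```typescript
// Per-species cap on /compare results, as described in this thread.
const COMPARE_LIMIT = 100;

// Header text for a species group: the exact count below the cap, ">100" at the cap.
function groupHeader(species: string, count: number, limit = COMPARE_LIMIT): string {
  // A count that reaches the cap is assumed to mean the results were cut off.
  const shown = count >= limit ? `>${limit}` : String(count);
  return `${species} (${shown})`;
}

// Tooltip for the (+)/(-) control, using the wording proposed here (no "targets").
function zoomTooltip(species: string, zoomedIn: boolean): string {
  return zoomedIn ? "Show results for all species" : `Show only ${species} results`;
}
```

For instance, `groupHeader("Homo sapiens", 100)` gives "Homo sapiens (>100)", while a smaller group such as C. elegans keeps its exact count.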

frdougal commented 7 years ago

@jmcmurry @harryhoch here are the latest screens

harryhoch commented 7 years ago

Thanks. I think that looks ok. Should we say “> 100 models”?

> On Thursday, July 28, 2016 at 8:06 AM, cborromeo wrote: @jmcmurry @harryhoch here are the latest screens [screen shot 2016-07-28 at 8 05 49 am] [screen shot 2016-07-28 at 8 05 28 am]

frdougal commented 7 years ago

@harryhoch @jmcmurry We need to be careful with the text on the screen. We have designed the phenogrid to be fairly generic. We have the ability to display several comparisons. I cannot reliably determine what data is being shown in the x axis and y axis. I don't know if the data in the axes represents diseases, phenotypes, genotypes, etc. Therefore, I can't be sure ">100 models" is right. It might be ">100 genotypes". The same goes for the tooltip "Click labels to show results for all species".

harryhoch commented 7 years ago

Ok. fair enough. let's leave it as is.

jmcmurry commented 7 years ago

Agreed. On a different note, the fact that the tooltip shows when and where it does should already signal that the label is clickable. In fact, even without the word "click" I find it less confusing. Introducing the words "click on label" may unnecessarily raise the question "which label?" I don't feel super strongly about it, but want to do the best we can to follow the "Don't make me think" paradigm. Thoughts? Am I overthinking this?

frdougal commented 7 years ago

@jmcmurry I'll adjust the tooltip text

monarch-initiative / phenogrid

Add the total number of hits per taxon #224