
OME Design proposals
http://ome.github.io/design/

Redesigning access to the IDR #100

Closed: jrswedlow closed this issue 5 years ago

jrswedlow commented 5 years ago

The IDR has grown substantially since its initial release. As of this writing it holds nearly 100 TB of data from over 50 studies. The IDR is heavily used, receiving more than 100,000 hits per day, and several journals suggest or require that their authors submit data related to a pending publication to the IDR. With the already large and growing number of submissions, it is time to consider how the datasets we hold are presented and how the IDR can deliver the most value from its datasets to its users.

Search in the IDR

The IDR holds a wide range of different types of studies. Through the efforts of IDR’s curation team, a number of concepts have been consistently annotated in most of the studies, in most cases using controlled vocabularies. The following is a list of concepts that cross several studies (an illustrative annotation sketch follows the list).

  1. Genes
  2. Antibodies
  3. Compounds (small molecules, drugs, inhibitors, agonists)
  4. Phenotypes
  5. Organism
  6. Authors
  7. siRNA
  8. Cell lines
  9. Organism part
  10. Sex
  11. Pathological state
  12. Imaging modality (annotations of the imaging technology used have been requested repeatedly but are not yet consistently applied in the IDR).
  13. License?
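
To make these concepts concrete, the sketch below shows how a single study might carry them as Map Annotation key/value pairs. The keys follow the list above; the values (and the plain-object representation) are purely illustrative assumptions, not actual IDR records.

// Hypothetical example only: one study's annotations expressed as key/value pairs.
// Keys follow the concept list above; all values are invented for illustration.
const exampleStudyAnnotations = {
  "Gene": "KIF11",
  "Organism": "Homo sapiens",
  "Cell Line": "HeLa",
  "Phenotype": "mitotic arrest",
  "Imaging Modality": "confocal fluorescence microscopy",
  "License": "CC BY 4.0"
};

// A search over any of these concepts then reduces to matching key/value pairs.
console.log(exampleStudyAnnotations["Organism"]);  // "Homo sapiens"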

We expect many different types of users to approach and access data in the IDR either through the web browser-based UI or through the published API.

IDR aspires to support all of these different users and their different use cases. The annotation concepts listed above are also potential entry points to the IDR and thus the basis for the search queries that different types of users might use to access data in the IDR. For these reasons, access to the IDR must be redesigned and redeveloped to provide tailored access for each of these types of users.

We are undertaking this process after the IDR has achieved significant acceptance and adoption by journals and is receiving a large and increasing number of submissions. Rather than design data access and UI concepts at the outset of the IDR project, we waited until the IDR held a reasonable sample of datasets, to ensure we have strong practical knowledge of the value that IDR datasets might bring and how we might best maximise that value.

Implementing search

The annotation concepts listed above are candidate entry points for querying and identifying IDR data. We expect that as the number of datasets grows, the browser-based IDR UI will not provide sufficient value and will eventually obscure access to valuable IDR datasets (arguably this is happening already). Thus, the primary entry point for accessing IDR data should be search queries based on the annotation concepts listed above.

To identify possible templates for how search might work for IDR datasets, we are collecting examples of standardised UI concepts from commercial shopping and other interfaces where several hundred items have to be filtered and reliably found. As just one example, Figure 1 shows a left-to-right design where query terms and ranges are specified on the left side of the screen and results and further drill-downs can be placed either in the top or middle pane. This is by no means the only approach, and we should explore alternative search interface designs as well.

Figure 1. Carphone Warehouse. Note the category selectors as tabs across the top (in IDR, possibly different classes of datasets?), the left-side metadata filters (in IDR, possibly genes, antibodies and other Mapr queries), and the rich, scrollable presentation of each choice in the middle pane. Clicking “full plan details” opens a widget with more metadata; probably not applicable?

A different approach is taken at PhenoImageShare. The entry screen is minimalist and a single search dialogue accepts all types of queries (Figure 2).

Figure 2. Entry screen at phenoimageshare.org.

Entering a gene name, organism part or phenotype (note: no auto-complete available) gives a result table (Figure 3):

Figure 3. PhenoImageShare result pane. Note that the database is apparently down. Slides from Helen Parkinson are available.

Figure 4. Human Protein Atlas front portal: a mix of search, browse by category and ‘image of the day’.

[Note: Can continue to add examples of different approaches]

Usable Browsing

With the volume of datasets in the IDR, simply browsing IDR studies no longer gives sufficient exposure to IDR datasets. Nonetheless, we anticipate that users might want to browse through datasets looking for different types of data. The current tree-based browsing is less than desirable, as it exposes the underlying PDI or SPW OMERO data model to the user and gives no visual cues until the user clicks below datasets/plates to see thumbnails. A visual design based on the (now defunct) JCB Data Viewer is worth considering here, where each study is indicated by a large high-resolution thumbnail with standardised text information superimposed that provides a quick summary of the study data. Infinite pagination (or possibly a “Load More” button) can be used to make this presentation scalable (a rough sketch follows).
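
As a rough sketch of the “Load More” idea, assuming a hypothetical listStudies(offset, limit) helper that returns a page of studies plus the total count (this is not an existing IDR or OMERO API), the gallery could append one page of thumbnail cards per click:

// Sketch only: "Load More" pagination over study thumbnails.
// listStudies(offset, limit) and the element ids are assumptions for illustration.
const PAGE_SIZE = 24;
let offset = 0;

async function loadMore() {
  const { studies, total } = await listStudies(offset, PAGE_SIZE);
  const gallery = document.getElementById("gallery");
  studies.forEach(study => {
    // Large thumbnail with a short standardised summary superimposed
    const card = document.createElement("div");
    card.className = "study-card";
    card.innerHTML = `<img src="${study.thumbnailUrl}" alt="${study.name}">
      <span class="summary">${study.name}: ${study.summary}</span>`;
    gallery.appendChild(card);
  });
  offset += studies.length;
  // Hide the button once every study is shown
  document.getElementById("load-more").hidden = offset >= total;
}

document.getElementById("load-more").addEventListener("click", loadMore);
loadMore();  // load the first page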

Classes of IDR Data

With over 50 studies collected, it is now apparent that the IDR is receiving datasets that can be grouped into one of three classes, namely Cells, Tissue and Embryo. To the best of our knowledge, these umbrellas cover all of the different datasets so far received by IDR and thus, by extension, the datasets we are likely to receive in the future. This raises the question of how best to leverage the similarity of the datasets in the IDR.

As a first attempt, we should consider building separate UIs, represented by cell-IDR, tissue-IDR and embryo-IDR, to make browsing the datasets as straightforward as possible. Could these three URLs be three different categories that land the user at the appropriate subset of the data (see the classes of phones and SIMs in Fig 1)? These high-level titles would also give users a reasonable expectation of the types of data they were likely to see when either searching or browsing IDR data. Defining which datasets appear under the different categories will require appropriate annotation of IDR data (a minimal routing sketch follows).
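
One minimal way to wire those URLs up, sketched under the assumption that each category is defined by an annotation query in the same style as the gallery queries further down this thread (the paths and the “Sample Type” values are assumptions, not agreed IDR annotations):

// Sketch only: map a URL path to a category filter over study annotations.
// The paths and query strings are assumptions for illustration.
const CATEGORY_ROUTES = {
  "/cells":   { label: "cell-IDR",   query: "Sample Type:cell" },
  "/tissues": { label: "tissue-IDR", query: "Sample Type:tissue" },
  "/embryos": { label: "embryo-IDR", query: "Sample Type:embryo" },
};

const route = CATEGORY_ROUTES[window.location.pathname];
if (route) {
  // route.query would be handed to whatever filters the study list,
  // so each URL lands the user on the matching subset of studies.
  document.title = route.label;
}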

UI as Reusable Infrastructure?

The embeddable viewer and thumbnails, the reusable UI plug-ins (Mapr, etc.) and others are examples of our commitment to building tools that can be re-used by others. Perhaps stating the obvious, but with others adopting IDR tech (Mapr, etc.), any extensions to the UI should be reusable components (pip installable, etc.).

francesw commented 5 years ago

For the front page, we could have a gallery view like the Allen Brain Atlas.

[Screenshot, 2019-04-17]
will-moore commented 5 years ago

Some notes from today's discussion:

will-moore commented 5 years ago

Started playing with some ideas on an omero-gallery branch: https://github.com/ome/omero-gallery/pull/25

This was how it looked at the first meeting above:

[Screenshot, 2019-04-26]

Features:

will-moore commented 5 years ago

Now at https://github.com/ome/omero-gallery/pull/25/commits/483de92a6d14f9225198a759d60933cf49170e95 The main feature changes are:

[Screenshot, 2019-04-26]

When you go to the /cells or /tissues sections, we filter the studies (hard-coded at the moment). NB: not sure where light-sheet imaging of embryos etc. should go?

[Screenshot, 2019-04-26]

Discussion with @jburel, @pwalczysko & @rgozim:

will-moore commented 5 years ago

Showing a bunch of categories: https://github.com/ome/omero-gallery/pull/25/commits/b8b7dd3ddf98d4403878fde6f68951dcf8e64d91

[Screenshot, 2019-04-29]

will-moore commented 5 years ago

Questions to consider in today's meeting:

will-moore commented 5 years ago

PR now available for testing at http://web-dev-merge.openmicroscopy.org/gallery/idr/

I'm using Map Annotations on the studies to select them for inclusion in each category, using the queries below, where each Key:Value clause simply checks for the presence of the "Value" string in the full value text. AND and OR are supported (a small evaluation sketch follows the queries):

const CATEGORIES = [
  {"label": "Time-lapse", "query": "Study Type:time OR Study Type:5D OR Study Type:3D-tracking"},
  {"label": "Light sheet", "query": "Study Type:light sheet"},
  {"label": "Protein localization", "query": "Study Type:protein localization"},
  {"label": "Histology", "query": "Study Type:histology"},
  {"label": "Yeast", "query": "Organism:Saccharomyces cerevisiae OR Organism:Schizosaccharomyces pombe"},
  {"label": "Human Cell Screen", "query": "Organism:Homo sapiens AND Study Type:high content screen"},
]
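
For clarity, here is a small sketch of how one of these queries could be evaluated against a study's Map Annotations, matching the "value contained in the full value text" behaviour described above. It is illustrative only (and assumes a query uses either AND or OR, not both), not the actual omero-gallery code:

// Sketch only: evaluate a "Key:Value OP Key:Value" query against a study's
// Map Annotations, where a clause matches if the annotation value for that key
// contains the given string (case-insensitive).
function studyMatchesQuery(annotations, query) {
  // annotations: e.g. {"Study Type": "high content screen", "Organism": "Homo sapiens"}
  const operator = query.includes(" AND ") ? " AND " : " OR ";
  const results = query.split(operator).map(clause => {
    const [key, value] = clause.split(":").map(s => s.trim());
    const annValue = annotations[key] || "";
    return annValue.toLowerCase().includes(value.toLowerCase());
  });
  return operator === " AND " ? results.every(Boolean) : results.some(Boolean);
}

// e.g. studyMatchesQuery({"Study Type": "3D-tracking"}, CATEGORIES[0].query) -> true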

Also, mousing over each study zooms the image and shows a link to the image viewer and lists the authors below:

[Screenshot, 2019-05-03]

joshmoore commented 5 years ago

Posted to image.sc: https://forum.image.sc/t/idr-update-includes-gallery-of-imaging-data/26729