
OME Design proposals
http://ome.github.io/design/

Redesigning access to the IDR #100

Closed: jrswedlow closed this issue 5 years ago

jrswedlow commented 5 years ago

The IDR has grown substantially since its initial release. As of this writing it holds nearly 100 TB of data from over 50 studies. The IDR is heavily used, receiving more than 100,000 hits per day, and several journals suggest or require that their authors submit data related to a pending publication to the IDR. With the already large and growing number of submissions, it is time to consider how the datasets we hold are presented and how the IDR can deliver the most value from its datasets to its users.

Search in the IDR

The IDR holds a wide range of different types of studies. Through the efforts of IDR’s curation team, a number of concepts have been consistently annotated in most of the studies, in most cases using controlled vocabularies. The following is a list of concepts that cross several studies (an illustrative annotation sketch follows the list).

  1. Genes
  2. Antibodies
  3. Compounds (small molecules, drugs, inhibitors, agonists)
  4. Phenotypes
  5. Organism
  6. Authors
  7. siRNA
  8. Cell lines
  9. Organism part
  10. Sex
  11. Pathological state
  12. Imaging modality (annotations of the imaging technology used have been requested repeatedly but are not yet consistently applied in the IDR).
  13. License?
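
To make these concepts concrete, the sketch below shows how a single study might carry them as Map Annotation key/value pairs. The keys follow the list above; the values (and the plain-object representation) are purely illustrative assumptions, not actual IDR records.

// Hypothetical example only: one study's annotations expressed as key/value pairs.
// Keys follow the concept list above; all values are invented for illustration.
const exampleStudyAnnotations = {
  "Gene": "KIF11",
  "Organism": "Homo sapiens",
  "Cell Line": "HeLa",
  "Phenotype": "mitotic arrest",
  "Imaging Modality": "confocal fluorescence microscopy",
  "License": "CC BY 4.0"
};

// A search over any of these concepts then reduces to matching key/value pairs.
console.log(exampleStudyAnnotations["Organism"]);  // "Homo sapiens"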

We expect many different types of users to approach and access data in the IDR either through the web browser-based UI or through the published API.

IDR aspires to support all of these different users and their different use cases. The annotation concepts listed above are also potential entry points to the IDR and thus the basis for the search queries that different types of users might use to access data in the IDR. For these reasons, access to the IDR must be redesigned and redeveloped to provide tailored access for each of these types of users.

We are undertaking this process after the IDR has achieved significant acceptance and adoption by journals and is receiving a large and increasing number of submissions. Rather than design data access and UI concepts at the outset of the IDR project, we waited until the IDR held a reasonable sample of datasets, to ensure we have strong practical knowledge of the value that IDR datasets might bring and how we might best maximise that value.

Implementing search

The annotation concepts listed above are candidate entry points for querying and identifying IDR data. We expect that as the number of datasets grows, the browser-based IDR UI will not provide sufficient value and will eventually obscure access to valuable IDR datasets (arguably this is happening already). Thus, the primary entry point for accessing IDR data should be search queries based on the annotation concepts listed above.

To identify possible templates for how search might work for IDR datasets, we are collecting examples of standardised UI concepts from commercial shopping and other interfaces where several hundred items have to be filtered and reliably found. As just one example, Figure 1 shows a left-to-right design where query terms and ranges are specified on the left side of the screen and results and further drill-downs can be placed either in the top or middle pane. This is by no means the only approach, and we should explore alternative search interface designs as well.

Figure 1. Carphone Warehouse. Note the category selectors as tabs across the top (in IDR, possibly different classes of datasets?), the left-side metadata filters (in IDR, possibly genes, antibodies and other Mapr queries), and the rich, scrollable presentation of each choice in the middle pane. Clicking “full plan details” opens a widget with more metadata; probably not applicable?

A different approach is taken at PhenoImageShare. The entry screen is minimalist and a single search dialogue accepts all types of queries (Figure 2).

Figure 2. Entry screen at phenoimageshare.org.

Entering a gene name, organism part or phenotype (note: no auto-complete available) gives a result table (Figure 3):

Figure 3. PhenoImageShare result pane. Note that the database is apparently down. Slides from Helen Parkinson are available.

Figure 4. Human Protein Atlas front portal: a mix of search, browse by category and ‘image of the day’.

[Note: Can continue to add examples of different approaches]

Usable Browsing

With the volume of datasets in the IDR, simply browsing IDR studies no longer gives sufficient exposure to IDR datasets. Nonetheless, we anticipate that users might want to browse through datasets looking for different types of data. The current tree-based browsing is less than desirable, as it exposes the underlying PDI or SPW OMERO data model to the user and gives no visual cues until the user clicks below datasets/plates to see thumbnails. A visual design based on the (now defunct) JCB Data Viewer is worth considering here, where each study is indicated by a large high-resolution thumbnail with standardised text information superimposed that provides a quick summary of the study data. Infinite pagination (or possibly a “Load More” button) can be used to make this presentation scalable (a rough sketch follows).
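
As a rough sketch of the “Load More” idea, assuming a hypothetical listStudies(offset, limit) helper that returns a page of studies plus the total count (this is not an existing IDR or OMERO API), the gallery could append one page of thumbnail cards per click:

// Sketch only: "Load More" pagination over study thumbnails.
// listStudies(offset, limit) and the element ids are assumptions for illustration.
const PAGE_SIZE = 24;
let offset = 0;

async function loadMore() {
  const { studies, total } = await listStudies(offset, PAGE_SIZE);
  const gallery = document.getElementById("gallery");
  studies.forEach(study => {
    // Large thumbnail with a short standardised summary superimposed
    const card = document.createElement("div");
    card.className = "study-card";
    card.innerHTML = `<img src="${study.thumbnailUrl}" alt="${study.name}">
      <span class="summary">${study.name}: ${study.summary}</span>`;
    gallery.appendChild(card);
  });
  offset += studies.length;
  // Hide the button once every study is shown
  document.getElementById("load-more").hidden = offset >= total;
}

document.getElementById("load-more").addEventListener("click", loadMore);
loadMore();  // load the first page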

Classes of IDR Data

With over 50 studies collected, it is now apparent that the IDR is receiving datasets that can be grouped into one of three classes, namely Cells, Tissue and Embryo. To the best of our knowledge, these umbrellas cover all of the different datasets so far received by IDR and thus, by extension, the datasets we are likely to receive in the future. This raises the question of how best to leverage the similarity of the datasets in the IDR.

As a first attempt, we should consider building separate UIs, represented by cell-IDR, tissue-IDR and embryo-IDR, to make browsing the datasets as straightforward as possible. Could these three URLs be three different categories that land the user at the appropriate subset of the data (see the classes of phones and SIMs in Fig 1)? These high-level titles would also give users a reasonable expectation of the types of data they were likely to see when either searching or browsing IDR data. Defining which datasets appear under the different categories will require appropriate annotation of IDR data (a minimal routing sketch follows).
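
One minimal way to wire those URLs up, sketched under the assumption that each category is defined by an annotation query in the same style as the gallery queries further down this thread (the paths and the “Sample Type” values are assumptions, not agreed IDR annotations):

// Sketch only: map a URL path to a category filter over study annotations.
// The paths and query strings are assumptions for illustration.
const CATEGORY_ROUTES = {
  "/cells":   { label: "cell-IDR",   query: "Sample Type:cell" },
  "/tissues": { label: "tissue-IDR", query: "Sample Type:tissue" },
  "/embryos": { label: "embryo-IDR", query: "Sample Type:embryo" },
};

const route = CATEGORY_ROUTES[window.location.pathname];
if (route) {
  // route.query would be handed to whatever filters the study list,
  // so each URL lands the user on the matching subset of studies.
  document.title = route.label;
}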

UI as Reusable Infrastructure?

The embeddable viewer and thumbnails, the reusable UI plug-ins (Mapr, etc.) and others are examples of our commitment to building tools that can be re-used by others. Perhaps stating the obvious, but with others adopting IDR tech (Mapr, etc.), any extensions to the UI should be reusable components (pip installable, etc.).

francesw commented 5 years ago

For the front page, we could have a gallery view like the Allen Brain Atlas.

[Screenshot, 2019-04-17]
will-moore commented 5 years ago

Some notes from today's discussion:

will-moore commented 5 years ago

Started playing with some ideas on an omero-gallery branch: https://github.com/ome/omero-gallery/pull/25

This was how it looked at the first meeting above:

[Screenshot, 2019-04-26]

Features:

will-moore commented 5 years ago

Now at https://github.com/ome/omero-gallery/pull/25/commits/483de92a6d14f9225198a759d60933cf49170e95 The main feature changes are:

[Screenshot, 2019-04-26]

When you go to the /cells or /tissues sections, we filter the studies (hard-coded at the moment). NB: not sure where light-sheet imaging of embryos etc. should go?

[Screenshot, 2019-04-26]

Discussion with @jburel, @pwalczysko & @rgozim:

will-moore commented 5 years ago

Showing a bunch of categories: https://github.com/ome/omero-gallery/pull/25/commits/b8b7dd3ddf98d4403878fde6f68951dcf8e64d91

[Screenshot, 2019-04-29]

will-moore commented 5 years ago

Questions to consider in today's meeting:

will-moore commented 5 years ago

PR now available for testing at http://web-dev-merge.openmicroscopy.org/gallery/idr/

I'm using Map Annotations on the studies to select them for inclusion in each category, using the queries below, where each Key:Value clause simply checks for the presence of the "Value" string in the full value text. AND and OR are supported (a small evaluation sketch follows the queries):

const CATEGORIES = [
  {"label": "Time-lapse", "query": "Study Type:time OR Study Type:5D OR Study Type:3D-tracking"},
  {"label": "Light sheet", "query": "Study Type:light sheet"},
  {"label": "Protein localization", "query": "Study Type:protein localization"},
  {"label": "Histology", "query": "Study Type:histology"},
  {"label": "Yeast", "query": "Organism:Saccharomyces cerevisiae OR Organism:Schizosaccharomyces pombe"},
  {"label": "Human Cell Screen", "query": "Organism:Homo sapiens AND Study Type:high content screen"},
]
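
For clarity, here is a small sketch of how one of these queries could be evaluated against a study's Map Annotations, matching the "value contained in the full value text" behaviour described above. It is illustrative only (and assumes a query uses either AND or OR, not both), not the actual omero-gallery code:

// Sketch only: evaluate a "Key:Value OP Key:Value" query against a study's
// Map Annotations, where a clause matches if the annotation value for that key
// contains the given string (case-insensitive).
function studyMatchesQuery(annotations, query) {
  // annotations: e.g. {"Study Type": "high content screen", "Organism": "Homo sapiens"}
  const operator = query.includes(" AND ") ? " AND " : " OR ";
  const results = query.split(operator).map(clause => {
    const [key, value] = clause.split(":").map(s => s.trim());
    const annValue = annotations[key] || "";
    return annValue.toLowerCase().includes(value.toLowerCase());
  });
  return operator === " AND " ? results.every(Boolean) : results.some(Boolean);
}

// e.g. studyMatchesQuery({"Study Type": "3D-tracking"}, CATEGORIES[0].query) -> true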

Also, mousing over each study zooms the image and shows a link to the image viewer and lists the authors below:

[Screenshot, 2019-05-03]

joshmoore commented 5 years ago

Posted to image.sc: https://forum.image.sc/t/idr-update-includes-gallery-of-imaging-data/26729