loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

Improve sequence details page's UI #1465

Open chaoran-chen opened 7 months ago

chaoran-chen commented 7 months ago

@corneliusroemer, @theosanderson - please share your ideas! :)

Summary of suggested improvements from the comments below:

theosanderson commented 7 months ago

Result

So I'm imagining subheadings for various things.

(unheaded) frontmatter:

Sequence info:

Additional metadata:

Implementation

I'd be imagining the config yaml would have

detailsPage:
  frontmatter:
    - isolate_name
    - insdc_id
  Sequence_Info:
    length:

etc.
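As a rough illustration of how such a config could drive the page, here is a minimal Python sketch that splits a flat metadata record into the configured sections. It assumes each section simply lists field names (the sketch above leaves the shape of `Sequence_Info` open), and the `group_fields` helper, the `country` field, and the record values are all made up for the example. Fields not named in the config fall through to "Additional metadata".

```python
# Hypothetical config mirroring the YAML sketch: section name -> field names.
DETAILS_PAGE_CONFIG = {
    "frontmatter": ["isolate_name", "insdc_id"],
    "Sequence_Info": ["length"],
}


def group_fields(record, config, fallback="Additional metadata"):
    """Split a flat metadata record into the configured sections."""
    grouped = {section: {} for section in config}
    assigned = set()
    for section, fields in config.items():
        for field in fields:
            if field in record:
                grouped[section][field] = record[field]
                assigned.add(field)
    # Anything the config does not mention lands in the fallback section.
    leftovers = {k: v for k, v in record.items() if k not in assigned}
    if leftovers:
        grouped[fallback] = leftovers
    return grouped


# Placeholder record; the values are illustrative only.
record = {
    "isolate_name": "example-isolate",
    "insdc_id": "XX000000",
    "length": 1234,
    "country": "Exampleland",
}
sections = group_fields(record, DETAILS_PAGE_CONFIG)
```

Keeping the grouping in one function means the website only has to iterate over sections in config order and render a heading for each.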

theosanderson commented 7 months ago

Also, we should

chaoran-chen commented 7 months ago

See also #100 from July last year.

rneher commented 7 months ago

Here are a few ideas on how to improve the sequence page.

Authors

For author lists, we probably want something like what journals do: [image]

i.e. abbreviated author lists that can be expanded on click. I imagine we need the same feature for the datasets page. Authors might have an ORCID or email associated, so we need something that renders a list of structured author data with optional features such as links to ORCID. Ingested data from NCBI is going to be messy, but a subset of these features will still work.
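To make the idea of structured author data concrete, here is a small Python sketch. The `Author` type, its field names, and the helper functions are hypothetical and not part of Loculus. It renders each author with an ORCID link when one is available, falls back to a mailto link or plain text, and reports how many authors are hidden behind the "expand" control.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Author:
    # Hypothetical structure; only the name is required.
    name: str
    orcid: Optional[str] = None
    email: Optional[str] = None


def render_author(author):
    """Render one author as HTML, linking to ORCID or email when present."""
    name = escape(author.name)
    if author.orcid:
        return f'<a href="https://orcid.org/{escape(author.orcid)}">{name}</a>'
    if author.email:
        return f'<a href="mailto:{escape(author.email)}">{name}</a>'
    return name


def collapsed_author_list(authors, max_shown=3):
    """Return the HTML for the visible authors and the number hidden."""
    shown = [render_author(a) for a in authors[:max_shown]]
    hidden = max(len(authors) - max_shown, 0)
    return ", ".join(shown), hidden


# Placeholder authors for illustration.
authors = [
    Author("A. Example", orcid="0000-0000-0000-0000"),
    Author("B. Example"),
    Author("C. Example", email="c@example.org"),
    Author("D. Example"),
]
text, hidden = collapsed_author_list(authors, max_shown=2)
```

Because every optional field degrades to plain text, messy ingested records with only a name would still render.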

Host

For the host, we can aggregate information like

Into a field that looks like Homo sapiens (9606) ({surveillance,laboratory,pool}) and links to the NCBI Taxonomy database. There is probably a dictionary to look up common names, which would be very useful (in particular if we target internationalization at some point).
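A sketch of how such a combined host field might be assembled, in Python. The `format_host` helper and its parameter names are hypothetical, the braces in the example above are read as "one of these values", and the link uses the NCBI Taxonomy Browser's `wwwtax.cgi?id=` query form, which should be double-checked before use.

```python
def format_host(name, taxon_id=None, context=None):
    """Build the display label and an optional NCBI Taxonomy link for a host."""
    label = name
    url = None
    if taxon_id is not None:
        label += f" ({taxon_id})"
        # Assumed URL pattern for the NCBI Taxonomy Browser.
        url = f"https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={taxon_id}"
    if context:
        # e.g. surveillance, laboratory, or pool
        label += f" ({context})"
    return label, url


label, url = format_host("Homo sapiens", 9606, "surveillance")
```

Missing pieces are simply left out of the label, so a record with only a host name still produces a sensible field.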

Another group of fields could be on virus, lineage/clade/serotype etc.

INSDC

Yet another group of fields would be INSDC which would be based on the raw data in

I'd imagine a header INSDC and then something like

Alignment states and QC metrics

There will be several quality metrics like

And things like alignment length. The LANL HIV database, for example, includes little previews like this: [image]

(they actually put these into the table to search and browse).

Mutations, insertions, deletions

For mutations, I would follow a similar approach to authors: truncated lists that by default only span one line. Mutations could be rendered as little badges, which makes them easier to parse than plain text like C87665T. One line could be nucleotide mutations, then one line for each gene/CDS. This way users can quickly find mutations in a particular gene (most of the time, people only care about a specific gene). Alternatively, the amino acid mutations could have a dropdown in which you select the gene of interest (with a sensible default for each pathogen).

Insertions and deletions can be handled similarly, though they are typically fewer.
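The per-gene grouping and one-line truncation described above could be sketched in Python as follows. This assumes amino acid mutations arrive as `GENE:change` strings; that format, the helper names, and the example mutations are illustrative assumptions.

```python
from collections import defaultdict


def group_by_gene(aa_mutations):
    """Group 'GENE:change' strings by gene, keeping the input order."""
    grouped = defaultdict(list)
    for mutation in aa_mutations:
        # Assumes well-formed input; a string without ':' would end up
        # under its own name with an empty change.
        gene, _, change = mutation.partition(":")
        grouped[gene].append(change)
    return dict(grouped)


def truncate(items, max_shown=5):
    """Return the items to show on one line and the count hidden behind 'more'."""
    hidden = max(len(items) - max_shown, 0)
    return items[:max_shown], hidden


# Made-up mutations for illustration.
by_gene = group_by_gene(["geneA:A10T", "geneA:G25D", "geneB:R7K"])
visible, hidden = truncate(by_gene["geneA"], max_shown=1)
```

The same `truncate` helper would apply to the nucleotide mutation line and to insertions and deletions, with each visible item rendered as a badge.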

corneliusroemer commented 6 months ago

This PR is quite a good template for similar improvements - it shows how to pipe new config options through from values.yaml (Kubernetes) to the website: https://github.com/loculus-project/loculus/pull/1442/files

anna-parker commented 6 months ago

I started looking into this. From a short discussion with @corneliusroemer and @bh-ethz, it would appear best to split this milestone into a couple of sub-tasks.

Update: Added the tasks to the description to use GitHub's subtask feature.

corneliusroemer commented 6 months ago

Great idea to split it up into chunks! I've added an extra list item to show the originally submitted data somehow, e.g. in a tooltip. I think this is something @emmahodcroft suggested. We always process user-submitted metadata; it can stay unchanged, but in general we might reformat it, so it's good to have the original data around to make the processing transparent.
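One way to decide which fields need such a tooltip is to compare the submitted and processed records. This is a minimal Python sketch; the `changed_fields` helper and the example values are hypothetical, and it assumes both records are flat mappings from field name to value.

```python
def changed_fields(original, processed):
    """Map field name -> (original, processed) for fields altered by processing."""
    return {
        field: (original.get(field), value)
        for field, value in processed.items()
        if original.get(field) != value
    }


# Made-up records for illustration.
original = {"country": "exampleland ", "date": "2024-01-05"}
processed = {"country": "Exampleland", "date": "2024-01-05"}
diff = changed_fields(original, processed)
```

Only the fields that appear in `diff` would get a tooltip showing the submitted value; unchanged fields render as usual.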