Improve sequence details page's UI

chaoran-chen commented 7 months ago

@corneliusroemer, @theosanderson - please share your ideas! :)

Summary of suggested improvements from the comments below:

[x] #1615
[ ] #1616
[x] #1618
[ ] #1617
[ ] #1619
[x] #1620
[ ] #1638
[x] #1699
[ ] #1767
[x] #1768
[ ] #1770
[x] #1772
[x] #1792
[ ] #1797
[ ] #1424

theosanderson commented 7 months ago

Result

So I'm imagining subheadings for various things.

(unheaded) frontmatter:

Isolate name
INSDC id
Released at
Data user terms
Country
Authors

Sequence info:

Length
Nucleotide substitutions
Nucleotide deletions
Amino acid substitutions

Additional metadata:

host
patient status/etc.

Implementation

I'd imagine the config YAML would have something like:

detailsPage:
  frontmatter:
    - isolate_name
    - insdc_id
  sequence_info:
    - length

etc.
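To make the proposed `detailsPage` config more concrete, here is a minimal TypeScript sketch of how the website could group a flat metadata record into the sections above. It assumes the YAML has already been parsed into a plain object that maps section keys to lists of field names. `buildSections`, `Section` and the rule that `frontmatter` has no heading are hypothetical, not existing Loculus code.

```typescript
// Hypothetical shape of the parsed `detailsPage` config:
// section key -> ordered list of metadata field names.
type DetailsPageConfig = Record<string, string[]>;

interface Row {
  field: string;
  value: string;
}

interface Section {
  heading: string | null; // null for the unheaded frontmatter
  rows: Row[];
}

// "sequence_info" -> "Sequence info"
function toHeading(key: string): string {
  const words = key.split("_").join(" ");
  return words.charAt(0).toUpperCase() + words.slice(1);
}

// Group metadata into sections in config order. Fields absent from the
// metadata are skipped; fields the config doesn't mention are collected
// into a trailing "Additional metadata" section.
function buildSections(config: DetailsPageConfig, metadata: Record<string, string>): Section[] {
  const used = new Set<string>();
  const sections: Section[] = [];
  for (const [key, fields] of Object.entries(config)) {
    const rows = fields
      .filter((field) => field in metadata)
      .map((field) => ({ field, value: metadata[field] }));
    rows.forEach((row) => used.add(row.field));
    if (rows.length > 0) {
      sections.push({ heading: key === "frontmatter" ? null : toHeading(key), rows });
    }
  }
  const rest = Object.keys(metadata)
    .filter((field) => !used.has(field))
    .map((field) => ({ field, value: metadata[field] }));
  if (rest.length > 0) {
    sections.push({ heading: "Additional metadata", rows: rest });
  }
  return sections;
}
```

The catch-all section means a new metadata field still shows up on the page even before someone adds it to the config.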

theosanderson commented 7 months ago

Also, we should

display the sequence by default for small sequences (loading it with useEffect() so it doesn't delay the page load)
but display it in a scrollable box rather than letting it take up unlimited space
display the sequence in FASTA format with a header
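A minimal TypeScript sketch of the "small sequence" check and the FASTA formatting. The 50,000 nt cut-off and the names `shouldAutoDisplay` and `toFasta` are assumptions for illustration; the thread doesn't settle on a threshold.

```typescript
// Hypothetical cut-off below which the sequence is fetched and shown
// automatically; the actual value is still an open question.
const AUTO_DISPLAY_MAX_LENGTH = 50000;

function shouldAutoDisplay(sequenceLength: number): boolean {
  return sequenceLength <= AUTO_DISPLAY_MAX_LENGTH;
}

// FASTA: a ">" header line followed by the sequence wrapped at a fixed
// width (60 or 80 characters per line are common choices).
function toFasta(header: string, sequence: string, lineWidth = 60): string {
  const lines: string[] = [`>${header}`];
  for (let i = 0; i < sequence.length; i += lineWidth) {
    lines.push(sequence.slice(i, i + lineWidth));
  }
  return lines.join("\n");
}
```

In a React component, the fetch would run inside useEffect() after the first render, and the resulting string could go into a `<pre>` with a fixed max-height and `overflow-y: auto`, so the box scrolls instead of growing.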

chaoran-chen commented 7 months ago

Authors

For author lists, we probably want something like journals do:

i.e. abbreviated author lists that can be expanded on click. I imagine we need the same feature for the datasets page. Authors might have an ORCID iD or an email address associated with them, so we need something that renders a list of structured author data with optional features such as links to ORCID. Ingested data from NCBI is going to be messy, but a subset of these features will still work.
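A minimal TypeScript sketch of structured author data with journal-style truncation. The `Author` shape, the default of three visible authors and the function names are assumptions; only the `https://orcid.org/<iD>` link format is ORCID's own.

```typescript
// Hypothetical author record; ingested NCBI data will often only have
// `name`, so everything else is optional.
interface Author {
  name: string;
  orcid?: string; // e.g. "0000-0002-1825-0097"
  email?: string;
}

interface AuthorListView {
  shown: Author[];
  hiddenCount: number; // rendered as "... and N more", expandable on click
}

// Show the first `maxShown` authors until the user expands the list.
function abbreviateAuthors(authors: Author[], maxShown = 3, expanded = false): AuthorListView {
  if (expanded || authors.length <= maxShown) {
    return { shown: authors, hiddenCount: 0 };
  }
  return { shown: authors.slice(0, maxShown), hiddenCount: authors.length - maxShown };
}

// ORCID iDs resolve at https://orcid.org/<iD>; authors without one get no link.
function orcidUrl(author: Author): string | undefined {
  return author.orcid ? `https://orcid.org/${author.orcid}` : undefined;
}
```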

Host

For the host, we can aggregate information like

Ncbi_host: Homo sapiens
Ncbi_host_taxon: 9606
Ncbi_is_lab_host:

into a single field that looks like Homo sapiens (9606) ({surveillance,laboratory,pool}) and links to the NCBI Taxonomy database. There is probably a dictionary for looking up common names, which would be very useful (in particular if we target internationalization at some point).
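A sketch of aggregating the host fields into one display value with a taxonomy link. The field names are hypothetical camelCase versions of the ones above, `hostContext` is an assumed field standing in for the surveillance/laboratory/pool information, and the link uses the NCBI Taxonomy Browser's `wwwtax.cgi?id=` URL.

```typescript
// Hypothetical raw host fields.
interface HostFields {
  ncbiHost?: string;      // e.g. "Homo sapiens"
  ncbiHostTaxon?: string; // NCBI Taxonomy ID, e.g. "9606"
  hostContext?: string;   // assumed field: "surveillance", "laboratory" or "pool"
}

interface HostDisplay {
  label: string;
  taxonomyUrl?: string;
}

// Combine the fields into e.g. "Homo sapiens (9606) (surveillance)" plus a
// link to the NCBI Taxonomy Browser entry for the taxon ID.
function formatHost(fields: HostFields): HostDisplay | undefined {
  if (!fields.ncbiHost && !fields.ncbiHostTaxon) return undefined;
  const parts: string[] = [fields.ncbiHost || "Unknown host"];
  if (fields.ncbiHostTaxon) parts.push(`(${fields.ncbiHostTaxon})`);
  if (fields.hostContext) parts.push(`(${fields.hostContext})`);
  const taxonomyUrl = fields.ncbiHostTaxon
    ? `https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=${encodeURIComponent(fields.ncbiHostTaxon)}`
    : undefined;
  return { label: parts.join(" "), taxonomyUrl };
}
```

A common-name lookup, if we find a suitable dictionary, could be applied to `ncbiHost` before building the label.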

Another group of fields could cover the virus: lineage, clade, serotype, etc.

INSDC

Yet another group of fields would be INSDC, based on raw data such as

Insdc accession base: OR084932
Insdc version: 1
INSDC accession: OR084932.1
NCBI_release_date: 2022-02-15
SRA accession
BioProject

I'd imagine an INSDC header and then something like

OR084932 (version 1, released on 2022-02-15)
SRA: unknown
BioProject: XXXXXX
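A sketch of the INSDC block, assuming the raw fields above have been parsed into the hypothetical `InsdcFields` shape. The versioned accession links to the NCBI Nucleotide (nuccore) record; showing "unknown" for missing SRA or BioProject values mirrors the mock-up above.

```typescript
// Hypothetical parsed INSDC fields.
interface InsdcFields {
  accessionBase: string; // e.g. "OR084932"
  version: number;       // e.g. 1
  releaseDate?: string;  // ISO date, e.g. "2022-02-15"
  sraAccession?: string;
  bioProject?: string;
}

// Lines shown under the INSDC heading, plus a link for the versioned
// accession (base + "." + version).
function formatInsdc(f: InsdcFields): { lines: string[]; nucleotideUrl: string } {
  const released = f.releaseDate ? `, released on ${f.releaseDate}` : "";
  const versioned = `${f.accessionBase}.${f.version}`;
  return {
    lines: [
      `${f.accessionBase} (version ${f.version}${released})`,
      `SRA: ${f.sraAccession ?? "unknown"}`,
      `BioProject: ${f.bioProject ?? "unknown"}`,
    ],
    nucleotideUrl: `https://www.ncbi.nlm.nih.gov/nuccore/${versioned}`,
  };
}
```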

Alignment states and QC metrics

There will be several quality metrics like

completeness
mixed-sites
stop/frameshifts

And things like alignment length. The LANL HIV database, for example, includes little previews like this

(they actually put these into the table to search and browse).
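As an illustration of such a preview, here is a TypeScript sketch that computes one possible completeness metric and renders it as a compact text bar. Defining completeness as the fraction of aligned positions that are neither `N` nor `-` is an assumption; the actual QC metrics would most likely come from the preprocessing pipeline.

```typescript
// Assumed definition: fraction of positions in the aligned sequence that
// are called bases, i.e. not "N" (unknown) and not "-" (gap).
function completeness(alignedSequence: string): number {
  if (alignedSequence.length === 0) return 0;
  let called = 0;
  for (const base of alignedSequence.toUpperCase()) {
    if (base !== "N" && base !== "-") called++;
  }
  return called / alignedSequence.length;
}

// Compact preview for a table cell, e.g. "[####------] 40%".
function previewBar(fraction: number, width = 10): string {
  const clamped = Math.min(1, Math.max(0, fraction));
  const filled = Math.round(clamped * width);
  return `[${"#".repeat(filled)}${"-".repeat(width - filled)}] ${Math.round(clamped * 100)}%`;
}
```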

Mutations, insertions, deletions

For mutations, I would follow a similar approach to authors: truncated lists that by default span only one line. Mutations could be rendered as little badges, which makes them easier to parse than plain text like C87665T. One line could be nucleotide mutations, then one line for each gene/CDS. This way users can quickly find mutations in a particular gene (most of the time, people only care about a specific gene). Alternatively, the amino acid mutations could have a drop-down in which you select the gene of interest (with a sensible default for each pathogen).

Insertions and deletions can be handled similarly, though they are typically fewer.
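A minimal sketch of the one-row-per-gene layout with one-line truncation. It assumes Nextclade-style labels (`C87665T` for nucleotides, `gene:mutation` such as `S:D614G` for amino acids) and approximates badge width by character count; a real implementation would measure rendered widths. All names are hypothetical.

```typescript
// Group mutation labels into one row for nucleotides and one row per gene.
// Labels without a "gene:" prefix are treated as nucleotide mutations.
function groupMutations(mutations: string[]): Map<string, string[]> {
  const rows = new Map<string, string[]>();
  for (const mutation of mutations) {
    const colon = mutation.indexOf(":");
    const row = colon === -1 ? "nucleotide" : mutation.slice(0, colon);
    const label = colon === -1 ? mutation : mutation.slice(colon + 1);
    const existing = rows.get(row);
    if (existing) existing.push(label);
    else rows.set(row, [label]);
  }
  return rows;
}

// Keep as many badges as fit in roughly one line; `hidden` drives a
// "+N more" badge that expands the row on click.
function truncateRow(labels: string[], maxChars = 60, padding = 2): { shown: string[]; hidden: number } {
  const shown: string[] = [];
  let used = 0;
  for (const label of labels) {
    const width = label.length + padding;
    if (used + width > maxChars) break;
    shown.push(label);
    used += width;
  }
  return { shown, hidden: labels.length - shown.length };
}
```

The same two functions would work for insertions and deletions, which are usually short enough that truncation rarely kicks in.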

corneliusroemer commented 6 months ago

This PR is quite a good template for similar improvements, as it shows how to pipe new config options from values.yaml (Kubernetes) through to the website: https://github.com/loculus-project/loculus/pull/1442/files

anna-parker commented 6 months ago

I started looking into this. From a short discussion with @corneliusroemer and @bh-ethz, it seems best to split this milestone into a couple of sub-tasks.

[ ] Allow the metadata to be split into subsections with (optional) subheadings and further display options for individual subsections
[ ] Format author lists using ORCID
[ ] Add links to the NCBI and INSDC databases
[ ] Display sequences, e.g. with alignment states and QC metrics, in a scrollable box (in FASTA format with a header), loaded with useEffect()
[ ] Display mutations, insertions and deletions from the reference sequence in more parsable format (see Richard's suggestions: https://github.com/loculus-project/loculus/issues/1465#issuecomment-2019823764). (This will potentially need to be further split into subtasks.)
[ ] Show originally submitted data somehow, e.g. in a tooltip

Update: Added the tasks to the description to use GitHub's subtask feature.

corneliusroemer commented 6 months ago

Great idea to split it up into chunks! I've added an extra list item: show originally submitted data somehow, e.g. in a tooltip. I think this is something @emmahodcroft suggested. We always process user-submitted metadata; it may stay unchanged, but in general we might reformat it, so it's good to keep the original data around to make the processing transparent.
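One way the tooltip could work, sketched in TypeScript: show the submitted value only when preprocessing changed it. The `FieldValues` shape and `originalValueTooltip` are hypothetical, and the sketch assumes the website has both the original and the processed metadata available.

```typescript
// Hypothetical pair of values for one metadata field.
interface FieldValues {
  original: string | null;  // as sent by the submitter
  processed: string | null; // after preprocessing
}

// Tooltip text only when processing changed the value, so unchanged
// fields stay uncluttered.
function originalValueTooltip({ original, processed }: FieldValues): string | undefined {
  if (original === processed) return undefined;
  return `Originally submitted as: ${original ?? "(empty)"}`;
}
```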
