microbiomedata / nmdc-metadata

Managing metadata and policy around metadata in NMDC
https://microbiomedata.github.io/nmdc-schema/

Add descriptions for Data Portal tool tips #262

Open kfagnan opened 3 years ago

kfagnan commented 3 years ago

@jbeezley @subdavis @jeffbaumes @wdduncan @dwinston @dehays Tagging you all as I'm not sure how you all will want to implement this.

I'm including tables that map some of the terms on the website to more human-readable descriptions that we're hoping to incorporate into "tool tips" or some kind of alternate text.

Metagenome Output

Existing entry | new text
filterStats.txt | Reads QC summary statistics
1781_86101.filtered.fastq.gz | Reads QC result fastq (clean data)
mapping_stats.txt | Assembled contigs coverage information
assembly_contigs.fna | Final assembly contigs fasta
assembly_scaffolds.fna | Final assembly scaffolds fasta
assembly.agp | An AGP format file describes the assembly
pairedMapped_sorted.bam | Sorted bam file of reads mapping back to the final assembly
KO TSV | Tab delimited file for KO annotation.
EC TSV | Tab delimited file for EC annotation.
Protein FAA | FASTA amino acid file for annotated proteins.

Metaproteome files

Existing entry | new text
MSGFjobs_MASIC_resultant.tsv | Tab delimited file of unfiltered metaproteomics results, both identifications and abundances
500088_1781_100336_Peptide_Report.tsv | Tab delimited file of peptide results filtered to ~5% FDR, including protein and abundance information
500088_1781_100336_Protein_Report.tsv | Tab delimited file of protein results derived from ~5% FDR filtered peptide data, including aggregated abundance information
500088_1781_100336_QC_metrics.tsv | Tab delimited file of aggregate statistics derived from workflow results

subdavis commented 3 years ago

Thanks! I'll incorporate this and hopefully provide some options.

I don't know how easy it will be to match to individual data objects, but at a minimum we can have these in a table in a dialog callable from the data object list area.
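
One way the matching could work, as a minimal sketch: treat each "existing entry" from the tables above as a pattern and compare it against a data object's name. The `PATTERN_NOTES` mapping and the `find_note` helper are hypothetical names. Reducing instance-specific names such as `1781_86101.filtered.fastq.gz` to a suffix like `.filtered.fastq.gz` is an assumption, not something decided in this thread.

```python
from typing import Optional

# Hypothetical suffix patterns derived from the tables above. Dropping the
# instance-specific prefixes (e.g. "1781_86101") is an assumption.
PATTERN_NOTES = {
    "filterStats.txt": "Reads QC summary statistics",
    ".filtered.fastq.gz": "Reads QC result fastq (clean data)",
    "mapping_stats.txt": "Assembled contigs coverage information",
    "assembly_contigs.fna": "Final assembly contigs fasta",
    "_QC_metrics.tsv": "Tab delimited file of aggregate statistics derived from workflow results",
}


def find_note(name: str) -> Optional[str]:
    """Return the note for the longest pattern that the name ends with, or None."""
    matches = [pattern for pattern in PATTERN_NOTES if name.endswith(pattern)]
    if not matches:
        return None
    return PATTERN_NOTES[max(matches, key=len)]
```

Names that match no pattern return `None`, so a table in a dialog would remain a reasonable fallback for unmatched objects.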

kfagnan commented 3 years ago

One more thing - see https://github.com/microbiomedata/nmdc-server/issues/274

This contains an image suggesting these descriptions be added to a column.

dwinston commented 3 years ago

@jbeezley there is a new MongoDB collection notes with documents of the form

{
    "_id" : ObjectId("602d502525261d62addcd830"),
    "sh:pattern" : "filterStats.txt",
    "skos:note" : "Reads QC summary statistics",
    "@context" : {
        "skos" : "http://www.w3.org/2004/02/skos/core#",
        "sh" : "http://www.w3.org/ns/shacl#"
    }
}

that is referenced by a new field _note in a data_object_set document, e.g.

{
    "_id" : ObjectId("602551d225261d62add17d66"),
    "id" : "nmdc:ae40d7ae535c92b6d347915d8b1ac125",
    "name" : "filterStats.txt",
    "description" : "Filtered read data stats for gold:Gp0061273",
    "file_size_bytes" : 290,
    "url" : "https://data.microbiomedata.org/data/1472_51277/qa/filterStats.txt",
    "type" : "nmdc:DataObject",
    "_note" : {
        "ref" : "notes",
        "id" : ObjectId("602d502525261d62addcd830")
    }
}

The _note field has this schema:

{
    "type": "object",
    "description": "https://docs.mongodb.com/manual/reference/database-references/#dbrefs",
    "required": ["ref", "id"],
    "properties": {
        # XXX $jsonSchema incompatible with $ref and $id convention
        "ref": {"type": "string"},
        "id": {"bsonType": "objectId"},
    }
}
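
A minimal sketch of how a consumer might check a `_note` value against that shape and follow it to the note text. It uses an in-memory dictionary in place of the database, and plain strings stand in for `ObjectId` values so that no MongoDB driver is needed. `COLLECTIONS`, `is_valid_note_ref`, and `resolve_note` are hypothetical names.

```python
from typing import Any, Optional

# In-memory stand-in for the database: collection name -> {_id: document}.
# Strings replace ObjectId values here; that substitution is an assumption.
COLLECTIONS = {
    "notes": {
        "602d502525261d62addcd830": {
            "sh:pattern": "filterStats.txt",
            "skos:note": "Reads QC summary statistics",
        }
    }
}


def is_valid_note_ref(value: Any) -> bool:
    """Check the required keys: a string 'ref' and some 'id'."""
    return (
        isinstance(value, dict)
        and isinstance(value.get("ref"), str)
        and "id" in value
    )


def resolve_note(data_object: dict, collections: dict) -> Optional[str]:
    """Follow the _note reference and return the skos:note text, if any."""
    ref = data_object.get("_note")
    if not is_valid_note_ref(ref):
        return None
    note_doc = collections.get(ref["ref"], {}).get(ref["id"])
    return note_doc.get("skos:note") if note_doc else None


data_object = {
    "name": "filterStats.txt",
    "_note": {"ref": "notes", "id": "602d502525261d62addcd830"},
}
tooltip = resolve_note(data_object, COLLECTIONS)
```

Data objects without a `_note` field, or with a dangling reference, resolve to `None` rather than raising.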

@dehays @wdduncan The workflow, from pattern-note map definition to db update, is in this notebook (its commit references this issue). What is a more stable place in this repo for the pattern-note map (@kfagnan's tables above) to live, that the workflow can source?
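
As one possible answer, sketched under assumptions: the map could live in a version-controlled text file that uses the same `existing entry | new text` layout as the tables above, and the workflow could parse it into note documents. The file layout and the `load_note_documents` helper are hypothetical; the notebook's actual code is not shown in this thread.

```python
# Stand-in for the contents of a version-controlled file (layout is assumed).
PATTERN_NOTE_TEXT = """\
filterStats.txt | Reads QC summary statistics
mapping_stats.txt | Assembled contigs coverage information
assembly.agp | An AGP format file describes the assembly
"""

# Same @context as the note documents shown earlier in this thread.
CONTEXT = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "sh": "http://www.w3.org/ns/shacl#",
}


def load_note_documents(text: str) -> list:
    """Turn 'pattern | note' lines into documents shaped like the notes collection."""
    docs = []
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        pattern, note = (part.strip() for part in line.split("|", 1))
        docs.append({"sh:pattern": pattern, "skos:note": note, "@context": CONTEXT})
    return docs


docs = load_note_documents(PATTERN_NOTE_TEXT)
```

Splitting on the first `|` only keeps any later pipe characters inside the note text.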

dwinston commented 3 years ago

@kfagnan to accommodate the left-most sticky note on microbiomedata/nmdc-server#274, we would want an additional attribute, a preferred label, to associate with a data object name pattern, in addition to associating the note.
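
A sketch of what that could look like, assuming the SKOS property `skos:prefLabel` is used for the preferred label alongside the existing `skos:note`. The label text and the `display_fields` helper are hypothetical.

```python
# Note document extended with a preferred label. The label wording is made up
# for illustration; only sh:pattern and skos:note come from this thread.
note_doc = {
    "sh:pattern": "filterStats.txt",
    "skos:prefLabel": "Reads QC statistics",  # hypothetical label text
    "skos:note": "Reads QC summary statistics",
    "@context": {
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "sh": "http://www.w3.org/ns/shacl#",
    },
}


def display_fields(name: str, doc: dict) -> tuple:
    """Return (label, tooltip), falling back to the file name when no label is set."""
    label = doc.get("skos:prefLabel", name)
    tooltip = doc.get("skos:note", "")
    return label, tooltip


label, tooltip = display_fields("filterStats.txt", note_doc)
```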

wdduncan commented 3 years ago

@dwinston I'm not opposed to this approach, but perhaps we could call it "tooltip" instead of "note"?

Or you can make use of annotations from the semantic web space for refining various kinds of notes. E.g.:

dwinston commented 3 years ago

@wdduncan yes, I love the move to generalized annotation for various UI contexts, so e.g. could rename to _anno in data_object_set and reference an annotations collection instead of notes? I'll leave as is for now, but can refactor once you, @cmungall, et al. decide on an approach.

wdduncan commented 3 years ago

Let's discuss more post-GSP. I've added this item to ticket #257.
Given the short time span we have until GSP, I suggest we go with the approach that Kitware thinks will work best.

dehays commented 3 years ago

I feel we are considering two things: 1) the short term ('have the feature for GSP') and 2) the actual solution, which will require discussing how to include display strings in the JSON documents.

For now (until GSP) - I don't want to derail Brandon's intended implementation plans.

subdavis commented 3 years ago

A version of a fix was deployed.

Screenshot from 2021-02-17 15-56-55

pvangay commented 3 years ago

@subdavis - here's the promised content! I hope this is the right place to put this. And sorry, as I was writing some of this out, I realized I might be asking for more features - let me know if they need to go into their own issues. Please correct the text if this is not indeed what it represents.

Visualization | Help icon pop-up text | Notes for Kitware
image | Displays the number of samples for each data type available. Click on a bar to filter by data type. |
image | Displays geographical location (latitude, longitude) and sample size (as indicated by the size of the point). Click on a point to filter by a group of samples. | Please add legend with values for color and point sizes. Are the colors supposed to represent different sample types? If Brodie’s data had more resolution with their lat/long values, would you still show his samples as 1 large point (and then disaggregate the points when the user zooms in)?
image | Scroll the slider to narrow in on a sample collection date range. | Add x and y labels (x = sample collection date, y = number of samples). I was expecting the dates to match up with where the sliders were, but it seems as though they’re matching up with the dates at the edges of the visualizations, which felt a little non-intuitive.
image | This upset plot shows the number of samples with corresponding omic data associated. For example: there are 43 samples from 1 study that have metagenomics, metatranscriptomics, and natural organic matter characterizations. | Can a legend be added? If not, you can probably lump this into the help text: MB = metabolomics, MG = metagenomics, MP = metaproteomics, MT = metatranscriptomics, NOM = natural organic matter characterizations

subdavis commented 3 years ago

Thanks, I'll get these updated soon. I think many of the items from our meeting today will spin off other issues. We're planning to get those in order at our next internal planning meeting.