Open kfagnan opened 3 years ago
Thanks! I'll incorporate this and hopefully provide some options.
I don't know how easy it will be to match to individual data objects, but at a minimum we can have these in a table in a dialog callable from the data object list area.
One more thing - see https://github.com/microbiomedata/nmdc-server/issues/274
This contains an image suggesting these descriptions be added to a column.
@jbeezley there is a new MongoDB collection `notes` with documents of the form

    {
      "_id" : ObjectId("602d502525261d62addcd830"),
      "sh:pattern" : "filterStats.txt",
      "skos:note" : "Reads QC summary statistics",
      "@context" : {
        "skos" : "http://www.w3.org/2004/02/skos/core#",
        "sh" : "http://www.w3.org/ns/shacl#"
      }
    }

that is referenced by a new field `_note` in a `data_object_set` document, e.g.

    {
      "_id" : ObjectId("602551d225261d62add17d66"),
      "id" : "nmdc:ae40d7ae535c92b6d347915d8b1ac125",
      "name" : "filterStats.txt",
      "description" : "Filtered read data stats for gold:Gp0061273",
      "file_size_bytes" : 290,
      "url" : "https://data.microbiomedata.org/data/1472_51277/qa/filterStats.txt",
      "type" : "nmdc:DataObject",
      "_note" : {
        "ref" : "notes",
        "id" : ObjectId("602d502525261d62addcd830")
      }
    }

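
To illustrate how a client might follow the `_note` reference to get tooltip text, here is a minimal sketch. It is not the actual server code: the `db` dictionary is an in-memory stand-in for the MongoDB database, `ObjectId` values are replaced by plain strings so it runs without a driver, and `resolve_note` is a hypothetical helper name.

```python
# In-memory stand-in for the database: collection name -> list of documents.
# ObjectId values are plain strings here so the sketch runs without pymongo.
db = {
    "notes": [
        {
            "_id": "602d502525261d62addcd830",
            "sh:pattern": "filterStats.txt",
            "skos:note": "Reads QC summary statistics",
        }
    ],
    "data_object_set": [
        {
            "_id": "602551d225261d62add17d66",
            "name": "filterStats.txt",
            "_note": {"ref": "notes", "id": "602d502525261d62addcd830"},
        }
    ],
}


def resolve_note(db, data_object):
    """Return the skos:note text referenced by a data object's _note, or None."""
    link = data_object.get("_note")
    if link is None:
        return None
    # With a real driver, this loop would be a single lookup by _id in the
    # collection named by link["ref"]; here we scan the stand-in list instead.
    for doc in db[link["ref"]]:
        if doc["_id"] == link["id"]:
            return doc.get("skos:note")
    return None


tooltip = resolve_note(db, db["data_object_set"][0])
print(tooltip)
```

Because `_note` stores the collection name alongside the id, the same helper would keep working if the target collection were later renamed, as long as the stored `ref` values are updated to match.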
The `_note` field has this schema:

    {
      "type": "object",
      "description": "https://docs.mongodb.com/manual/reference/database-references/#dbrefs",
      "required": ["ref", "id"],
      "properties": {
        # XXX $jsonSchema incompatible with $ref and $id convention
        "ref": {"type": "string"},
        "id": {"bsonType": "objectId"},
      }
    }

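
For readers who want to check the shape of a `_note` value outside MongoDB, the schema's rules can be approximated in plain Python. This is only a sketch: `is_valid_note_ref` is a hypothetical helper, and the `objectId` type is approximated by a 24-character hex string, which is an assumption made so the example runs without a BSON library.

```python
import re

# Assumption: an ObjectId is represented here by its 24-character hex string.
HEX24 = re.compile(r"^[0-9a-f]{24}$")


def is_valid_note_ref(value):
    """Check that value is an object with a string `ref` and an id-like `id`."""
    if not isinstance(value, dict):
        return False
    # Both keys are listed under "required" in the schema.
    if not {"ref", "id"} <= value.keys():
        return False
    if not isinstance(value["ref"], str):
        return False
    return isinstance(value["id"], str) and bool(HEX24.match(value["id"]))


print(is_valid_note_ref({"ref": "notes", "id": "602d502525261d62addcd830"}))
print(is_valid_note_ref({"ref": "notes"}))
```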
@dehays @wdduncan The workflow, from pattern-note map definition to db update, is in this notebook (its commit references this issue). What is a more stable place in this repo for the pattern-note map (@kfagnan's tables above) to live, that the workflow can source?
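
As a rough illustration of that workflow (not the notebook's actual code), the sketch below builds note documents from a pattern-note map and attaches `_note` references to matching data objects. Everything here is an assumption for illustration: the regex patterns, the `note-N` ids standing in for ObjectIds, the helper names `build_notes` and `attach_notes`, and the use of in-memory lists instead of MongoDB collections.

```python
import re

# Hypothetical pattern-note map; the regex patterns are illustrative guesses
# at how the file names in the tables could be matched.
PATTERN_NOTES = {
    r"filterStats\.txt$": "Reads QC summary statistics",
    r"\.filtered\.fastq\.gz$": "Reads QC result fastq (clean data)",
    r"mapping_stats\.txt$": "Assembled contigs coverage information",
}


def build_notes(pattern_notes):
    """Turn the map into documents shaped like those in the `notes` collection."""
    return [
        {"_id": f"note-{i}", "sh:pattern": pattern, "skos:note": note}
        for i, (pattern, note) in enumerate(pattern_notes.items())
    ]


def attach_notes(data_objects, notes):
    """Set `_note` on each data object whose name matches a note's pattern."""
    for obj in data_objects:
        for note in notes:
            if re.search(note["sh:pattern"], obj["name"]):
                obj["_note"] = {"ref": "notes", "id": note["_id"]}
                break  # first matching pattern wins
    return data_objects


objects = [
    {"name": "filterStats.txt"},
    {"name": "1781_86101.filtered.fastq.gz"},
    {"name": "unmatched.bin"},
]
notes = build_notes(PATTERN_NOTES)
attach_notes(objects, notes)
print(objects)
```

Keeping the map as a single data structure is one way to give the workflow a stable source to read from, wherever in the repo it ends up living.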
@kfagnan to accommodate the left-most sticky note on microbiomedata/nmdc-server#274, we would want an additional attribute, a preferred label, to associate with a data object name pattern, in addition to associating the note.
@dwinston I'm not opposed to this approach, but perhaps we could call it "tooltip" instead of "note"?
Or you can make use of annotations from the semantic web space for refining various kinds of notes. E.g.:
@wdduncan yes, I love the move to generalized annotation for various UI contexts. For example, we could rename the field to `_anno` in `data_object_set` and have it reference an `annotations` collection instead of `notes`. I'll leave it as is for now, but can refactor once you, @cmungall, et al. decide on an approach.
Let's discuss more post GSP. I've added this item to ticket #257
Given the short time span we have until GSP, I suggest we go with the approach that Kitware thinks will work best.
I feel we are considering two things: 1) the short term, 'have the feature for GSP', and 2) the actual solution, which will require discussing how to include display strings in the JSON documents.
For now (until GSP) - I don't want to derail Brandon's intended implementation plans.
A version of a fix was deployed.
@subdavis - here's the promised content! I hope this is the right place to put this. And sorry, as I was writing some of this out, I realized I might be asking for more features - let me know if they need to go into their own issues. Please correct the text if this is not indeed what it represents.

Visualization | Help icon pop-up text | Notes for Kitware |
---|---|---|
(image) | Displays the number of samples for each data type available. Click on a bar to filter by data type. | |
(image) | Displays geographical location (latitude, longitude) and sample size (as indicated by the size of the point). Click on a point to filter by a group of samples. | Please add a legend with values for color and point sizes. Are the colors supposed to represent different sample types? If Brodie’s data had more resolution in their lat/long values, would you still show his samples as 1 large point (and then disaggregate the points when the user zooms in)? |
(image) | Scroll the slider to narrow in on a sample collection date range. | Add x and y labels (x = sample collection date, y = number of samples). I was expecting the dates to match up with where the sliders were, but they seem to match up with the dates at the edges of the visualization, which felt a little non-intuitive. |
(image) | This upset plot shows the number of samples with corresponding omic data associated. For example: there are 43 samples from 1 study that have metagenomics, metatranscriptomics, and natural organic matter characterizations. | Can a legend be added? If not, you can probably lump this into the help text: MB = metabolomics; MG = metagenomics; MP = metaproteomics; MT = metatranscriptomics; NOM = natural organic matter characterizations |

Thanks, I'll get these updated soon. I think many of the items from our meeting today will spin off other issues. We're planning to get those in order at our next internal planning meeting.
@jbeezley @subdavis @jeffbaumes @wdduncan @dwinston @dehays Tagging you all as I'm not sure how you all will want to implement this.
I'm including tables that map some of the terms on the website to more human-readable descriptions that we're hoping to incorporate into "tool tips" or some kind of alternate text.

Metagenome Output

Existing entry | new text |
---|---|
filterStats.txt | Reads QC summary statistics |
1781_86101.filtered.fastq.gz | Reads QC result fastq (clean data) |
mapping_stats.txt | Assembled contigs coverage information |
assembly_contigs.fna | Final assembly contigs fasta |
assembly_scaffolds.fna | Final assembly scaffolds fasta |
assembly.agp | An AGP format file describes the assembly |
pairedMapped_sorted.bam | Sorted bam file of reads mapping back to the final assembly |
KO TSV | Tab delimited file for KO annotation. |
EC TSV | Tab delimited file for EC annotation. |
Protein FAA | FASTA amino acid file for annotated proteins. |

Metaproteome files

Existing entry | new text |
---|---|
MSGFjobs_MASIC_resultant.tsv | Tab delimited file of unfiltered metaproteomics results, both identifications and abundances |
500088_1781_100336_Peptide_Report.tsv | Tab delimited file of peptide results filtered to ~5% FDR, including protein and abundance information |
500088_1781_100336_Protein_Report.tsv | Tab delimited file of protein results derived from ~5% FDR filtered peptide data, including aggregated abundance information |
500088_1781_100336_QC_metrics.tsv | Tab delimited file of aggregate statistics derived from workflow results |