microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Study page buttons #290

Closed pvangay closed 3 years ago

pvangay commented 3 years ago

These buttons on the Stegen study page are clickable but don't do anything. Not sure what the original intention was? [image]

The numbers on the study page also don't seem to quite match up with what's in the search results: [image]

subdavis commented 3 years ago

Buttons are a known issue and will be fixed, just wanted to get something up quick to help with #263 discussion.

Second issue may require @jbeezley to comment, unless you just mean that the titles aren't consistent, which should be fixed soon.

pvangay commented 3 years ago

@subdavis Thanks. Second issue - I meant that the number of each omic type seems off (i.e. 42 metabolomics on the study page versus 34 metabolomics from the results page).

jbeezley commented 3 years ago

The backend is counting slightly different things. The 42 is the number of omics_processings with omics_type = Metabolomics. The 34 is the number of metabolomics analysis workflow executions associated with those omics_processings. In other words, 8 omics_processings have no associated metabolomics analysis workflow executions. These are:

emsl:499399
emsl:499400
emsl:499433
emsl:499449
emsl:499610
emsl:499615
emsl:499616
emsl:499626

@dehays Is this an error in the data? If not, which of the two numbers would you expect to see on the UI?
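
The difference between the two counts can be sketched as a set difference. This is a minimal illustration, not the backend's actual code: the helper name `unanalyzed_ids` and the sample IDs are hypothetical, and it assumes each analysis document links to a single omics_processing through a scalar `was_informed_by` field.

```python
def unanalyzed_ids(processing_ids, activities):
    """Return the processing ids that no analysis activity refers to."""
    # Assumption: was_informed_by holds one omics_processing id per activity.
    analyzed = {activity["was_informed_by"] for activity in activities}
    return [pid for pid in processing_ids if pid not in analyzed]


# Hypothetical sample data (not taken from the NMDC database).
processing_ids = ["emsl:A", "emsl:B", "emsl:C"]
activities = [
    {"id": "act:1", "was_informed_by": "emsl:A"},
    {"id": "act:2", "was_informed_by": "emsl:C"},
]

missing = unanalyzed_ids(processing_ids, activities)
print(missing)  # ['emsl:B']
```

Applied to the real data, the first list would hold the 42 Metabolomics omics_processing ids and the result would be the 8 ids listed above.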

dehays commented 3 years ago

@pvangay As @jbeezley pointed out - you are comparing EMSL instrument / JGI sequencer runs with analysis workflow executions, which don't necessarily need to match up, BUT when they do not it is very suspicious. Given the Stegen example above - I'd say it is strange beyond metabolomics (42 instrument runs vs 34 analysis executions). JGI did sequencing for 50 metagenomes, which aligns with 50 Read QC, Read-based taxonomy, assemblies, and MAGs. But why would there be only 35 metaG annotation workflow activities? I'd expect 50 metaG annotations. There is also a big mismatch on NOM: 1111 instrument runs vs 230 analysis activities. (@subdavis The Study info page still displays "Organic Matter Characterization" rather than "Natural Organic Matter")
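
To put the mismatches side by side, here is a small worked calculation using only the counts quoted in this comment. The dictionary layout and the name `gaps` are illustrative, and reading each gap as "runs without an analysis" assumes at most one analysis activity per run, which may not hold.

```python
# (instrument or sequencer runs, analysis activities) as quoted for the Stegen study
counts = {
    "Metabolomics": (42, 34),
    "Metagenome annotation": (50, 35),
    "NOM": (1111, 230),
}

# Gap between runs and analysis activities for each type.
gaps = {name: runs - analyses for name, (runs, analyses) in counts.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap} more runs than analysis activities")
```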

I'll step back and look at the JSON in the Mongo instance that ingest pulls from and then at the JSON supplied from A2. It feels like for metaG annotation and metaB analysis there are analysis activity docs missing. For NOM, my first guess based on the numbers is that we still have a bunch of extraneous Stegen instrument runs included beyond those for which analysis was performed.

dehays commented 3 years ago

Ok - the number of metaG annotations displayed does not match what I see in Mongo, but the number of metaB analyses does.

@jbeezley for Stegen metaG annotations - I would expect to see 50 based on what I see in Mongo. Reloading your Postgres tables from dwinston_share.metagenome_annotation_activity_set should fix that.

I will investigate why we only have 34 metaB analysis metadata docs for Stegen rather than the expected 42. (Why do we not have analysis for all 42 metaB instrument runs?)

>>> stegen_metag_projects_docs = list(db.omics_processing_set.find({"part_of": ["gold:Gs0114663"], "omics_type":"Metagenome"}, ["id", "has_input", "has_output", "omics_type"]))
>>> len(stegen_metag_projects_docs)
50
>>> stegen_metaB_projects_docs = list(db.omics_processing_set.find({"part_of": ["gold:Gs0114663"], "omics_type":"Metabolomics"}, ["id", "has_input", "has_output", "omics_type"]))
>>> len(stegen_metaB_projects_docs)
42
>>> stegen_metaG_project_ids = [d["id"] for d in stegen_metag_projects_docs]
>>> len(stegen_metaG_project_ids)
50
>>> stegen_metaB_project_ids = [d["id"] for d in stegen_metaB_projects_docs]
>>> len(stegen_metaB_project_ids)
42

>>> stegen_mg_annotations = list(db.metagenome_annotation_activity_set.find({"was_informed_by": {"$in": stegen_metaG_project_ids}}))
>>> len(stegen_mg_annotations)
50
>>> stegen_metaB_analysis = list(db.metabolomics_analysis_activity_set.find({"was_informed_by": {"$in": stegen_metaB_project_ids}}))
>>> len(stegen_metaB_analysis)
34

dehays commented 3 years ago

All of Yuri's Stegen metaB analysis docs are in Mongo. But when looking at metaB instrument runs for Stegen - there are 34 with sample input and 8 without.

{"part_of": ["gold:Gs0114663"], "omics_type":"Metabolomics", "has_input":null}

FYI pymongo doesn't like that filter due to the null. But MongoDB Compass has no problem with it.
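
One likely reason, assuming the filter was pasted as-is into a Python shell, is that `null` is JSON syntax and not a Python name; the Python equivalent is `None`. The sketch below only builds the filter and does not connect to a database; the variable names are illustrative.

```python
import json

# The filter exactly as written for MongoDB Compass (JSON syntax).
raw_filter = '{"part_of": ["gold:Gs0114663"], "omics_type": "Metabolomics", "has_input": null}'

# json.loads converts JSON null into Python None.
query = json.loads(raw_filter)

# The same filter written directly as a Python dict.
query_literal = {
    "part_of": ["gold:Gs0114663"],
    "omics_type": "Metabolomics",
    "has_input": None,
}

print(query == query_literal)  # True

# With a live connection this could be passed to
# db.omics_processing_set.find(query); that call is not executed here.
```

Note that in MongoDB a filter on `None` (null) matches documents where the field is explicitly null as well as documents where the field is absent.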

Bottom line - I think we have a much larger set of omics_processing for metaB, NOM and metaP than we do metaB analysis, NOM analysis and metaP analysis. Those EMSL datasets used as input for analysis workflows may be consistent with those EMSL datasets for which input samples have been identified (at least in the metaB case). I'll look at NOM and metaP and at those related to the Brodie study - I expect that a number of omics_processing (those without known inputs) can be removed. I'll check to see if any cases exist where analysis was done on omics_processing that lacked input samples.
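
That last check could be sketched as follows. This is a rough illustration over plain lists of documents, not a query against the real database: the helper name `analyses_lacking_inputs` and the sample documents are hypothetical, and it assumes `was_informed_by` holds a single omics_processing id and that a null, empty, or absent `has_input` means no input sample is known.

```python
def analyses_lacking_inputs(processing_docs, activities):
    """Return ids of analysis activities whose omics_processing has no input sample."""
    # Treat a missing, null, or empty has_input as "no known input".
    no_input_ids = {doc["id"] for doc in processing_docs if not doc.get("has_input")}
    return [a["id"] for a in activities if a["was_informed_by"] in no_input_ids]


# Hypothetical sample documents (not taken from the NMDC database).
processing_docs = [
    {"id": "emsl:A", "has_input": ["sample:1"]},
    {"id": "emsl:B", "has_input": None},
    {"id": "emsl:C"},
]
activities = [
    {"id": "act:1", "was_informed_by": "emsl:A"},
    {"id": "act:2", "was_informed_by": "emsl:C"},
]

suspect = analyses_lacking_inputs(processing_docs, activities)
print(suspect)  # ['act:2']
```

An empty result would support removing the omics_processing records without known inputs, since no analysis would depend on them.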