hurwitzlab / imicrobe-lib

General purpose scripts and libraries for iMicrobe.
http://imicrobe.us

CameraMetadata_ENVO_working_copy and iMicrobe site #1

Open cmungall opened 7 years ago

cmungall commented 7 years ago

I'm seeing inconsistencies between https://raw.githubusercontent.com/hurwitzlab/imicrobe-lib/master/docs/mapping_files/CameraMetadata_ENVO_working_copy.csv and the site. E.g.

          SAMPLE_ACC: CAM_SMPL_002668
  SAMPLE_DESCRIPTION: ALVINELLA - Alvinella Pompejana Epibionts
         DESCRIPTION: 
    SITE_DESCRIPTION: Hydrothermal Vent
              REGION: Eastern Pacific Ocean
        HABITAT_NAME: hydrothermal vent
         biome_label: marine hydrothermal vent biome
            biome_id: ENVO:01000030
environmental_material_label: water
environmental_material_id: ENVO:00002006
environmental_feature_label: marine hydrothermal vent
environmental_feature_id: ENVO:01000122
   suggest new terms: 

This makes sense. However, what is presumably the same record:

https://www.imicrobe.us/sample/view/3

has some odd annotations:

[image]

There are some annotations on the site that are not in the file, e.g.

https://www.imicrobe.us/sample/view/1

[image]

I think the file is correct and the bug is in the site, but I'm not sure how to report this.

cmungall commented 7 years ago

This is for https://github.com/EnvironmentOntology/envo/issues/318

cmungall commented 7 years ago

OK, this is odd:

          SAMPLE_ACC: ANTARCTICAAQUATIC_SMPL_SITE1
  SAMPLE_DESCRIPTION: 
         DESCRIPTION: 
    SITE_DESCRIPTION: saltwater manmade
              REGION: Arizona
        HABITAT_NAME: saline water
         biome_label: anthropogenic terrestrial biome
            biome_id: ENVO:01000219
environmental_material_label: saline water
environmental_material_id: ENVO:00002010
environmental_feature_label: anthropogenic abiotic mesoscopic feature
environmental_feature_id: ENVO:00003075

Which I think corresponds to https://www.imicrobe.us/sample/view/44

[image]

Totally confused... is this sample in Arizona, Australia or Antarctica? Is it a manmade lake? Made by Australians in Antarctica...?

jklynch commented 7 years ago

@cmungall thank you for pointing this out! The developer who handles this aspect of the site is out until Monday. We'll sit down to look at it then.

bhurwitz33 commented 7 years ago

Hi Chris,

Thanks. Ken will look into this on Monday. We appreciate you pointing it out!

Bonnie

cmungall commented 7 years ago

Hi @bhurwitz33 - any luck tracking down what's going on here?

bhurwitz33 commented 7 years ago

Ken,

Any ideas? Bonnie


kyclark commented 7 years ago

First off, I can honestly say that I have no recollection of how the ontology terms were created in the imicrobe db. I must not have used this file as the only source, because a check of all the records in the file listed at the beginning shows that 451 samples out of the total 2813 have a discrepancy between what's in the db and what's in the file. I could easily make the db reflect just what is in the file, but I would like to get some confirmation that this is the right move. Maybe specifically Bonnie and Ramona could verify that I should do this?

kyclark commented 7 years ago

The first example is "CAM_SMPL_002668" which is this record:

https://www.imicrobe.us/sample/view/1929

And in that instance, all the db ontology ids match exactly with those in the file:

1658: CAM_SMPL_002668 (1929): {
  biome_id => "ENVO:01000030",
  biome_label => "marine hydrothermal vent biome",
  description => "",
  environmental_feature_id => "ENVO:01000122",
  environmental_feature_label => "marine hydrothermal vent",
  environmental_material_id => "ENVO:00002006",
  environmental_material_label => "water",
  habitat_name => "hydrothermal vent",
  region => "Eastern Pacific Ocean",
  sample_acc => "CAM_SMPL_002668",
  sample_description => "ALVINELLA - Alvinella Pompejana Epibionts",
  site_description => "Hydrothermal Vent",
}
DB = ENVO:00002006, ENVO:01000030, ENVO:01000122
File = ENVO:00002006, ENVO:01000030, ENVO:01000122
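
For anyone who wants to repeat this kind of check, here is a minimal Python sketch of the comparison. It is not the script that produced the numbers above. It assumes the CSV column names shown in the record dumps in this thread, and it assumes the database side has already been loaded into a plain dict (`DB_IDS` below is hypothetical and its contents are illustrative only).

```python
import csv
import io

# Column names as they appear in the record dumps in this thread.
ID_COLUMNS = ("biome_id", "environmental_material_id", "environmental_feature_id")


def envo_ids_from_csv(handle):
    """Map SAMPLE_ACC -> set of ENVO IDs found in the mapping file."""
    ids_by_acc = {}
    for row in csv.DictReader(handle):
        acc = (row.get("SAMPLE_ACC") or "").strip()
        if not acc:
            continue
        ids = {(row.get(col) or "").strip() for col in ID_COLUMNS}
        ids.discard("")  # ignore empty cells
        ids_by_acc[acc] = ids
    return ids_by_acc


def find_discrepancies(file_ids, db_ids):
    """Return {acc: (file_set, db_set)} for samples where the sources disagree."""
    return {
        acc: (ids, db_ids.get(acc, set()))
        for acc, ids in file_ids.items()
        if ids != db_ids.get(acc, set())
    }


# Two rows copied from this thread, reduced to the relevant columns.
SAMPLE_CSV = """\
SAMPLE_ACC,biome_id,environmental_material_id,environmental_feature_id
CAM_SMPL_002668,ENVO:01000030,ENVO:00002006,ENVO:01000122
ANTARCTICAAQUATIC_SMPL_SITE1,ENVO:01000219,ENVO:00002010,ENVO:00003075
"""

# Hypothetical database contents, for illustration only.
DB_IDS = {
    "CAM_SMPL_002668": {"ENVO:00002006", "ENVO:01000030", "ENVO:01000122"},
    "ANTARCTICAAQUATIC_SMPL_SITE1": {"ENVO:01000339"},
}

file_ids = envo_ids_from_csv(io.StringIO(SAMPLE_CSV))
for acc, (in_file, in_db) in find_discrepancies(file_ids, DB_IDS).items():
    print(acc, "File =", sorted(in_file), "DB =", sorted(in_db))
```

Comparing sets rather than ordered lists means a sample only counts as a discrepancy when the IDs themselves differ, not when they are merely listed in a different order.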

As for the ANTARCTICAAQUATIC_SMPL_SITE1 sample (https://www.imicrobe.us/sample/view/44), I don't know why the file says "Arizona" when it's clearly in the Antarctic. The fact that the "Region: Arizona" didn't make it into the imicrobe db (and hence I can't find it in the web display) is a weird bonus-bug of some sort? Like, it's a good thing I missed importing that? If I "grep Arizona CameraMetadata_ENVO_working_copy.csv" then I get 23 hits, one for "ALVINELLA_SMPL_20041130" and the rest for "ANTARCTICAAQUATIC_SMPL_SITE*." How very weird.
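
A column-aware version of that grep might look like the sketch below. Unlike grep, it only matches the `REGION` column, and it groups the hits by accession prefix. The function name and the sample rows are made up for illustration (the `SITE2` accession in particular is hypothetical); the column names are the ones shown in the dumps above.

```python
import csv
import io
from collections import Counter


def accession_prefixes_for_region(handle, region):
    """Count rows whose REGION equals `region`, keyed by accession prefix."""
    counts = Counter()
    for row in csv.DictReader(handle):
        if (row.get("REGION") or "").strip() == region:
            acc = (row.get("SAMPLE_ACC") or "").strip()
            # Prefix = text before "_SMPL_", e.g. "ANTARCTICAAQUATIC".
            counts[acc.split("_SMPL_")[0]] += 1
    return counts


# Illustrative rows only, not the real file contents.
REGION_CSV = """\
SAMPLE_ACC,REGION
ALVINELLA_SMPL_20041130,Arizona
ANTARCTICAAQUATIC_SMPL_SITE1,Arizona
ANTARCTICAAQUATIC_SMPL_SITE2,Arizona
CAM_SMPL_002668,Eastern Pacific Ocean
"""

print(accession_prefixes_for_region(io.StringIO(REGION_CSV), "Arizona"))
```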

bhurwitz33 commented 7 years ago

Here is a basic overview:

Ontology development for CAMERA metadata. As part of the iMicrobe project, we developed a new ontology called the Microbial Environments described using OWL (MEOWL) ontology. The first step toward ontologizing the CAMERA data was to clean up and organize existing data. To do this, we mapped all CAMERA metadata labels to the Minimum Information about any (x) Sequence (MIxS) vocabulary, which both standardized and reduced the number of terms. To go from a controlled vocabulary to an ontology, we categorized existing terms into a hierarchy based on classes such as environmental parameter, chemical parameter, location, and habitat. Where possible, classes were mapped to the existing BCO-DMO vocabulary to obtain textual definitions. The MEOWL ontology is currently available at XXX and is used on the iMicrobe data site to streamline the metadata search interface (http://data.imicrobe.us/sample/search). Specifically, users can combine multiple search parameters (e.g., salinity greater/less than/between two values, Longhurst province including several regions, sample depth) to find samples, view them on a map, and download associated files. As such, datasets are discoverable and available for re-use.

Ramona – can you fill in where the GitHub repo is for MEOWL?

Bonnie


ramonawalls commented 7 years ago

Hi all. The MEOWL repo is at https://github.com/hurwitzlab/meowl. I am looking into the discrepancies now. I have to give a talk on MEOWL next week, so I will be diving into it a bit this week.

ramonawalls commented 7 years ago

I found the problem. On commit 2e672bcec7c89dd150f55b22c3afd14749bc181f, column A stayed the same while the rest of the file was sorted on column B. I'm fixing it now. Thank you for noticing this, @cmungall !
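
To illustrate the kind of mix-up described here, the following Python sketch uses toy rows (not the real file) to show what happens when the first column keeps its order while the remaining columns are sorted, compared with sorting whole rows. All function names and data are hypothetical.

```python
def sort_whole_rows(rows, key_index):
    """Sort complete rows so every cell stays with its own record."""
    return sorted(rows, key=lambda row: row[key_index])


def sort_all_but_first_column(rows, key_index):
    """Reproduce the mix-up: column 0 keeps its order, the rest is sorted.

    `key_index` must be >= 1 because column 0 is excluded from the sort.
    """
    first_column = [row[0] for row in rows]
    rest = sorted((row[1:] for row in rows), key=lambda r: r[key_index - 1])
    return [[acc, *cells] for acc, cells in zip(first_column, rest)]


def misaligned_accessions(before, after):
    """List accessions whose remaining cells differ between two versions."""
    original = {row[0]: row[1:] for row in before}
    return [row[0] for row in after if original.get(row[0]) != row[1:]]


# Toy data: accession in column A, a description in column B.
rows = [
    ["acc_3", "gamma"],
    ["acc_1", "alpha"],
    ["acc_2", "beta"],
]

good = sort_whole_rows(rows, 1)
bad = sort_all_but_first_column(rows, 1)

print(misaligned_accessions(rows, good))
print(misaligned_accessions(rows, bad))
```

Comparing the file before and after a commit with something like `misaligned_accessions` flags every accession whose metadata changed, which is one way a partial sort like this could be caught early.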

bhurwitz33 commented 7 years ago

Thanks Ramona! -Bonnie


cmungall commented 6 years ago

Thanks @ramonawalls!

Have these changes propagated to the site?

For example, https://www.imicrobe.us/#/samples/44 still says Australia.

I note that the new site no longer has an ontology tab, which is a shame.

It looks like the files are richer. For example, this row:

          SAMPLE_ACC: ALVINELLA_SMPL_20041130
  SAMPLE_DESCRIPTION: ALVINELLA - Alvinella Pompejana Epibionts
         DESCRIPTION: 
    SITE_DESCRIPTION: Hydrothermal Vent
              REGION: Eastern Pacific Ocean
        HABITAT_NAME: hydrothermal vent
         biome_label: marine hydrothermal vent biome
            biome_id: ENVO:01000030
environmental_material_label: water
environmental_material_id: ENVO:00002006
environmental_feature_label: marine hydrothermal vent
environmental_feature_id: ENVO:01000122

corresponds to https://www.imicrobe.us/#/samples/44

which is missing most of the above and says the biome is "Polar Biome (ENVO_01000339)". I think both the CSV and the site are correct (it's both a polar and a marine hydrothermal vent biome), but the discrepancy is still puzzling.