cmungall opened this issue 4 years ago
Where do we see the GH invites? I am not seeing any under my profile.
@StantonMartin the invite was for Bruce and Hannah; it looks like they have accepted the invite.
Can we make a start on this task? Is there anything unclear here? You can ping me on the aim1 Slack if there are any questions.
Attaching the IGBP JSON and UMD.json.TXT (UMD JSON).
Note IGBP is a legacy standard: http://www.igbp.net/
Thanks! So I assume that, despite being a legacy standard, it is still used by some of the data layers we will access via the API, so we'll want to map them.
Note the header of the UMD file says it is "IGBP_V6", not UMD. Is this expected?
I assume that when the API is functional there will be an unambiguous way to map to one of these tables?
The two are largely identical, except that code 15 has a different meaning in each, and IGBP has an extra code, 16.
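For what it's worth, a quick way to surface these discrepancies is to diff the two code-to-label tables. A minimal sketch in Python, assuming each attached file parses as a flat JSON object of code → label (the actual structure of the attachments may differ, and the filenames here are placeholders):

```python
import json

# Assumed flat structure: {"1": "Evergreen Needleleaf Forest", ...}.
# Filenames are placeholders for the attached IGBP/UMD JSON files.
with open("IGBP.json") as f:
    igbp = json.load(f)
with open("UMD.json") as f:
    umd = json.load(f)

# Report codes missing from one scheme or labeled differently in each,
# e.g. code 15's divergent meaning and IGBP's extra code 16.
for code in sorted(set(igbp) | set(umd), key=int):
    igbp_label, umd_label = igbp.get(code), umd.get(code)
    if igbp_label != umd_label:
        print(f"code {code}: IGBP={igbp_label!r} UMD={umd_label!r}")
```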
I noted a discrepancy with https://lpdaac.usgs.gov/documents/101/MCD12_User_Guide_V6.pdf
In this version, there is no 0 code for IGBP.
I'm inclined not to trust the codes and just map the labels
The discrepancy is due to different versions of the standard: the current identify tool was using version 005, for which I cannot find documentation. I found some old documentation for 5.1, but even that has a slight discrepancy with the legend surfaced through the current identify tool. My opinion is that we should adopt the latest version (v6), as in the user guide above, as the standard and surface the classifications from it. So the JSON files should map identically to the MCD12_User_Guide_V6.pdf tables, regardless of what the current identify tool does. From the doc:
The product contains 13 Science Data Sets (SDS; Table 1), including 5 legacy classification schemes (IGBP, UMD, LAI, BGC, and PFT; Tables 3-7) and a new three layer legend based on the Land Cover [...]
So if we want to be complete, we would do all 5 legacy schemes, using the versions in those tables, as well as the new three-layer legend. I would expect the three-layer legend to be the "default" option surfaced if the classification argument is not passed as a parameter to the function call, as sketched below.
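To make that expectation concrete, here is a hypothetical sketch of the call; the function name, parameter names, and scheme identifiers are all assumptions, not the actual API:

```python
# Hypothetical interface sketch only; not the real identify API.
LEGACY_SCHEMES = {"IGBP", "UMD", "LAI", "BGC", "PFT"}

def identify(lat, lon, classification="three_layer"):
    """Return the land cover classification at (lat, lon).

    Defaults to the new three-layer legend when no classification
    argument is passed; legacy schemes must be requested explicitly,
    e.g. identify(35.9, -84.3, classification="IGBP").
    """
    if classification != "three_layer" and classification not in LEGACY_SCHEMES:
        raise ValueError(f"unknown classification scheme: {classification}")
    raise NotImplementedError("interface sketch only")
```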
The list of all potential data products from the MODIS satellite that the new identify tool could query is here:
Identify_layers.zip
The table is organized by layer, code, and definition, all of which were pulled directly from the identify tool. I assumed that the only information pulled by the tool would be from the datasets that are already highlighted when you go through the Identify Tool to SDAT.
I feel it's important to point out that I had to pull the legends directly from the tool. For the majority of the layers I couldn't find any documentation about the classification systems at all.
Perfect, thanks! We'll do a first-pass automated alignment then let's talk about curating the mappings (should not be a large task)
Sounds good.
Can you shed any light on this? Is there something further that differentiates these:
Bailey Ecoregion Province,1,ice,
Bailey Ecoregion Province,2,ice,
@cmungall I'm sorry, but no. There was no documentation I could easily find about how the legends were put together or how the classifications were designated. My only thought is that most of the legends were numerical values associated with a color gradient; if the legend is meant to be read primarily by color, it could be that the two classes were both ice but, because the colors weren't identical, they couldn't share a number.
That's the best logical guess I can make with my rudimentary understanding of remote sensing and imaging. But otherwise, no. I don't have a solid explanation.
We will start with the Zobler soil layers. This is "Global Soil Type" in the CSV.
@StantonMartin added metadata about the Zobler layers in #133 -- this is similar to what is in the Identify_layers.csv provided by @Blancohl but includes additional metadata about the layer itself:
Additionally, each class has a mnemonic associated with it, e.g. AF for ferric acrisol:
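For illustration, the mnemonic could be carried as an extra column in the extraction TSV; the row below uses the one example from above, and the code value is a placeholder, not a verified Zobler code:

```
code	mnemonic	label
1	AF	ferric acrisol
```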
Any updates on this thread? I see the Zobler types were mapped to the ontology. What about the land use characterization from MODIS? Have those terms been mapped, or are they still outstanding?
On the aim1 call yesterday, homing in on the 1.2 deliverables, we identified the individual data layer vocabs for Identify as higher priority than GCMD.
For each vocab there are 3 phases:
1. extract the vocabulary (codes, labels, and any available metadata)
2. map the extracted terms to our ontology
3. manually review and curate the mappings
For the extraction, any format is fine. I suggest either SKOS/RDF, or simply a TSV, e.g.:
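(Illustrative rows only; the layer identifier is an assumption, and the labels should be taken verbatim from each layer's legend.)

```
layer	code	label
MCD12Q1_IGBP	1	Evergreen Needleleaf Forest
MCD12Q1_IGBP	2	Evergreen Broadleaf Forest
```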
Any metadata about the layer/vocab is also welcome.
All files should be checked into GitHub, in this repo, ideally with a Makefile for orchestrating any wget/curl steps.
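A minimal sketch of such a Makefile, with a placeholder URL and filenames (substitute the real ORNL/DAAC sources):

```make
# Placeholder URL/filenames; replace with the real legend sources.
VOCABS = igbp.json umd.json

all: $(VOCABS)

%.json:
	wget -O $@ https://example.org/legends/$*.json

clean:
	rm -f $(VOCABS)
```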
For 2, we will use our mapping framework to produce mappings in SSSOM format, which will be deposited in this repo.
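An illustrative SSSOM row; the subject prefix, ENVO ID, and confidence value here are placeholders, not a curated mapping:

```
subject_id	subject_label	predicate_id	object_id	object_label	confidence
IGBP:16	Barren	skos:exactMatch	ENVO:00000000	(placeholder ENVO term)	0.95
```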
For 3, the procedure will be to manually spot-check the SSSOM files in this repo and request changes via PR/ticket.
Not sure which of @usethedata, @stantonmartin, and @Blancohl will do tasks 1 and 3. Note I can't assign Bruce or Stan; they need to accept my GH invites first.