Closed GregGuerin closed 3 years ago
I guess at a very basic level we could provide links that direct people to descriptions of the metadata? But I wonder if that would send people down a rabbit hole of readme files, websites, and PDFs. If we want to keep some of this "in R" so that people don't have to go looking on a website, could we add a new function that lets you call for descriptions of the metadata? I am not sure how practical this is, but something like get_ausplots_metadata(data_table="veg.voucher") could return a list that we have created of the column names and their descriptions for that table. Not sure if that is a good idea, but it saves people having to follow a trail of links.
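A minimal sketch of what such a function might look like, assuming the descriptions are stored as one data frame per table. The function name follows the suggestion above; it is not an existing ausplotsR function, and the column names and descriptions are placeholders rather than official AusPlots definitions.

```r
# Hypothetical store of descriptions, one data frame per data table.
# Column names and descriptions are illustrative placeholders only.
metadata_dictionary <- list(
  veg.voucher = data.frame(
    column = c("site_unique", "herbarium_determination"),
    description = c(
      "placeholder: identifier of the site visit",
      "placeholder: species name determined by the herbarium"
    ),
    stringsAsFactors = FALSE
  )
)

# Hypothetical accessor: returns the column descriptions for one table.
get_ausplots_metadata <- function(data_table) {
  if (!data_table %in% names(metadata_dictionary)) {
    stop("No metadata available for table: ", data_table)
  }
  metadata_dictionary[[data_table]]
}

voucher_md <- get_ausplots_metadata(data_table = "veg.voucher")
print(voucher_md)
```

Asking for a table that has no entry raises an error rather than returning an empty result, so a mistyped table name is noticed immediately.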
Can we rely on the R help system? The benefit of that is:
The drawback is if they don't know how to use the help/docs feature of R, then they'll be really stuffed. I guess we can assume a certain level of proficiency though.
Internal delivery as help pages eg would be ideal but at present we have nothing to populate that with as far as I am aware, so needs a job for someone to write a short description of each field in each table.
Just picking this up now that we are working on V1.2, could we ask Emrys and Christina to write the descriptions, and just convert it to a readme? Then link that to one of the more relevant R help pages?
Hi guys, I just chatted with Emrys, and there are already "look-up" tables as spreadsheets that include all the various codes for different properties, which could be converted to tables or readme files with little effort.
Sounds promising. Can you follow up and circulate the material or some examples?
Yes, will do when he is back from the field
@Sammunroe Would be nice to have this in v1.2 - do we know yet if documentation exists that we could present without a lot of drafting?
@tomsaleeba, I think you were going to make these lookup tables accessible? We could just create a new function, that just calls the lookup tables?
There's two things to talk about here:
First is the fact that the data frames contain codes as opposed to nice, pretty labels. For example, we would have FLO when we could instead put Floodplain. We have lookup tables so I can replace all these acronyms/codes with labels that are much friendlier to humans. The labels will still act like the codes in that they'll be consistent and unique, but they'll also be human readable. Here's an example of a lookup table:
id | landform_pattern
-----+--------------------------
FLO | Floodplain
HIL | Hills
KAR | Karst
LAC | Laclustrine plain
LAV | Lava plain
LON | Longitudinal dunefield
LOW | Low hills
MAD | Made land
MAR | Marine plain
MEA | Meander plain
...
I've started this work already (a long time ago) and that's not a problem to finish off, unless we don't want this change? I'm tracking those other changes in this issue: https://github.com/ternandsparrow/swarm-rest/issues/4.
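A rough sketch of the code-to-label replacement described above, done on the R side with base `match()`. The lookup rows are taken from the table above; the `site_info` data frame and its values are made up for illustration, and the actual change discussed here may well happen on the server instead.

```r
# A few rows of the lookup table shown above.
landform_lut <- data.frame(
  id = c("FLO", "HIL", "KAR"),
  landform_pattern = c("Floodplain", "Hills", "Karst"),
  stringsAsFactors = FALSE
)

# Illustrative data frame holding codes (the values are invented).
site_info <- data.frame(
  site = c("site_1", "site_2", "site_3"),
  landform_pattern = c("HIL", "FLO", "XXX"),
  stringsAsFactors = FALSE
)

# Find each code's row in the lookup table and pull out the label.
idx <- match(site_info$landform_pattern, landform_lut$id)
site_info$landform_label <- landform_lut$landform_pattern[idx]

# Codes missing from the lookup keep their code instead of becoming NA.
site_info$landform_label[is.na(idx)] <- site_info$landform_pattern[is.na(idx)]

print(site_info)
```

Keeping the original code column alongside the label means nothing is lost if a lookup table turns out to be incomplete.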
The second thing is making more description for the values available. That same table above has more columns that I didn't include, such as a description:
id | landform_pattern | description |
---|---|---|
FLO | Floodplain | Alluvial plain characterised by frequently active erosion and aggradation by channelled or overbank stream flow. Unless otherwise specified; 'frequently active' is to mean that flow has an Average Recurrence Interval of 50 years or less. Included types of landform pattern are: bar plain; meander plain; covered plain; anastomotic plain. Related relict landform patterns are: stagnant alluvial plain; terrace; terraced land (partly relict). |
HIL | Hills | Landform pattern of high relief (90-300 m) with gently inclined to precipitous slopes. Fixed; shallow; erosional stream channels; closely to very widely spaced; form a non-directional or convergent; integrated tributary network. There is continuously active erosion by wash and creep and; in some cases; rarely active erosion by landslides. |
KAR | Karst | Landform pattern of unspecified relief and slope typically with fixed; deep; erosional stream channels forming a non-directional; disintegrated tributary pattern and many closed depressions without stream channels. It is eroded by continuously active solution and rarely active collapse; the products being removed through underground channels. |
I think this is more what this issue is about. In order to make this available, I could make a function that when called would retrieve all the description information that we have available. The user can then dump the dataframe to the screen and read it. I guess we'd actually need a bunch of dataframes, one for each lookup table. Does this sound ok?
I think having the description table is key to a smooth experience. People could be made to look elsewhere for this info, like our manual, but there will be others who want to stay in the R environment. So my vote would be to change the codes to labels, and make a function that retrieves the descriptions. Covers all our bases. Does it need to be an entirely new function? Could we make it an additional argument to get_ausplots, like description=T?
@Sammunroe @tomsaleeba
That looks like just what is needed and has been sorely missing (several people have asked me where this info is).
Codes versus labels
We need to be a little careful if this is across many columns in the data tables.
For that reason I'd lean towards codes with a dictionary - but it would be easier to judge if we saw all the tables.
Presentation method
Should we look at some of the existing data frame metadata functionality in R to see whether anything fits? That way you'd attach the information to the data tables themselves
Agree, you could add to get_ausplots (I still like one gateway, unless it just gets too complex and unwieldy), and there could be metadata counterpart tables to each data module table, as long as descriptions of codes for each variable can be pooled into one data frame per data module (i.e., add to your table above a column to identify the variable). So for site.info metadata:
VARIABLE | CODE | NAME | DESCRIPTION |
---|---|---|---|
bioregion_name | MDD | Murray Darling Depression | NA |
... | ... | ... | ... |
state | SA | South Australia | NA |
... | ... | ... | ... |
Agree internal access best (allowing user to pull out a specific item) but it would be nice to compile them all into a master pdf somewhere for reference too (even if just available on GitHub)
@tomsaleeba @Sammunroe
Example: https://cran.r-project.org/web/packages/dataMeta/vignettes/dataMeta_Vignette.html
Schema is a possible fit here - and I like that it automatically gives ranges for numeric variables as well as defining categorical ones.
Manual: https://cran.r-project.org/web/packages/dataMeta/dataMeta.pdf
Hello All, we have vocabularies for all these things. Look at linkeddata.tern.org.au and the AusPlots vocabularies.
regards Guru
It seems we could bypass dataMeta if the metadata already exists as tables and just use the function 'attr' to assign metadata (data frames of variable codes and descriptions etc.), which is then attached to a data object such as mydata$site.info. A user can then retrieve the metadata with 'attributes(mydata$site.info)', e.g. attributes(mydata$site.info)$dictionary
The benefit of this approach is that there are fewer items in the returned data list and the metadata sits right with the data but 'out of the way'.
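A small sketch of the `attr` approach described above, using base R only. The `mydata` list, its contents and the attribute name `dictionary` are illustrative assumptions that follow the example in the comment.

```r
# Illustrative stand-in for the list returned by the package.
mydata <- list(
  site.info = data.frame(
    site = c("s1", "s2"),
    state = c("SA", "NT"),
    stringsAsFactors = FALSE
  )
)

# Illustrative dictionary for one variable of site.info.
dictionary <- data.frame(
  variable = "state",
  code = c("SA", "NT"),
  name = c("South Australia", "Northern Territory"),
  description = NA_character_,
  stringsAsFactors = FALSE
)

# Attach the dictionary to the data frame as a custom attribute.
attr(mydata$site.info, "dictionary") <- dictionary

# Retrieve it again, either via attributes() or attr().
attributes(mydata$site.info)$dictionary
attr(mydata$site.info, "dictionary")
```

One caveat with this design: operations that build a new data frame, such as `merge()`, generally do not carry custom attributes across, so the dictionary may need to be re-attached or read before the data are transformed.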
@smguru thanks for letting us know about the vocabulary repository. I don't think it fits what we're doing here as it's easier for us to pull the data direct from the source DB but it's great to know it's out there. It might come in handy.
I've added a new endpoint on the server to provide the metadata dictionary (in this commit). I've just pushed a commit to the v1.2_species_names branch in this repository that adds a function to use that new endpoint and get the metadata dictionary. You can test it right now by doing:
devtools::install_github('ternaustralia/ausplotsR@v1.2_species_names')
options("ausplotsR_api_url" = "http://dev2.inat.techotom.com") # the data is only available on this test server right now
md = ausplotsR:::.get_metadata_dictionary() # this function is *not* exported so you need the triple colons to call it
head(md)
You should see something like:
variable code label description
1 basal_point E1 East 1 Distance from SW corner: 10 m north; 100 m east
2 basal_point E2 East 2 Distance from SW corner: 30 m north; 100 m east
3 basal_point E3 East 3 Distance from SW corner: 50 m north; 100 m east
4 basal_point E4 East 4 Distance from SW corner: 70 m north; 100 m east
5 basal_point E5 East 5 Distance from SW corner: 90 m north; 100 m east
6 basal_point N1 North 1 Distance from SW corner: 100 m north; 10 m west
Now, some points about this data:
- [...] lut_. I'm fairly certain we have some variables in the metadata dictionary that don't appear in the ausplotsR dataframes (see below for list) and we may not have all the variables that do appear in the dataframes; I haven't yet checked.
- [...] code column as it's the only one that uses numbers. I don't think this will cause issues because R seems to consider a string version of a number as being identical to the number version, e.g. '1' == 1 is TRUE. If you find yourself trying to join codes in the metadata dictionary to the data frames to look up values, and things aren't matching, it's probably this. If it happens, I think our only option is for the ausplotsR logic to cast strings that are only numbers back into a number.
- [...] pedality_grade has four columns and I chose to make grade the label value but I didn't want to lose the pedality value so I joined it into the description. We have the choice to change this to be however it needs to be but I've made an executive decision for everything so we have a starting point. The variables I've done this for are: pedality_grade, texture_grade and pit_marker_mga_zones.
- [...] outcrop_lithology and other_outcrop_lithology. This also affects observer_veg/observer_soil/described_by and smallest_size_1/smallest_size_2 too. Right now, I've only included the former in the metadata dictionary but we have options here too:
  - [...] outcrop_lithology under the name other_outcrop_lithology. It's a little redundant but it saves any confusion.
  - [...] lithology and the logic in ausplotsR will match it up to where it needs to go.

Variables that (possibly) aren't used in ausplotsR dataframes:
Try it out and let me know any changes that are required.
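A hedged sketch of the code-type concern raised above: in R, `'1' == 1` is TRUE because the number is coerced to a string before comparison. Casting the data column to character before joining makes that match explicit. The variable name `example_variable` and all values are invented for illustration; only the `variable`, `code` and `label` column names come from the dictionary output shown above.

```r
# Illustrative slice of a metadata dictionary; codes are stored as strings.
md <- data.frame(
  variable = "example_variable",
  code = c("1", "2", "A"),
  label = c("One", "Two", "Letter A"),
  stringsAsFactors = FALSE
)

# Illustrative data frame where the same codes arrived as numbers.
obs <- data.frame(example_variable = c(2, 1, 2))

# The comparison works because the number is coerced to character.
isTRUE("1" == 1)

# Cast explicitly so that both join keys are character.
obs$code <- as.character(obs$example_variable)

# Left join: keep every observation, add the label where a code matches.
lookup <- md[md$variable == "example_variable", c("code", "label")]
joined <- merge(obs, lookup, by = "code", all.x = TRUE)
print(joined)
```

Casting the data to character, rather than casting dictionary codes to numbers, avoids turning non-numeric codes such as "A" into NA.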
I've just pushed a new commit to the v1.2_species_names branch that uses the data from the TERN LinkedData repo. @smguru I have to eat my words from my previous comment, we're using your service :heart_eyes: . This (https://github.com/ternandsparrow/swarm-rest/blob/master/ausplots-metadata-dictionary-server/index.js#L55) is how we're using it: a light transformation to make it easier to consume in R.
Follow the same instructions as my previous comment (after pulling the new commit) and you'll see what we've got, example:
> head(md)
variableCode variableLabel variableDefinition variableValueCode variableValueLabel
1 FIXME, maybe erosion_state Erosion State The particular condition(s) observed. S Stabilised
2 FIXME, maybe erosion_state Erosion State The particular condition(s) observed. P Partially stabilised
3 FIXME, maybe erosion_state Erosion State The particular condition(s) observed. Z Absent
4 FIXME, maybe erosion_state Erosion State The particular condition(s) observed. A Active
5 FIXME, maybe erosion_state Erosion State The particular condition(s) observed. n/a N/A
6 FIXME, maybe erosion_state Erosion State The particular condition(s) observed. NC Not Collected
variableValueDefinition
1 One or both of the following conditions apply: no evidence of sediment movement; sides and/or floors of erosion form are revegetated.
2 Evidence of some active erosion and some evidence of stabilisation.
3 <NA>
4 One or both of the following conditions apply: evidence of sediment movement; sides and/or floors of erosion form are relatively bare of vegetation.
5 Not applicable.
6 Not collected.
There are points I need to make:
- as far as I can tell, the LinkedData repo doesn't have the names we use for the variables like basal_point. They only have the pretty names like Basal Point. Somehow we're going to have to figure out that mapping. Right now I take the pretty name, make it lowercase and replace spaces with an underscore in the hope that it'll match everything. I haven't checked though.
- it's Friday arvo and I'm going home just as this has deployed, so I haven't double checked anything. There may be issues with the data, so if you see anything, let me know.

Hello Tom, I'm not sure what you are trying to do, but it would be good to catch up to understand your needs. All the AusPlots terminologies have been created with the intent that they be reused.
regards Guru
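The label-to-name mapping Tom describes (lowercase the pretty name, replace spaces with underscores) could be sketched as below, together with the check he mentions not having done yet. The helper name `to_variable_name` and the `known_columns` vector are hypothetical; the real column names would come from the ausplotsR data frames.

```r
# Hypothetical helper: turn a pretty label into a guessed variable name.
to_variable_name <- function(label) {
  gsub(" ", "_", tolower(label), fixed = TRUE)
}

# Pretty names as they appear in the dictionary output above.
labels <- c("Basal Point", "Erosion State")
guessed <- to_variable_name(labels)

# Illustrative list of column names actually used in the data frames.
known_columns <- c("basal_point", "erosion_state", "bioregion_name")

# Any guessed names that do not match a real column need a manual mapping.
unmatched <- setdiff(guessed, known_columns)
print(guessed)
print(unmatched)
```

Listing the unmatched names gives a concrete to-do list of variables where the simple rule fails and an explicit mapping is required.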
@smguru We're using the data from the LinkedData repo to provide more context in ausplotsR. Currently a user will look at the bioregion_name column in their R dataframe and wonder what DAC means. By including the data from LinkedData, the user will be able to see the column contains IBRA codes and what the full name for DAC is. Basically providing a better user experience.
Hello Tom, DAC is just a code, each bioregion has a code and it is represented in a vocabulary.
regards Guru
Hi @tomsaleeba, if you can, please use the IBRA codes list from http://linked.data.gov.au/dataset/bioregion/IBRA7 instead of http://linked.data.gov.au/def/ausplots-cv/a9754a72-c2f7-4a9d-9686-9df78fb65e62. The latter was generated from the AusPlots Rangelands database, while the former was generated from the authoritative source. 👍
Thanks @edmondchuc, I'll make that change. :heart_eyes:
I'll close - @tomsaleeba has created the dictionary and it can be made more complete over time
The output data tables (i.e. from a get_ausplots call and relating to AusPlots modules like vouchers or soil properties) are described in the help files, but the individual variables/columns are not defined anywhere (e.g. what they mean, their units etc.). While some of this information is in the field manual and some of it is obvious or intuitive, ideally there would be a document (or a link to one) that explains each data column/variable returned in the raw data from the package. The metadata that comes with an aekos download of TERN AusPlots can't be used as the data presentation is quite different.
This may need a wider discussion of how to handle this. Improving the metadata is pretty fundamental and we have had a user request for this information.
@smguru @tomsaleeba @Sammunroe