sckott opened this issue 9 years ago
cc @hlapp @xu-hong
@sckott any progress or conclusions on this yet? @xu-hong is about at that step (leaving aside the issues with XML namespaces). Any suggestions as to additional decorations that should be added to the data.frame object returned by `RNeXML::get_characters()`?
@hlapp sorry, hadn't looked at this yet. Will do so today
Rundown of the data objects each function currently outputs:
| function | output |
|---|---|
| betydb_trait | data.frame |
| betydb_search | data.frame |
| betydb_citation | data.frame |
| betydb_site | data.frame |
| betydb_specie | data.frame |
| birdlife_habitat | data.frame |
| birdlife_threats | data.frame |
| coral_locations | data.frame |
| coral_methodologies | data.frame |
| coral_resources | data.frame |
| coral_species | data.frame |
| coral_taxa | data.frame |
| coral_traits | data.frame |
| eol_invasive_ | data.frame |
| fe_native | list |
| g_invasive | data.frame |
| is_native | data.frame |
| leda | data.frame |
| ncbi_byid | data.frame |
| ncbi_byname | data.frame/named list of data.frames |
| ncbi_searcher | data.frame/named list of data.frames |
| traitbank | list |
`bety_*()` functions that return lists could, I think, easily be made to return data.frames; I'll check. DONE

Common fields among functions that could be standardized:

- `id` - identifier for the record/taxon/etc.; the meaning of this id could vary between providers, however
- `date` - date collected/updated; there can be more than one of these
- `latitude` - latitude, if the record is spatially explicit and it is available
- `longitude` - longitude, if the record is spatially explicit and it is available
- `name` - taxonomic name; combine any separate fields to make this one (not done yet); there could be additional name columns

The above aren't real columns in the outputs yet, but they are the ones I think could be standard across most of the data sources in this package.
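As a rough sketch of what standardizing to these fields could look like, assuming each output row is represented as a Python dict and that each data source would come with its own rename map. The `standardize` helper, the rename map, and the source column names `specie_id`/`scientificname` are all hypothetical, not the actual columns any of these functions return:

```python
# Proposed standard fields, in the order they should appear.
STANDARD_FIELDS = ["id", "date", "latitude", "longitude", "name"]


def standardize(records, rename):
    """Map provider-specific keys onto the standard fields.

    `rename` is a hypothetical per-source map from provider column names
    to standard field names. Standard fields come first (None when the
    source lacks them); all other provider fields are kept after them.
    """
    out = []
    for rec in records:
        renamed = {rename.get(key, key): value for key, value in rec.items()}
        row = {field: renamed.get(field) for field in STANDARD_FIELDS}
        for key, value in renamed.items():
            if key not in row:
                row[key] = value
        out.append(row)
    return out


# Hypothetical source that calls its identifier "specie_id".
rows = standardize(
    [{"specie_id": 12, "scientificname": "Acropora palmata", "trait": "growth form"}],
    {"specie_id": "id", "scientificname": "name"},
)
print(rows[0])
# {'id': 12, 'date': None, 'latitude': None, 'longitude': None, 'name': 'Acropora palmata', 'trait': 'growth form'}
```

Keeping the per-source knowledge in a plain rename map means adding a new data source only requires a new map, not a new function.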
Part of what makes this hard is that we have a diverse set of data sources, ranging from morphological trait data to nativity status to molecular data.
I think a way forward could be to provide a suite of functions that apply a set of transformations to the data to standardize column names etc., so that outputs can easily be combined across taxa and data sources, at least on the standard fields. Other fields could be included as additional columns at the end.
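A minimal sketch of the combining step, assuming each source's output has already been standardized into a list of dicts (a Python stand-in for a data.frame). Standard fields come first, every other column is appended at the end in the order it is first seen, and cells a source does not provide are filled with None (the R analogue would be NA). The `combine` function and the example records are hypothetical:

```python
# Proposed standard fields, in the order they should appear.
STANDARD_FIELDS = ["id", "date", "latitude", "longitude", "name"]


def combine(*sources):
    """Row-bind already-standardized record lists from several sources.

    Columns: standard fields first, then every other column in the order
    it is first seen across sources; missing cells are filled with None.
    """
    extra = []
    for records in sources:
        for rec in records:
            for key in rec:
                if key not in STANDARD_FIELDS and key not in extra:
                    extra.append(key)
    columns = STANDARD_FIELDS + extra
    return [{col: rec.get(col) for col in columns} for records in sources for rec in records]


# Hypothetical outputs from a trait source and a nativity source.
traits = [{"id": 1, "name": "Acropora palmata", "trait": "growth form"}]
nativity = [{"id": "x9", "name": "Puma concolor", "latitude": 10.5, "status": "native"}]
rows = combine(traits, nativity)
print(list(rows[0]))
# ['id', 'date', 'latitude', 'longitude', 'name', 'trait', 'status']
```

Taking the union of columns, rather than only their intersection, keeps provider-specific fields instead of dropping them, at the cost of sparse extra columns.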
> Common fields among functions that could be standardized:
You mean columns among data frames?
yes
> Suggestions as to additional decorations that should be added to the data.frame
@hlapp I don't know what's available to add
> `name` - taxonomic name - combine any separate fields to make this one (not done yet), could be additional name columns
@sckott do you mean the classification, or family and genus for taxa that are species?
@hlapp I mean name
Ideally it would be genus + epithet + any subspecific epithets, OR the previous + authority.
I favor leaving the authority off the name and putting it in a separate column, if provided.
If the data record is identified only to family, e.g., then I don't know what best practice is. Perhaps we'd leave `name` blank.
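A small sketch of the naming convention discussed here, assuming the name parts are available separately. `name` is genus + epithet + any infraspecific epithets, the authority goes in its own column, and `name` is left empty (None) when the record is not identified to species, e.g. only to family. Treating genus-only records as blank too is an assumption, and `name_columns` is a hypothetical helper:

```python
def name_columns(genus=None, epithet=None, infraspecific=(), authority=None):
    """Return the `name` and `authority` columns for one record.

    name = genus + epithet + any infraspecific epithets, without the
    authority. Records not identified to species (e.g. only to family)
    get name=None; treating genus-only records the same way is an
    assumption.
    """
    if genus and epithet:
        name = " ".join([genus, epithet, *infraspecific])
    else:
        name = None
    return {"name": name, "authority": authority}


print(name_columns("Puma", "concolor", authority="(Linnaeus, 1771)"))
# {'name': 'Puma concolor', 'authority': '(Linnaeus, 1771)'}
print(name_columns())  # identified only to family: no genus/epithet
# {'name': None, 'authority': None}
```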
Right now, we haven't thought about the outputs of each function. I believe all are data.frames. I'll look at each one, see what it currently outputs and what is shared among them, and work out a standard format that will also make combining outputs easier.