ropensci / traits

R package for accessing species trait data from multiple databases

Standardize data.frame's across data sources #38

Open sckott opened 9 years ago

sckott commented 9 years ago

Right now, we haven't thought about the outputs of each function. I believe all are data.frames. I'll look at what each one currently outputs, see what is shared among them, and work out a standard format that will also make combining outputs easier.

sckott commented 9 years ago

cc @hlapp @xu-hong

hlapp commented 9 years ago

@sckott any progress or conclusions on this yet? @xu-hong is about at that step (leaving the issues with XML namespaces aside). Suggestions as to additional decorations that should be added to the data.frame object returned by RNeXML::get_characters()?

sckott commented 9 years ago

@hlapp sorry, hadn't looked at this yet. Will do so today

sckott commented 9 years ago

Rundown of the data objects each function currently outputs:

| function | output |
| --- | --- |
| betydb_trait | data.frame |
| betydb_search | data.frame |
| betydb_citation | data.frame |
| betydb_site | data.frame |
| betydb_specie | data.frame |
| birdlife_habitat | data.frame |
| birdlife_threats | data.frame |
| coral_locations | data.frame |
| coral_methodologies | data.frame |
| coral_resources | data.frame |
| coral_species | data.frame |
| coral_taxa | data.frame |
| coral_traits | data.frame |
| eol_invasive_ | data.frame |
| fe_native | list |
| g_invasive | data.frame |
| is_native | data.frame |
| leda | data.frame |
| ncbi_byid | data.frame |
| ncbi_byname | data.frame/named list of data.frames |
| ncbi_searcher | data.frame/named list of data.frames |
| traitbank | list |

sckott commented 9 years ago

Common fields among functions that could be standardized:

The above aren't real columns in outputs yet, but the ones I think could be standard across most of the data sources in this package.

Part of what makes this hard is that we have a diverse set of data sources, from morphological trait data, to nativity status, to molecular data

I think a way forward could be to provide a suite of functions that transform the data to standardize column names, etc. That would let outputs be combined easily across taxa and data sources, at least for the standard fields; any other fields could be included as additional columns at the end.
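A rough sketch of what such helpers might look like, in base R. Nothing here exists in the package: `standardize_cols()` and `combine_traits()` are hypothetical names, and apart from `name` the standard columns used (`trait`, `value`, `source`) are assumptions for illustration only.

```r
# Hypothetical helper: rename source-specific columns to a shared set of
# standard names, tag the data source, and move all other columns to the end.
# `mapping` is a named character vector: names are the standard column names,
# values are the column names used by this data source.
standardize_cols <- function(df, mapping, source) {
  present <- mapping[mapping %in% names(df)]
  names(df)[match(present, names(df))] <- names(present)
  df$source <- source
  std <- c(names(present), "source")
  df[, c(std, setdiff(names(df), std)), drop = FALSE]
}

# Hypothetical helper: row-bind data.frames whose columns only partly overlap,
# filling columns missing from a given input with NA.
combine_traits <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- rep(NA, nrow(d))
    }
    d[, all_cols, drop = FALSE]
  })
  do.call(rbind, filled)
}

# Made-up outputs standing in for two different data sources
a <- data.frame(species = "Quercus alba", trait_name = "height", val = "20",
                stringsAsFactors = FALSE)
b <- data.frame(sciname = "Canis lupus", status = "native",
                stringsAsFactors = FALSE)

a_std <- standardize_cols(a, c(name = "species", trait = "trait_name", value = "val"), "source_a")
b_std <- standardize_cols(b, c(name = "sciname"), "source_b")
combined <- combine_traits(a_std, b_std)
```

The standard columns come first and line up across sources, while source-specific columns such as `status` trail at the end and are `NA` where a source does not provide them.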

hlapp commented 9 years ago

> Common fields among functions that could be standardized:

You mean columns among data frames?

sckott commented 9 years ago

yes

sckott commented 9 years ago

> Suggestions as to additional decorations that should be added to the data.frame

@hlapp I don't know what's available to add

hlapp commented 9 years ago

> name - taxonomic name - combine any separate fields to make this one (not done yet), could be additional name columns

@sckott do you mean the classification, or family and genus for taxa that are species?

sckott commented 9 years ago

@hlapp I mean name could be ideally genus + epithet + any subspecific epithets OR previous + authority

I favor leaving authority off the name and having it in a separate column, if provided.

If the data record is identified only to family, e.g., then I don't know what best practice is. Perhaps we'd leave name blank.
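To make the proposal concrete, here is a minimal sketch in base R. `build_name()` and the column names `genus`, `epithet`, `infraspecific` and `authority` are hypothetical, not part of the package, and leaving `name` as `NA` whenever genus or epithet is missing is just one possible reading of "leave name blank".

```r
# Hypothetical helper: name = genus + epithet + any subspecific epithet.
# Authority is deliberately not pasted into the name.
build_name <- function(genus, epithet, infraspecific) {
  name <- ifelse(is.na(infraspecific),
                 paste(genus, epithet),
                 paste(genus, epithet, infraspecific))
  # Assumption: records not identified to species level get a blank name
  name[is.na(genus) | is.na(epithet)] <- NA_character_
  name
}

# Made-up records: a species, a subspecies, and a family-level identification
records <- data.frame(
  family        = c("Fagaceae", "Canidae", "Rosaceae"),
  genus         = c("Quercus", "Canis", NA),
  epithet       = c("alba", "lupus", NA),
  infraspecific = c(NA, "familiaris", NA),
  authority     = c("L.", NA, NA),
  stringsAsFactors = FALSE
)

records$name <- build_name(records$genus, records$epithet, records$infraspecific)
```

With this layout, `authority` stays in its own column and can be pasted onto `name` later by anyone who wants the full form.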