ropensci / neotoma

Programmatic R interface to the Neotoma Paleoecological Database.
https://docs.ropensci.org/neotoma
Other
30 stars 16 forks source link

Add extractor functions to neotoma #141

Closed gavinsimpson closed 7 years ago

gavinsimpson commented 10 years ago

The ongoing writing of the neotoma paper has demonstrated to me that the package would benefit from some extractor functions to get parts of the various objects returned by the main functions.

Examples

Extracting counts from objects

Users need to use constructs like

western.comp[[1]]$counts

to access the counts for the first data set in western.comp.

It would be a nicer user experience if they could just do

counts(western.comp[[1]])
counts(western.comp)

The former would return an object of class c("neo_counts", "data.frame"), essentially a data frame containing the counts data for that site/ The latter would return an object of class c("neo_counts_list", "list"), which would be a list of neo_counts objects, one per site/data set in western.comp.

Extracting sample Ages

This requires

western.comp[[1]]$sample.meta$Age

How about

ages(western.comp[[1]]) ## or Ages()
ages(western.comp)

instead?

This also raises the issue of extracting the sample meta data. That should have an extractor too.

Thoughts?

List of extractor functions to implement

  1. [x] counts() to extract count data from objects. Methods for single objects and lists of objects. Track this in Issue #146.
  2. [x] ages() (or Ages()) to extract sample age data from objects. Methods for single objects and lists of objects.
  3. [x] metadata() to extract forms of metadata from objects. Methods for single objects and lists of objects. Perhaps has argument type allowing one to choose which of the sets of metadata is returned.

If you agree that these would be useful and have suggestions for other extractors we should add, list them and I'll add them to the list so we can track progress here for the general issue and have separate issues to track details of each extractor function in more detail.

karthik commented 10 years ago

Sounds great to me. I can also see additional clean up that could happen. get_datasets can also name the return objects with the sites which it currently doesn't do. I had to explicitly add those names back so we could reference the list with names instead of subscripts.

gavinsimpson commented 10 years ago

@karthik I discussed that issue with @SimonGoring and there is some discussion with @andydawson as well in #139 so I also agree with you on this one.

After our chat I left @SimonGoring with lots of things we could do to improve or tweak things in neotoma. Obviously he shouldn't shoulder all of this so Simon was going to think about things a little and then set up issues here and delegate the work to interested parties.

karthik commented 10 years ago

Obviously he shouldn't shoulder all of this so Simon was going to think about things a little and then set up issues here and delegate the work to interested parties.

This is pretty much the exact same conversation we had yesterday too. I also suggested delegating tasks that we could all quickly finish up without burdening him too much. Please assign stuff to me and I'll get those done promptly.

SimonGoring commented 10 years ago

I agree that these are really useful, but I'd like to proceed a bit slowly on this because I think the package is close enough for release at this point that I don't want to add new things before we upload to CRAN.

My thought is:

  1. Clean up the existing code so it's all standardized
  2. Fix all the notes & warnings in the Checks (which I think @karthik said he'd do, I'll ping him when the previous step is done
  3. Upload to CRAN
  4. Submit paper with changes based on 1 and co-author notes
  5. Work on new methods, including the above (which again is a great thing).

That said, @gavinsimpson if you or @andydawson or @karthik or anyone wants to add these I'd be happy to see them.

I thought that it might be helpful to call all the direct API calls get_*() and all the extraction calls pull_*(). That helps differentiate them, and helps make them unique to neotoma so there's no accidental conflicts in the environment if we're loading other packages.

gavinsimpson commented 10 years ago

@SimonGoring My concern with waiting to add trivial things like these extractors is that we'd have a paper out there with example usage that will become almost instantly out of date.

Re: naming, adding to the function name just to try to insulate your functions from clashes is the wrong solution. You force everyone to type multiple extra characters from now on for ever when really those clashes are going to be rare. R also has a mechanism for resolving such clashes: the :: operator. so it would be better to inflict neotoma::counts() on the few people that will run into problems because they are using the package with another one that also supplies a counts() function.

I've complained (read whinged) to several ropensci contributors about this creep in naming conventions just to avoid naming clashes. The solution is ::, not longer names.

SimonGoring commented 10 years ago

Alright. We end the whinging here. I just didn't know any better.

Let me push my changes now so that the naming conventions are standardized and then we can add the extractor functions.

On Thu, Sep 25, 2014 at 3:06 PM, Gavin Simpson notifications@github.com wrote:

@SimonGoring https://github.com/SimonGoring My concern with waiting to add trivial things like these extractors is that we'd have a paper out there with example usage that will become almost instantly out of date.

Re: naming, adding to the function name just to try to insulate your functions from clashes is the wrong solution. You force everyone to type multiple extra characters from now on for ever when really those clashes are going to be rare. R also has a mechanism for resolving such clashes: the :: operator. so it would be better to inflict neotoma::counts() on the few people that will run into problems because they are using the package with another one that also supplies a counts() function.

I've complained (read whinged) to several ropensci contributors about this creep in naming conventions just to avoid naming clashes. The solution is ::, not longer names.

— Reply to this email directly or view it on GitHub https://github.com/ropensci/neotoma/issues/141#issuecomment-56875647.

SimonGoring commented 7 years ago

I'm going to close this issue. The metadata() extractor exists as get_site() or get_dataset().