Bowerbird - Githubissues

raymondben commented 7 years ago

Summary

What does this package do? (explain in 50 words or less):

A package for maintaining a local collection of data sets from a range of data providers. Bowerbird can mirror an entire remote collection of files, using wget's recursive download functionality. Bowerbird also provides some functions around data provenance and versioning (it doesn't fundamentally solve these issues, but goes some way towards solutions).

Paste the full DESCRIPTION file inside a code block below:

Package: bowerbird
Type: Package
Title: Keep a Collection of Sparkly Data Resources
Version: 0.3.4
Authors@R: c(person("Ben","Raymond",email="ben.raymond@aad.gov.au",
       role=c("aut","cre")),
       person("Michael","Sumner",role="aut"))
Description: Tools to get and maintain a data repository from third-party data
    providers.
URL: https://github.com/AustralianAntarcticDivision/bowerbird
BugReports: https://github.com/AustralianAntarcticDivision/bowerbird/issues
License: MIT + file LICENSE
Imports:
    assertthat,
    dplyr,
    openssl,
    R.utils,
    rmarkdown,
    rvest,
    stringr,
    xml2
LazyLoad: yes
RoxygenNote: 6.0.1
Suggests:
    archive,
    knitr,
    testthat,
    covr
Remotes: jimhester/archive
VignetteBuilder: knitr

URL for the package (the development repository, not a stylized html page): https://github.com/AustralianAntarcticDivision/bowerbird
Please indicate which category or categories from our package fit policies this package falls under and why? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

Data retrieval, but also venturing into reproducibility. Primarily it is intended as a mechanism for maintaining a local data collection (from remote data providers), but could also be used as a wrapper to allow others to reproduce your work (e.g. "you'll need these 100GB of files installed locally; here's the bowerbird script to do so"). Bowerbird also has a few functions to help with data provenance, see vignette("data_provenance").
Who is the target audience?
Research scientists/technicians/data managers who want to maintain a local library of data files (either for their own use, or perhaps a single shared library on behalf of a number of local users, as we do). Researchers who want to share work that relies on local copies of data. Also potentially package developers who need some sort of data retrieval that isn't easily accomplished by existing tools (e.g. recursive download of a whole collection of data files from a satellite data provider).
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Nothing to our knowledge that really does the same thing. Some similarity to https://github.com/ropensci/rdataretriever, though rdataretriever seems to be angled towards biodiversity data sets in particular and creating sensible local database structures for them. Bowerbird is focused on mirroring remote data to a local file system, and providing some functions around data provenance. Passing overlap with http packages (httr, crul) but these are generally intended for single-transaction sort of usage. Jeroen's curl package (not an ropensci one?) is also similar to bowerbird in that it wraps an underlying http client: bowerbird typically uses wget under the hood to accomplish its web traffic, whereas curl binds to libcurl. AFAIK curl doesn't support mirroring of external sites (which wget does, and which bowerbird relies heavily on).

Requirements

Confirm each of the following by checking the box. This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions. More needed, but the important ones are covered for now. The more-sparsely-documented functions at this stage are the ones that the average user is unlikely to need to interact with directly.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov. More test coverage still to be added - codecov is currently reporting around 80%, goodpractice around 70%, not sure which is right!
[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

[x] Do you intend for this package to go on CRAN?
[no] Do you wish to automatically submit to the Journal of Open Source Software? If so:
- [ ] The package contains a paper.md with a high-level description in the package root or in inst/.
- [ ] The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)

Detail

[x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
[more or less] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
- no NEWS file yet
- some function docs need expanding
- we use cat() for printing progress, despite the packaging guide suggestion to use message() instead. This is because (a) progress information doesn't really strike me as a "condition", which message() is intended for, (b) all cat-issued messages can be turned off by specifying bb_sync(...,verbose=FALSE), (c) an anticipated common use for bowerbird is for unattended (cron-job) updates to a local data library, in which case it's likely the user will want to sink() all output to a log file. Using cat() means that a simple sink() will catch everything, including output from wget calls (if they are made). I think this becomes less reliable if message() is used (you'd have to use sink(..., type="message"), and even then I'm not sure it'd catch wget output, though admittedly I haven't tried this).
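
To illustrate the sink() point above, here is a minimal base-R sketch (not bowerbird code) comparing what a plain sink() captures from cat() and from message(). The file names and message strings are made up for the example, and it deliberately does not test whether output from external wget calls is captured.

```r
# Sketch only: cat() writes to stdout, message() writes to stderr.
out_log <- tempfile(fileext = ".log")
msg_log <- tempfile(fileext = ".log")

# 1. A plain sink() diverts stdout only (type = "output" is the default).
sink(out_log)
cat("progress via cat\n")          # captured in out_log
message("progress via message")    # NOT captured: goes to stderr as usual
sink()                             # restore normal output

# 2. Capturing message() needs a connection and type = "message".
con <- file(msg_log, open = "wt")
sink(con, type = "message")
message("progress via message")    # now captured in msg_log
sink(type = "message")             # restore normal message stream
close(con)

captured_out <- readLines(out_log)
captured_msg <- readLines(msg_log)
print(captured_out)
print(captured_msg)
```

In an unattended job, the first form is the "simple sink()" described above; the second shows the extra step that message()-based progress output would require.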
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Additional notes re: our presubmission enquiry

General note: the package is not in a final polished state, but we think far enough advanced (and stable enough) to be a good point for onboarding consideration.

It would be helpful to actually separate out the core mechanism and additional sources. This could go as far as having separate packages (which we could handle together).

Fair suggestion, and one that we've considered - and maybe that split is still reasonable to consider down the track. But for now, at least, we think it's better to keep core-functionality and the data-source-definitions bundled together.

Have you considered using rappdirs for default data directories?

It's up to the user where they want to put their data. We do make a suggestion in the README and vignette for users to consider rappdirs.
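
As a rough sketch of that suggestion, a user could derive a per-user data directory with rappdirs and fall back to a temporary location if the package is not installed. The application name "bowerbird_data" and the fallback are assumptions made for this example; the resulting path would then be supplied to bowerbird wherever the local data root is configured.

```r
# Sketch only: choose a platform-appropriate directory for the local data library.
# "bowerbird_data" is a hypothetical application name used for illustration.
if (requireNamespace("rappdirs", quietly = TRUE)) {
  # user_data_dir() returns the conventional per-user data location for the OS
  data_root <- rappdirs::user_data_dir("bowerbird_data")
} else {
  # Fallback for this example when rappdirs is not available
  data_root <- file.path(tempdir(), "bowerbird_data")
}

# The directory is not created here; create it explicitly when needed, e.g.
# dir.create(data_root, recursive = TRUE, showWarnings = FALSE)
print(data_root)
```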

maelle commented 6 years ago

Thanks a lot @lwasser! Yes styler looks super useful (I'm yet to actually use it!)

@raymondben I forgot to tell you to add the rOpenSci review badge to the README! 🙈

[![](https://badges.ropensci.org/139_status.svg)](https://github.com/ropensci/onboarding/issues/139)

MilesMcBain commented 6 years ago

Hi All, just making a note that I'm picking this up again now. I should be able to respond fully within the next week. :+1:

maelle commented 6 years ago

Thanks @milesmcbain !

MilesMcBain commented 6 years ago

Thanks @raymondben and @mdsumner!

An excellent job on the documentation and vignettes. The main vignette is now a stand out - one of the best I have seen. It gives this package a great chance of getting some use about the place. :clap: :smile:

I am also happy with the way you addressed my other comments. Thank you for the explanation on the use of cat(). I think the verbose option is in the spirit of the rule, as you suggest. I also agree that within the context of try-catching, this is a reasonable option and I take no issue with it.

I've updated my review block above with the final :heavy_check_mark:

One minor comment: When reviewing some of the changes in the code I noticed a lot of old code sitting around in commented out blocks. This was true in maybe two thirds of bowerbird R files. This is definitely a personal thing, but I find these distracting when reviewing code and they also slightly decrease my trust in the code. "What was/is the bug here that's not fully resolved?" I suggest you remove as many of these as possible.

Otherwise, congratulations on polishing up this package and it has been my pleasure to review it.

raymondben commented 6 years ago

Thanks indeed @MilesMcBain. Re: commented-out code, yes, I'll take the blame for that, I do rather have a tendency to leave it littered around. I'll have a purge ...

maelle commented 6 years ago

Thanks a lot @MilesMcBain!

@raymondben please update this thread when you've done that, so that I might take a last look before approval.

Reg. purging comments you could count the number of lines you've suppressed by using cloc at different commits. 😉

raymondben commented 6 years ago

@maelle , I've already cleaned them out. (master branch)

maelle commented 6 years ago

Awesome, I'll have a look later today/this week!

maelle commented 6 years ago

I have started looking, really great docs as Miles says!

I was wondering whether it'd be good to split the README into a more minimal README and a few vignettes linked from the README? E.g. "Defining data sources" could be a vignette. Or add a table of contents at the top of the README? (but the website might still be easier to browse?)

maelle commented 6 years ago

Approved! 👏

Thanks a lot @raymondben @mdsumner @lwasser @MilesMcBain for your work! A very productive review process IMO!

I have three suggestions:

- Improve the readability/browsability of the README by splitting it or adding a table of contents (see previous comment).
- You still have a few long lines flagged by goodpractice::gp; consider making them shorter.
- I don't think the origin of the name is in the docs? It'd be cool to add it.
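
For the long-lines suggestion, a quick way to locate candidates without re-running the full goodpractice checks is a small base-R helper like the sketch below. The function name find_long_lines and the 80-character limit are assumptions for this example, not part of goodpractice or bowerbird.

```r
# Sketch only: report lines in a file that exceed a maximum width.
find_long_lines <- function(path, max_width = 80L) {
  lines <- readLines(path, warn = FALSE)
  widths <- nchar(lines, type = "chars")
  too_long <- which(widths > max_width)
  data.frame(line = too_long, width = widths[too_long])
}

# Demonstration on a temporary file with one short and one long line
demo_file <- tempfile(fileext = ".R")
writeLines(c("x <- 1", strrep("y", 100)), demo_file)
long_lines <- find_long_lines(demo_file)
print(long_lines)
```

Applied over the package sources, this could be run as lapply(list.files("R", full.names = TRUE), find_long_lines).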

Now here is the list of things you have to do before I close this issue 😉

[] Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
[] Add this badge to your README

[![](https://badges.ropensci.org/139_status.svg)](https://github.com/ropensci/onboarding/issues/139)

[] Add the rOpenSci footer to the bottom of your README

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

[] Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

stefaniebutland commented 6 years ago

@raymondben @mdsumner We'd love to publish a blog post about bowerbird. Given @MilesMcBain's comment about your vignette, it's bound to be good. Here are some editorial and technical guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Was just looking back at a discussion of "the journey" from code for my own use, to code that I want others to find useful, where @noamross suggested a blog post using bowerbird as example. I was thinking that a post from a pkg submitter perspective (submit earlier vs later dilemma, docs, challenges of moving beyond personal cr*p code) would be very well received. However! That sounds more like two posts - one on bowerbird itself, and one about process.

Either way, this is optional and only if you have the capacity and interest to do this. Discussion reminded me that I need to think about how we can help authors and reviewers with this process.

Let me know what you think. No rush.

maelle commented 6 years ago

👋 @raymondben @mdsumner could you please complete the items of the checklist above soon, including transferring your repo? Thanks!

raymondben commented 6 years ago

@maelle - done. Sorry, was waiting for a revised footer to be finalized, but looks like that may take a while so I've gone ahead and transferred now.

maelle commented 6 years ago

Cool, thanks! Could you also add the review badge mentioned in the checklist?

I've activated the repo in Appveyor.

maelle commented 6 years ago

Note that the badges do not render at the moment, but this will soon be fixed, so please add the review badge to your README before I close this issue. :-)

raymondben commented 6 years ago

The review badge is in the readme, just not rendering. It was showing under our org, but not now.

maelle commented 6 years ago

Aaah thanks and sorry! Perfect! I'm closing the issue but this doesn't prevent the discussion of blog posts to continue. 😸

ropensci / software-review

Bowerbird #139