ropensci / software-review

rOpenSci Software Peer Review.

Bowerbird #139

Closed raymondben closed 6 years ago

raymondben commented 7 years ago

Summary

A package for maintaining a local collection of data sets from a range of data providers. Bowerbird can mirror an entire remote collection of files, using wget's recursive download functionality. Bowerbird also provides some functions around data provenance and versioning (it doesn't fundamentally solve these issues, but goes some way towards solutions).
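For readers unfamiliar with wget's recursive mode, the following is a minimal sketch of the kind of invocation that mirroring relies on. It is not taken from bowerbird and does not necessarily match the flags bowerbird passes to wget. The helper name `mirror_site`, the `DRY_RUN` switch, and the example URL are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: mirror_site is a hypothetical helper, not part of bowerbird.
# Usage: mirror_site URL DEST_DIR
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
    url=$1
    dest=$2
    # --recursive         follow links found in downloaded pages
    # --level=inf         do not limit the recursion depth
    # --no-parent         never ascend above the starting directory
    # --timestamping      skip files that are not newer than the local copy
    # --directory-prefix  save everything under DEST_DIR
    set -- wget --recursive --level=inf --no-parent --timestamping \
        --directory-prefix="$dest" "$url"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling `DRY_RUN=1 mirror_site https://example.org/data/ ./mirror` prints the command without touching the network. Because of `--timestamping`, re-running the real command later should only fetch files that have changed on the server, which is what makes a recurring sync cheap.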

Package: bowerbird
Type: Package
Title: Keep a Collection of Sparkly Data Resources
Version: 0.3.4
Authors@R: c(person("Ben","Raymond",email="ben.raymond@aad.gov.au",
       role=c("aut","cre")),
       person("Michael","Sumner",role="aut"))
Description: Tools to get and maintain a data repository from third-party data
    providers.
URL: https://github.com/AustralianAntarcticDivision/bowerbird
BugReports: https://github.com/AustralianAntarcticDivision/bowerbird/issues
License: MIT + file LICENSE
Imports:
    assertthat,
    dplyr,
    openssl,
    R.utils,
    rmarkdown,
    rvest,
    stringr,
    xml2
LazyLoad: yes
RoxygenNote: 6.0.1
Suggests:
    archive,
    knitr,
    testthat,
    covr
Remotes: jimhester/archive
VignetteBuilder: knitr

Nothing to our knowledge really does the same thing. There is some similarity to https://github.com/ropensci/rdataretriever, though rdataretriever seems to be angled towards biodiversity data sets in particular, and towards creating sensible local database structures for them. Bowerbird is focused on mirroring remote data to a local file system and providing some functions around data provenance. There is passing overlap with http packages (httr, crul), but these are generally intended for single-transaction usage. Jeroen's curl package (not an ropensci one?) is also similar to bowerbird in that it wraps an underlying http client: bowerbird typically uses wget under the hood for its web traffic, whereas curl binds to libcurl. AFAIK curl doesn't support mirroring of external sites (which wget does, and which bowerbird relies on heavily).

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

Additional notes re: our presubmission enquiry

General note: the package is not in a final polished state, but we think far enough advanced (and stable enough) to be a good point for onboarding consideration.

> It would be helpful to actually separate out the core mechanism and additional sources. This could go as far as having separate packages (which we could handle together).

Fair suggestion, and one that we've considered. That split may still be reasonable to consider down the track, but for now, at least, we think it's better to keep the core functionality and the data source definitions bundled together.

> Have you considered using rappdirs for default data directories?

It's up to the user where they want to put their data. We do make a suggestion in the README and vignette for users to consider rappdirs.

maelle commented 6 years ago

Thanks a lot @lwasser! Yes styler looks super useful (I'm yet to actually use it!)

@raymondben I forgot to tell you to add the rOpenSci review badge to the README! 🙈

[![](https://badges.ropensci.org/139_status.svg)](https://github.com/ropensci/onboarding/issues/139)

MilesMcBain commented 6 years ago

Hi All, just making a note that I'm picking this up again now. I should be able to respond fully within the next week. :+1:

maelle commented 6 years ago

Thanks @milesmcbain !

MilesMcBain commented 6 years ago

Thanks @raymondben and @mdsumner!

An excellent job on the documentation and vignettes. The main vignette is now a standout, one of the best I have seen. It gives this package a great chance of getting some use about the place. :clap: :smile:

I am also happy with the way you addressed my other comments. Thank you for the explanation on the use of cat(). I think the verbose option is in the spirit of the rule, as you suggest. I also agree that within the context of try-catching this is a reasonable option, and I take no issue with it.

I've updated my review block above with the final :heavy_check_mark:

One minor comment: when reviewing some of the changes in the code I noticed a lot of old code sitting around in commented-out blocks. This was true in maybe two thirds of the bowerbird R files. This is definitely a personal thing, but I find these distracting when reviewing code, and they also slightly decrease my trust in the code: "What was/is the bug here that's not fully resolved?" I suggest you remove as many of these as possible.

Otherwise, congratulations on polishing up this package and it has been my pleasure to review it.

raymondben commented 6 years ago

Thanks indeed @MilesMcBain. Re: commented-out code, yes, I'll take the blame for that, I do rather have a tendency to leave it littered around. I'll have a purge ...

maelle commented 6 years ago

Thanks a lot @MilesMcBain!

@raymondben please update this thread when you've done that, so that I might take a last look before approval.

Regarding purging comments, you could count the number of lines you've removed by running cloc at different commits. 😉
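A rough, dependency-free way to get a similar number is sketched below. Note that it swaps cloc for plain `git diff` plus `grep`, so it is an approximation of the idea rather than the cloc workflow itself. The helper name `count_removed_comments` is hypothetical, and the sketch assumes that commented-out code sits on lines whose first non-blank character is `#` (roxygen lines beginning `#'` are counted too).

```shell
#!/bin/sh
# Sketch only: count_removed_comments is a hypothetical helper.
# Usage: count_removed_comments OLD_COMMIT NEW_COMMIT
# Prints how many comment lines were removed from *.R files between
# the two commits. Run it from inside the git repository.
count_removed_comments() {
    # --unified=0 drops context lines; removed lines start with a single '-'.
    # File headers start with '---', which the pattern below does not match.
    # grep -c exits non-zero when the count is 0, hence the '|| true'.
    git diff --unified=0 "$1" "$2" -- '*.R' | grep -c '^-[[:space:]]*#' || true
}
```

For example, `count_removed_comments HEAD~1 HEAD` reports the comment lines deleted by the most recent commit.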

raymondben commented 6 years ago

@maelle , I've already cleaned them out. (master branch)

maelle commented 6 years ago

Awesome, I'll have a look later today/this week!

maelle commented 6 years ago

I have started looking, really great docs as Miles says!

I was wondering whether it'd be good to split the README into a more minimal README and a few vignettes linked from the README? E.g. "Defining data sources" could be a vignette. Or add a table of contents at the top of the README? (but the website might still be easier to browse?)

maelle commented 6 years ago

Approved! 👏

Thanks a lot @raymondben @mdsumner @lwasser @MilesMcBain for your work! A very productive review process IMO!

I have three suggestions:

Now here is the list of things you have to do before I close this issue 😉

[![](https://badges.ropensci.org/139_status.svg)](https://github.com/ropensci/onboarding/issues/139)

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

stefaniebutland commented 6 years ago

@raymondben @mdsumner We'd love to publish a blog post about bowerbird. Given @MilesMcBain's comment about your vignette, it's bound to be good. Here are some editorial and technical guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Was just looking back at a discussion of "the journey" from code for my own use, to code that I want others to find useful, where @noamross suggested a blog post using bowerbird as example. I was thinking that a post from a pkg submitter perspective (submit earlier vs later dilemma, docs, challenges of moving beyond personal cr*p code) would be very well received. However! That sounds more like two posts - one on bowerbird itself, and one about process.

Either way, this is optional and only if you have the capacity and interest to do this. The discussion reminded me that I need to think about how we can help authors and reviewers with this process.

Let me know what you think. No rush.

maelle commented 6 years ago

👋 @raymondben @mdsumner could you please complete the items of the checklist above soon, including transferring your repo? Thanks!

raymondben commented 6 years ago

@maelle - done. Sorry, was waiting for a revised footer to be finalized, but looks like that may take a while so I've gone ahead and transferred now.

maelle commented 6 years ago

Cool, thanks! Could you also add the review badge mentioned in the checklist?

I've activated the repo in Appveyor.

maelle commented 6 years ago

Note that the badges do not render at the moment, but this will soon be fixed, so please add the badge to your README before I close this issue. :-)

raymondben commented 6 years ago

The review badge is in the readme, just not rendering. It was showing under our org, but not now.

maelle commented 6 years ago

Aaah thanks and sorry! Perfect! I'm closing the issue but this doesn't prevent the discussion of blog posts from continuing. 😸