Closed raymondben closed 6 years ago
Thanks a lot @lwasser! Yes, `styler` looks super useful (I'm yet to actually use it!)
@raymondben I forgot to tell you to add the rOpenSci review badge to the README! 🙈
[![](https://badges.ropensci.org/139_status.svg)](https://github.com/ropensci/onboarding/issues/139)
Hi All, just making a note that I'm picking this up again now. I should be able to respond fully within the next week. :+1:
Thanks @milesmcbain !
Thanks @raymondben and @mdsumner!
An excellent job on the documentation and vignettes. The main vignette is now a stand out - one of the best I have seen. It gives this package a great chance of getting some use about the place. :clap: :smile:
I am also happy with the way you addressed my other comments. Thank you for the explanation on the use of `cat()`. I think the verbose option is in the spirit of the rule, as you suggest. I also agree that within the context of try-catching, this is a reasonable option and I take no issue with it.
I've updated my review block above with the final :heavy_check_mark:
One minor comment: When reviewing some of the changes in the code I noticed a lot of old code sitting around in commented-out blocks. This was true in maybe two thirds of `bowerbird` R files. This is definitely a personal thing, but I find these distracting when reviewing code and they also slightly decrease my trust in the code. "What was/is the bug here that's not fully resolved?" I suggest you remove as many of these as possible.
Otherwise, congratulations on polishing up this package and it has been my pleasure to review it.
Thanks indeed @MilesMcBain. Re: commented-out code, yes, I'll take the blame for that, I do rather have a tendency to leave it littered around. I'll have a purge ...
Thanks a lot @MilesMcBain!
@raymondben please update this thread when you've done that, so that I might take a last look before approval.
Regarding purging comments, you could count the number of lines you've suppressed by using `cloc` at different commits. 😉
@maelle , I've already cleaned them out. (master branch)
Awesome, I'll have a look later today/this week!
I have started looking, really great docs as Miles says!
I was wondering whether it'd be good to split the README into a more minimal README and a few vignettes linked from the README? E.g. "Defining data sources" could be a vignette. Or add a table of contents at the top of the README? (but the website might still be easier to browse?)
Approved! 👏
Thanks a lot @raymondben @mdsumner @lwasser @MilesMcBain for your work! A very productive review process IMO!
I have three suggestions:
- improving readability/browsability of the README via splitting it/adding a table of contents (see previous comment)
- you still have a few long lines flagged by `goodpractice::gp`, consider making them shorter.
- I don't think the origin of the name is in the docs? It'd be cool.
Now here is the list of things you have to do before I close this issue 😉
[] Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
[] Add this badge to your README
[![](https://badges.ropensci.org/139_status.svg)](https://github.com/ropensci/onboarding/issues/139)
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.
@raymondben @mdsumner We'd love to publish a blog post about `bowerbird`. Given @MilesMcBain's comment about your vignette, it's bound to be good. Here are some editorial and technical guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post.
Was just looking back at a discussion of "the journey" from code for my own use, to code that I want others to find useful, where @noamross suggested a blog post using `bowerbird` as an example. I was thinking that a post from a pkg submitter perspective (submit earlier vs later dilemma, docs, challenges of moving beyond personal cr*p code) would be very well received. However! That sounds more like two posts: one on bowerbird itself, and one about process.
Either way, this is optional and only if you have the capacity and interest to do this. The discussion reminded me that I need to think about how we can help authors and reviewers with this process.
Let me know what you think. No rush.
👋 @raymondben @mdsumner could you please soon do the different items of the checklist above including transferring your repo? Thanks!
@maelle - done. Sorry, was waiting for a revised footer to be finalized, but looks like that may take a while so I've gone ahead and transferred now.
Cool, thanks! Could you also add the review badge mentioned in the checklist?
I've activated the repo in Appveyor.
Note that the badges do not render now, but this issue will soon be fixed, so please add it to your README before I close this issue. :-)
The review badge is in the readme, just not rendering. It was showing under our org, but not now.
Aaah thanks and sorry! Perfect! I'm closing the issue but this doesn't prevent the discussion of blog posts to continue. 😸
Summary
A package for maintaining a local collection of data sets from a range of data providers. Bowerbird can mirror an entire remote collection of files, using `wget`'s recursive download functionality. Bowerbird also provides some functions around data provenance and versioning (it doesn't fundamentally solve these issues, but goes some way towards solutions).
URL for the package (the development repository, not a stylized html page): https://github.com/AustralianAntarcticDivision/bowerbird
Please indicate which category or categories from our package fit policies this package falls under *and why*? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
Data retrieval, but also venturing into reproducibility. Primarily it is intended as a mechanism for maintaining a local data collection (from remote data providers), but could also be used as a wrapper to allow others to reproduce your work (e.g. "you'll need these 100GB of files installed locally; here's the bowerbird script to do so"). Bowerbird also has a few functions to help with data provenance, see `vignette("data_provenance")`.
Who is the target audience?
Research scientists/technicians/data managers who want to maintain a local library of data files (either for their own use, or perhaps a single shared library on behalf of a number of local users, as we do). Researchers who want to share work that relies on local copies of data. Also potentially package developers who need some sort of data retrieval that isn't easily accomplished by existing tools (e.g. recursive download of a whole collection of data files from a satellite data provider).
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
Nothing to our knowledge that really does the same thing. Some similarity to https://github.com/ropensci/rdataretriever, though rdataretriever seems to be angled towards biodiversity data sets in particular and creating sensible local database structures for them. Bowerbird is focused on mirroring remote data to a local file system, and providing some functions around data provenance. Passing overlap with http packages (httr, crul) but these are generally intended for single-transaction sort of usage. Jeroen's `curl` package (not an ropensci one?) is also similar to bowerbird in that it wraps an underlying http client: bowerbird typically uses `wget` under the hood to accomplish its web traffic, whereas curl binds to libcurl. AFAIK curl doesn't support mirroring of external sites (which `wget` does, and which bowerbird relies heavily on).
Requirements
Confirm each of the following by checking the box. This package:
Publication options
`paper.md` with a high-level description in the package root or in `inst/`.
Detail
[x] Does `R CMD check` (or `devtools::check()`) succeed? Paste and describe any errors or warnings:
[more or less] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
`cat()` is used for printing progress, despite the packaging guide suggestions to use `message` instead. This is because (a) progress information doesn't really strike me as a "condition", which `message` is intended for, (b) all `cat`-issued messages can be turned off by specifying `bb_sync(...,verbose=FALSE)`, (c) an anticipated common use for bowerbird is for unattended (cron-job) updates to a local data library, in which case it's likely the user will want to `sink()` all output to a log file. Using `cat()` means that a simple `sink()` will catch everything, including output from `wget` calls (if they are made). I think this becomes less reliable if `message` is used (you'd have to `sink(...,type="message")` but then I'm not sure it'd catch `wget` output, but admittedly haven't tried this).
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Additional notes re: our presubmission enquiry
General note: the package is not in a final polished state, but we think far enough advanced (and stable enough) to be a good point for onboarding consideration.
Fair suggestion, and one that we've considered - and maybe that split is still reasonable to consider down the track. But for now, at least, we think it's better to keep core-functionality and the data-source-definitions bundled together.
It's up to the user where they want to put their data. We do make a suggestion in the README and vignette for users to consider rappdirs.
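The `cat()` versus `message()` reasoning given under the packaging-guidelines exception above can be checked with a small base-R sketch. It assumes nothing about bowerbird itself: the log path and the two progress strings are made up for illustration, and it does not test whether output from external `wget` calls is captured.

```r
# Hypothetical log file; an unattended (cron-job) run would use a real path.
log_file <- tempfile(fileext = ".log")

# Divert standard output (the default, type = "output") to the log file.
sink(log_file)
cat("progress reported with cat()\n")
message("progress reported with message()")  # sent to stderr, not the sink
sink()  # restore normal output

# Only the cat() line reaches the log file.
captured <- readLines(log_file)
print(captured)
```

Capturing the `message()` line as well would need a second diversion with `type = "message"`, which accepts only an already open connection (for example one created with `file(log_file, open = "wt")`) and not a file name.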