swcarpentry / DEPRECATED-bc

DEPRECATED: This repository is now frozen - please see individual lesson repositories.

Added a lesson on making packages in R. #507

Closed jainsley closed 10 years ago

jainsley commented 10 years ago

I created a lesson for making packages in R for the 05/12/14-05/13/14 bootcamp in Toronto. I used the temperature conversion functions as examples for creating a package.

jdblischak commented 10 years ago

Thanks for this, @jainsley. This fulfills issue #365 for R.

A few bookkeeping details to start:

The YAML header is:

---
layout: lesson
root: ../..
---

jainsley commented 10 years ago

Done. As this is my first time submitting material for Software Carpentry, please let me know if there are any other details I need to take care of.

jdblischak commented 10 years ago

Good work, @jainsley. See below for some suggestions:

On line 94, you demonstrate how to document a function using roxygen2. I think it would be worthwhile to add some extra description. I understand you don't want to go into too much detail since it is a novice bootcamp, but you could explain that roxygen2 reads comments that start with #' and that the various keywords starting with @ are used to translate the function description into a help file.
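As a rough illustration of the explanation suggested above, a roxygen2-documented function might look like the sketch below. Only the name fahr_to_celsius comes from this thread; the tag text and the function body are assumptions, since the lesson file itself is not quoted here.

```r
# Lines starting with #' are roxygen2 comments; ordinary # comments are ignored by roxygen2.
# Tags starting with @ become sections of the generated help file.

#' Convert Fahrenheit to Celsius
#'
#' @param temp Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of temperatures in degrees Celsius.
#' @export
#' @examples
#' fahr_to_celsius(32)
fahr_to_celsius <- function(temp) {
  # Assumed body: the standard conversion formula.
  (temp - 32) * 5 / 9
}

print(fahr_to_celsius(32))
```

Running roxygen2 over the package source (for example through devtools::document()) turns these comments into an .Rd file under man/, which is what ?fahr_to_celsius later displays.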

I don't understand line 128.

Notice there is now a tempConvert environment that is the parent environment to the global environment.

Are you trying to say the tempConvert environment is now part of the global environment? The way it reads, it sounds like you are saying that the global environment is nested within the tempConvert environment. What am I misunderstanding?
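One way to inspect the relationship in question is with search() and parent.env(). The sketch below attaches the tools package, which ships with R, purely as a stand-in for tempConvert (an assumption made so the example runs without the lesson's package). In R, the enclosing environment of the global environment is whichever package was attached most recently, so the quoted sentence matches how R reports it.

```r
# Attach a package that ships with R, as a stand-in for tempConvert.
library(tools)

# search() lists the attached environments in the order R looks names up.
search_path <- search()
print(search_path[1:2])

# The parent (enclosing) environment of the global environment is the
# environment in the second position of the search path.
parent_name <- environmentName(parent.env(globalenv()))
print(parent_name)
```

The global environment is not nested inside the package in the sense of being stored there. "Parent" only describes where R looks next when a name is not found in the global environment.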

After you install tempConvert, you should look at the help file for the functions. Writing documentation is always a burden, so you should show them how much it will pay off in the future when a simple ?fahr_to_celsius will give them a nicely formatted help file.

I also sent a PR to your fork with small edits.

barryrowlingson commented 10 years ago

I wouldn't use install at this point. Better to use devtools' load_all function. This loads the package directly from the source without doing an installation to a library directory. The cycle is Edit R Code - load_all - Test Your R Code.

In Python terms, load_all is like import, and install is like python setup.py install.
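A minimal sketch of that cycle follows. The package layout, the DESCRIPTION fields, and the function body are hypothetical stand-ins for the lesson's package, written to a temporary directory so nothing is installed. The load_all call is guarded because devtools may not be available.

```r
# Create a throwaway source package in a temporary directory.
# File contents are hypothetical stand-ins for the lesson's tempConvert package.
pkg_dir <- file.path(tempdir(), "tempConvert")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: tempConvert",
  "Title: Temperature Conversion Helpers",
  "Version: 0.0.1",
  "Description: Toy package illustrating the edit, load, test cycle.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

writeLines("export(fahr_to_celsius)", file.path(pkg_dir, "NAMESPACE"))

writeLines(
  "fahr_to_celsius <- function(temp) (temp - 32) * 5 / 9",
  file.path(pkg_dir, "R", "fahr_to_celsius.R")
)

# Edit R code -> load_all() -> test, with no installation into a library.
# Guarded so the script still runs when devtools is not installed.
if (requireNamespace("devtools", quietly = TRUE)) {
  devtools::load_all(pkg_dir)
  print(fahr_to_celsius(212))
}
```

After editing a file under R/, calling devtools::load_all(pkg_dir) again picks up the change, which is what makes the loop quick.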

You might also want to mention one of the package skeleton generators, like create in devtools or the pkgkitten package which you can find on github.

stephenturner commented 10 years ago

Thanks for starting this @jainsley. I was thinking of teaching a short workshop here on building R packages, and I'll definitely use this as a starting point and add back to it when I get around to it.

A few suggestions:

  1. Lines 76-77: why use the roxygen2 version on GitHub instead of the roxygen2 release on CRAN?
  2. Introduction: a nice bit of motivation would be to show how easy it is to push a package to GitHub and use devtools::install_github() to install it.
  3. This is difficult to convey in text, but in a live workshop after introducing how to build/check on the command line I would demonstrate how easy package creation, roxygenization, building, and checking can be using RStudio.
  4. I plan on eventually expanding this to cover creating vignettes using knitr. @yihui just sent out an email to package authors warning us that the newest version of knitr could potentially break something in packages that depend/suggest knitr, so I want to make sure what I'm doing now still works before continuing.

If you wanted to add further resources, I found these to be helpful:

yihui commented 10 years ago

@stephenturner I think the probability of knitr 1.6 breaking its reverse dependencies should be less than 5% :) I sent out the warning just to let other package authors know that I might need their help if CRAN yells at me for anything that I did not notice after several hours of testing.

stephenturner commented 10 years ago

Thanks @yihui - and I'll confirm, I'm happily in the 95%, and thank you for the email. I was only using knitr to build my vignette. I hope to contribute to this lesson shortly.

gvwilson commented 10 years ago

Is this ready to merge?

jdblischak commented 10 years ago

No, it is not ready to merge.

@jainsley, when do you think you will have time to respond to the comments/suggestions from me, @barryrowlingson, and @stephenturner? Also, I sent a small PR to your fork that you need to decide on.

We're excited to merge this lesson, so thanks for all your hard work on this new material!

jainsley commented 10 years ago

I apologize. Between submitting the final draft of a paper for publication and preparing for a move, I haven't had much time. I should be able to get to it later this week.

davclark commented 10 years ago

For reference, I touched base with @karthik - and while he's busy, he has some materials here we can source from:

https://github.com/karthik/dlab-advanced-r

He also mentioned that Hadley Wickham apparently has some PDFs on building R packages.

karthik commented 10 years ago

Thanks @davclark

A few more thoughts on this PR. I gave the Rmd a quick read and it looks good. Here are a few challenges I see. First off, it would be impossible to teach this at an introductory/intermediate R bootcamp (there is too much going on with too little time and we can't cram more in there). The current iteration of SWC does not have an advanced part-2 bootcamp, which makes it difficult to build upon a foundation. If this is meant as a self-learning exercise, there are better guides out there and those people (who are comfortable enough to teach themselves package development) wouldn't need one buried deep inside the BC folder.

For self learners, here is a detailed and well written guide on R package development (from the fine folks at Simply Statistics, who also teach the Coursera Data Science courses)

https://github.com/jtleek/rpackages/blob/master/README.md

Hadley Wickham's guides are also really nice (not Rmd, just PDFs)

e.g. Package Development by Hadley: http://courses.had.co.nz/11-csiro/

Yes, this version is a bit dated, but I've seen Hadley teach it before and we can certainly ask him to share more current material. It's well written and covers all the major issues surrounding package development.

If you are going to teach package development, you should also teach testing in the same context, because they go hand in hand, especially in R. I wrote a short guide here, but I'm sure there are others that are better. https://github.com/karthik/dlab-advanced-r/blob/master/02-testing/README.md#software-testing

PS: The link above (from Dav) is not very helpful for SWC. I taught package development (and how to retrieve data from an API) over 3 hours to people who knew R programming. It was all hands-on (which is why that repo is mostly just headings for package dev). It was still a challenge, and less than half the class successfully built a working package.

jainsley commented 10 years ago

Thanks for the comments everyone. I tried to integrate as many of the suggestions as I could while still keeping the lesson simple.

@jdblischak, when I talked about how the environments change during the bootcamp, I used a drop down menu in RStudio's environment tab that shows how the relationship between environments changes when you load a new package. This is hard to convey in text, so I added a call to search() to make it a bit easier to explain.

@barryrowlingson, I used install() instead of load_all() to avoid potential issues with looking at the documentation for the function using ?.

@stephenturner, I installed roxygen2 from github simply to show them it was possible. It could easily be altered if people think it's a bit too complicated.

@karthik, it was a bit of a rush trying to get everything done in one day. I made this lesson just because the flyer for the bootcamp said we would cover this. I tried to keep it really simple anticipating time would be short. The results were mixed as this was the lesson that received the most praise and the most complaints from the students. For the more advanced students, this was the only R lesson where they learned something new. However, we had problems getting devtools installed for some students who were running older versions of R. By the time we figured out the issues, it was the end of the day and not everyone caught up.

karthik commented 10 years ago

Awesome, thanks for the summary @jainsley! Thanks for addressing the other topic I forgot to mention. As you correctly noted, most folks come with older Windows laptops without any development environment. I've usually asked people to run devtools::has_devel() and let me know several days in advance if they get a FALSE. But as you know, people don't really take care of installation issues until the day of.
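A pre-workshop check along those lines could be sketched as follows. The wrapper logic (the guard for a missing devtools and treating an error as "not ready") is an assumption added for illustration; only devtools::has_devel() itself comes from the comment above.

```r
# Report the running R version, since older versions caused install problems.
r_version <- as.character(getRversion())
message("R version: ", r_version)

if (requireNamespace("devtools", quietly = TRUE)) {
  # has_devel() tries to compile a small test file.
  # Any error is treated here as "toolchain not ready" (an assumption of this sketch).
  toolchain_ok <- tryCatch(
    isTRUE(devtools::has_devel()),
    error = function(e) FALSE
  )
} else {
  # NA signals that the check could not be run at all.
  toolchain_ok <- NA
  message("devtools is not installed; try install.packages(\"devtools\") first.")
}

print(toolchain_ok)
```

Learners could send back the printed R version and the final TRUE, FALSE, or NA ahead of the bootcamp.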

jdblischak commented 10 years ago

+1 to merge

karthik commented 10 years ago

I'll second it.

jdblischak commented 10 years ago

Last call for votes and/or comments.

gavinsimpson commented 10 years ago

I know this has been merged and closed, but I've only just come across it; otherwise I would have commented at the time.

Whilst I have absolutely nothing against using devtools, I think it is important to convey the (perhaps old fashioned) R Core-approved means of building and checking your package. I know Hadley will send you a personal apology if using devtools brings upon you the wrath of CRAN, but even so it is good to know the "correct" way as well as the easy, modern way. Same with teaching git in a shell even though RStudio will now do a lot of the git things a UseR will need to do.

Objections to adding a small section with R CMD build path/to/pkg, R CMD check --as-cran path/to/tarball, and R CMD INSTALL path/to/tarball usage? Should I start this as a new issue rather than follow up here?
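For reference, the three commands can also be driven from an R session with system2(), which keeps the example in the lesson's language. This is only a sketch: the toy package, its DESCRIPTION fields, and the placeholder author are hypothetical, and only the build step is actually run because check can be slow and INSTALL modifies the library.

```r
# Hypothetical toy package; Author and Maintainer are placeholders.
pkg_dir <- file.path(tempdir(), "tempConvert")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: tempConvert",
  "Title: Temperature Conversion Helpers",
  "Version: 0.0.1",
  "Author: A. Learner",
  "Maintainer: A. Learner <learner@example.org>",
  "Description: Toy package used to illustrate R CMD build.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(fahr_to_celsius)", file.path(pkg_dir, "NAMESPACE"))
writeLines("fahr_to_celsius <- function(temp) (temp - 32) * 5 / 9",
           file.path(pkg_dir, "R", "fahr_to_celsius.R"))

# Path to the R executable that belongs to the running session.
r_bin <- file.path(R.home("bin"), "R")

# R CMD build names the tarball <Package>_<Version>.tar.gz.
tarball <- "tempConvert_0.0.1.tar.gz"

build_args   <- c("CMD", "build", shQuote(pkg_dir))
check_args   <- c("CMD", "check", "--as-cran", tarball)
install_args <- c("CMD", "INSTALL", tarball)

# R CMD build writes the tarball into the current working directory.
old_wd <- setwd(tempdir())
build_status <- system2(r_bin, build_args)
setwd(old_wd)

# Left commented out on purpose; run them from the directory holding the tarball.
# system2(r_bin, check_args)
# system2(r_bin, install_args)
```

A build_status of 0 indicates that R CMD build finished without error.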

jdblischak commented 10 years ago

Interesting point, @gavinsimpson. My worry, though, is that this is not necessary for novices. This lesson on creating packages is already being jammed in with lots of other information. It is meant to be just a very quick introduction to how you could organize your code. I don't think the intention of the lesson is to get the learner anywhere close to creating a package that is ready to submit to CRAN. It also purposely omits, for example, how to add other packages as dependencies or create unit tests.

In other words, I think the scope of the lesson is this: "If you write a few custom functions for your analysis, it is not too much extra work to convert them into a package that has nice documentation."