lmguzman / CoffeeCoop

the coffee coop at UBC biodiv building
3 stars 1 forks source link

maker review #3

Open richfitz opened 9 years ago

richfitz commented 9 years ago

Seems a good a place as any to put this.

First up: it almost worked first time. This is the second independent maker project I've tried running (Will Cornwell made one a couple of weeks ago) and both ran first time on my foreign environment so that's nice. Everything else is gravy.

Issues I ran into:

maker::install_missing_packages()
maker::make()

and missing packages will get installed and everything runs. Adding missing_only=FALSE to this will install all packages that the project depends on (there's also a new option maker.install.missing.packages that will automatically try to install missing packages on startup, but I don't know how wise that is so it's not set by default).

  PaymentSheet/payment.pdf:
    depends:
      - PaymentSheet/longtablepreamble.tex
      - SignupSheet/signuptable.tex
    command: render("PaymentSheet/payment.Rmd")
    quiet: true
  consumption:
    command:
      - read.csv(file="coffee_database/consumption.csv",stringsAsFactors=FALSE)
      - fix_date(.)

(or with the other suggestion)

  consumption:
    command:
      - read_csv("coffee_database/consumption.csv")
      - fix_date(.)

these build extra rules so will be identical to what you have but with less typing. They're the most fragile feature though, so you might not want to go there. And they avoid the whole thing1, thing2 problem. I still haven't caught the big chaining buzz, but you're obviously a big fan.

maker:::utility_gitignore(maker::maker())

(but I found a bug in this today so make sure you have the most recent version)

  info:
    command: read_csv(file="coffee_database/info.csv")

you might want to write:

  info:
    command: read_csv(file="coffee_database/{{target_name}}.csv")

or something (I presume that is where your twitter comments came from). I'll think a bit about this some more. My main hesitation is that it breaks the link from plain R commands quite a lot. I wish R had a natural string interpolation syntax.

ls() # nothing
m <- maker::maker(envir=.GlobalEnv)
ls() # all your things!
info2 <- 1:10 # won't work: your things are read-only
head(info2) # maker will build it if it's not up to date

travis.yml:

language: c

env:
  global:
    - CRAN: http://cran.rstudio.com
    - BOOTSTRAP_PANDOC: ""

script:
    - ./maker

before_install:
    - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
    - chmod 755 ./travis-tool.sh
    - ./travis-tool.sh bootstrap
    - ./travis-tool.sh install_r optparse
    - ./travis-tool.sh install_github richfitz/maker
    - Rscript -e 'maker::install_maker(".")'

install:
    - Rscript -e 'options(repos="http://cran.rstudio.com"); maker::install_missing_packages()'

(edit to correct the function name above sorry :crying_cat_face:)

aammd commented 9 years ago

Hey Rich! thanks so much for this! Here are some responses:

I should probably set up some ability to specify required version numbers

  • I'd even go so far as to suggest you make it required. Perhaps something like the R package DESCRIPTION file, where version numbers are specified as being >= a particular version?

you forgot to add SignupSheet/signuptable.tex as a dependency of PaymentSheet/payment.pdf

  • corrected. How did you know? Did you find that out from reading my codez, or do you have a means of finding missing dependencies?

you might like to use quiet: true as an option to targets to make them be quiet.

  • That's certainly convenient! What happens if it throws an error?

I can add rmarkdown::render support in the same way as knitr works if that's how the cool kids are doing things these days.

  • Indeed it does appear to be how the cools do. Certainly the advent of rmarkdown has meant that people don't have to roll their own conversion from .md to .whatever with pandoc (since it comes included with the package). I also like that I can control the behaviour of the pandoc conversion through arguments in the YAML of the .Rmd document. That means that I only have to look in one place to see how a file is formatted. I'd love to talk about this (and more than this) sometime.
  • creating read_csv() seems like a good idea. It occurs to me that I might have combined the reading and joining steps into a single function, which would have directly created another target (say accounts). This target would then depend on all in input files. It might be simply a matter of taste, but I like the approach of many small functions (which maker seems to favour)
  • I think it is really cool that you are allowing your commands to be sequential, imitating the pipe infix operator. is there a reason why you can't just allow people to use the pipe within a maker rule? related: it's interesting that you've created a tool that works so smoothly with the pipe workflow, despite not have drunk of that particular koolaid. i'm thinking specifically of the ease of using the pipe to compose functions ( by starting a pipeline with .). I used that technique frequently to convert my script into a series of functions which I could source into maker.
  • about that business of using target_name in a rule:
  info:
    command: read_csv(file="coffee_database/{{target_name}}.csv")

I don't think that's quite what I had in mind, since this would only save me writing the name of file ("info") twice. I was trying to imagine a way to create everything with a single rule (consumption, payments, goods, info, people). Is there a way to do that? I am thinking of another project that contains hundreds of such files. Perhaps I should look into whisker?

richfitz commented 9 years ago

Moar comments:

I'd even go so far as to suggest you make it required. Perhaps something like the R package DESCRIPTION file, where version numbers are specified as being >= a particular version?

I'm trying to avoid recreating packrat. It'd be great if rather than being an all singing solution it would let you do this sort of stuff. If I further down that rabbit hole I'll farm that stuff off into its own package most likely.

corrected. How did you know? Did you find that out from reading my codez, or do you have a means of finding missing dependencies?

When I built it, the .tex file wasn't there so your code didn't work I think. Putting the file in depends changes the traversal order so the .tex file exists by the time it's used.

That's certainly convenient! What happens if it throws an error?

Errors will appear, but any other information won't be, which is not ideal. I've opened an issue (see reference above) to think about dealing with this.

aammd commented 9 years ago

Related: it appears that there's trouble with Travis. I suspect it is that rmarkdown doesn't automatically bring in a version of pandoc on installation? I thought it did, but perhaps a different package is needed?

richfitz commented 9 years ago

Yeah, I think you need to set a few variables:

From here

env:
  global:
    - BOOTSTRAP_LATEX="1"    

(via the travis recipes wiki).

I think you'll also need BOOTSTRAP_PANDOC="1" too. Can you try and let me know?

richfitz commented 9 years ago

Hey Andrew, try remake::diagram() with the current version of remake (install DiagrammeR first)

aammd commented 9 years ago

exquisite!

richfitz commented 9 years ago

Heads up about a breaking change, as I think you're the only person it affects.

From the next version (which I'll push tonight or so - one day things will get far enough to warrant version numbers) package loading will be delayed until the last possible minute. This is because lots of packages are very slow to load, and because I want people to be able to make the graphs without installing every package (and because I set the graphs up shade by whether or not a target is up-to-date).

Anyway, this affects you because of the dead fancy way of writing magrittr pipe functions that you use:

fix_date <- . %>% 
  mutate(Date=ymd(Date))

Very cool, but it requires the pipe operator to exist :cry:

So, for now you'll need to add library(magrittr) to top of that file, as well as within the packages: section. The latter is technically optional but it makes it heaps easier for other people to install things, and for some of the other things that I have brewing.

But -- a penny for your thoughts: what do you think about an additional packages:-like packages_source: that would indicate the packages that are required for sourcing functions? If you listed them there you would not have to list them anywhere else. Better? Ugly?

Have fun drinking whisky in Baltimore. If you ever come to Sydney I will take you to the place I tweeted.

aammd commented 9 years ago

Oh dear. this got lost in the shuffle of github notifications. I really must learn how to manage that better.

When I corrected my .R files after this change, there were only two packages that needed to be added: magrittr's pipe and pryr, for pryr::partial. I feel like these are going to be pretty common companions of remake, because they seem to really fit the style that remake favours. Function composition via the pipe is a great way to create a function that has only one input and one output - perfect for creating a rule to transform a dataframe, for example. pryr::partial() is handy for setting default arguments of a function. Remake favours this style because you can't specify arguments in the remakefile (it thinks they're dependencies). I found myself writing functions with the intention passing them through partial first.

Which is all to say, yes I think we need the ability to load packages before sourcing scripts.

How about a value for packages to be sourced nested under sources? that might be cleaner:

sources: 
  packages: 
    - magrittr
    - pryr
  scripts:
    - wicked_functions.R
    - kludge.R

On the other hand, there is something to be said for the way you currently have it. Right now, all packages must be listed under packages: in the YAML, as well as in any scripts which require them to build functions. This means that if I move/lose/email my .R file, it will still work in another context and doesn't need the remakefile to load packages first. How would somebody know, just from looking at kludge.R, that partial() was from a package called "pryr"?

richfitz commented 9 years ago

You can specify arguments to functions - you just have to wrap strings with I(), remember. That said if you use pryr::partial with the pryr:: part, everything will just work.

Perhaps with reference to DESCRIPTION there could be a depends: field (for packages that must be loaded) and an imports: field (packages used the way that we currently use them). Perhaps the packages list could be separated into these two fields:

packages:
  depends:
    - magrittr
    - pryr
  imports:
    - rgdal # dear god
    - diversitree # do you have no mercy man

or without structure if all packages are in imports:

packages:
   - testthat
   - whatever