Open eghuang opened 5 years ago
I think the correct style guide is Advanced R by Hadley Wickham at http://adv-r.had.co.nz/Style.html. The Google style guide I found by googling uses different conventions.
I agree that Wickham's conventions are more appropriate for our purposes and more closely resemble our current code. However, the Google style guide covers a few topics not found in Wickham's guide, so I suggest we defer to Google's guide for anything not already covered by Wickham's. I have edited my post to include a link to Wickham's guide and instructions for when to use which guide.
10 Jun 2019: Added links to resources for Git, branching, and general R.
21 Jul 2019: Minor rewording, added links to dataframe subsetting and tryCatch.
30 May 2019: Copied from NASAmouseHG.
0. GitHub & RStudio setup and issues
Most common issues with GitHub and RStudio can be resolved with a quick search, but sometimes we are unaware that there is an issue with our current setup. We should at least be familiar with and able to access the following list of Git, GitHub, and RStudio functions.
0.1 Branches
Refer to this subsection of Wickham's R Packages for information on basic Git and branching.
1. Style guidelines
We will follow Hadley Wickham's style guide for our scripts. It accurately reflects the style conventions of the R community. It additionally addresses most of the style issues in our script. Wickham's guidelines are derived from the Google style guide, which is more detailed and should be consulted for topics not covered in Wickham's guide.
1.1. Reminders
Use `<-` instead of `=` to assign variables. This is mostly convention, but there is a small technical difference. See this post for more details.
Keep lines under 80 characters, including comments. Try to break lines before operators. When in doubt, break lines where it would be the most convenient to readers.
Our variable names should be concise and descriptive.
Names should use snake_case instead of camelCase because many of our abstractions are best described with acronyms. Examples: `foo_bar`, `harderian_gland`, `iea`. Capitalized acronyms shall be used as is conventional in radiation research literature (e.g. `nte_HZE_ider`, `low_LET_ider`).
2. General programming tips:
Wickham's Advanced R is strongly recommended as a general resource.
2.1. Environment Management
To clear the global environment (erase all variables, functions, data, etc.) use:
rm(list = ls())
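One detail worth knowing is that `ls()` skips objects whose names begin with a dot, so `rm(list = ls())` leaves them in place. The sketch below shows this in a scratch environment rather than the global one; the environment `scratch` and the object names are assumptions made for illustration.

```r
# Scratch environment so the demonstration leaves the global environment alone.
scratch <- new.env()
assign("x", 1, envir = scratch)
assign("f", function() NULL, envir = scratch)
assign(".hidden", "still here", envir = scratch)

# Same pattern as rm(list = ls()), pointed at the scratch environment.
rm(list = ls(envir = scratch), envir = scratch)
remaining <- ls(envir = scratch, all.names = TRUE)  # only ".hidden" is left

# Adding all.names = TRUE removes dot-prefixed objects as well.
rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)
n_left <- length(ls(envir = scratch, all.names = TRUE))  # 0
```

Applied to the global environment, the second form is `rm(list = ls(all.names = TRUE))`. Neither form unloads attached packages.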
2.2. Debugging with breakpoints
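As a minimal sketch of the kind of function these breakpoint techniques apply to, the block below defines a small loop with a `browser()` call inside it. The name `loop`, the counter `x`, and the `n` and `debug` arguments are assumptions made for illustration; the `debug` switch exists only so the function also runs in non-interactive sessions.

```r
# Hypothetical function: increments x once per iteration.
loop <- function(n = 3, debug = interactive()) {
  x <- 0
  for (i in seq_len(n)) {
    if (debug) browser()  # pauses here on every iteration when debug is TRUE
    x <- x + 1
  }
  x
}

# With debug = FALSE the function simply returns its result.
result <- loop(n = 3, debug = FALSE)
result
```

In an interactive session, calling `loop()` pauses at the `browser()` line on each pass. Printing `x` at the first pause shows 0, and at the second pause it shows 1.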
2.2.1. `browser()` breakpoints
To better understand why your code raises errors, add `browser()` on a new line above the code you suspect is buggy. When you run your code, the debugger opens at the line with `browser()` and drops you into the current environment (see footnote 1) of the script. This means that everything that has been created or changed by the code up to that line is available to view and call in the console. With `browser()`, you can easily check or test objects created in function environments. Hint: use `str()` to check the type of an object.
For example, you can use `browser()` to check how the value of a variable changes in a loop. Consider a function `loop` that updates a variable `x` on each pass through a loop and calls `browser()` inside the loop body.
Suppose you want to examine the behavior of `loop`. If you run `loop` without the `browser()` call in the third line, you simply get the output of `loop`. With the `browser()` call, you can closely examine the environment of `loop`. When `loop()` is called, `browser()` drops you into the debugger when it is evaluated. If we check the value of `x`, we can see that it is 0.
If we let the debugger continue to the next `browser()` (press the continue button or run `c`) and check the value of `x`, then we can see that `x` is 1 in the second loop.
We can continue to run the debugger to see how `x` changes as `loop` runs.
In this particular example, `loop` is clearly simple, but with more complicated functions or loops, `browser()` can provide much more insight.
2.2.2. Editor breakpoints
RStudio allows users to set breakpoints without changing the existing code by clicking directly to the left of a line of code. A red dot should appear. If a red circle appears instead, then the breakpoint is deferred. This can happen for a number of reasons, but saving the file or running `source()` in the console or with the editor should change the circle to a dot. Editor breakpoints, unlike `browser()` breakpoints, can only be used with `source()`. They are generally less versatile than `browser()`.
2.2.3. Debugger console
Running or sourcing code with active breakpoints will halt execution at the first encountered breakpoint. At this point, the console will display several new commands for stepping through the code. For instance, if execution is halted at a call such as `foo(x)`, stepping into it moves the halted point of execution to the source code of `foo`.
The console can also run most R code, which is useful for checking the values of variables or writing test functions within the debugger.
2.2.4. Additional resources
RStudio documentation for debugging resources can be found here.
2.3. Locating source code
Run `getAnywhere(function)`, where `function` is the name of the function of interest, to print its source code. However, it's usually more useful to step into source code when using breakpoints and the debugger, especially for complicated functions.
2.4. Reducing runtime
Sometimes we would like our programs to run faster. Here are various methods to locate and rewrite slow code to be more efficient.
2.4.1. Finding slow code
Use `proc.time()` if you suspect that a certain part of your code is abnormally slow. Calling `proc.time()` before and after your code allows you to find the actual runtime as the difference between the `proc.time()` calls. As a simple example:
Note that the results are given in seconds.
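The before-and-after `proc.time()` pattern described above can be sketched as follows. The function `slow_sum` is a hypothetical workload invented for this illustration; any code of interest can take its place.

```r
# Hypothetical workload: sum the square roots of 1..n with an explicit loop.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

start <- proc.time()             # record the time before the code of interest
result <- slow_sum(1e6)
elapsed <- proc.time() - start   # difference between the two proc.time() calls

print(elapsed)                   # user, system, and elapsed times
wall_seconds <- elapsed[["elapsed"]]  # wall-clock time as a plain number
```

The `elapsed` entry is usually the one to compare between runs, since it measures wall-clock time rather than CPU time.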
2.4.2. Writing faster code
Most of our inefficient code results from bad design. Make sure that your higher-order functions and algorithms are not making unnecessary calls and that you thoroughly understand what your code is doing. Try to preallocate storage for results rather than growing objects inside loops.
See this StackOverflow post for further reading.
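As a minimal sketch of preallocation, the block below computes the same vector three ways. The function names `grow`, `prealloc`, and `vectorized` are hypothetical and chosen only for this comparison.

```r
# Growing a vector: each c() call copies everything accumulated so far.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating: allocate the full vector once, then fill it in place.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized: no explicit loop at all.
vectorized <- function(n) seq_len(n)^2

n <- 1e4
same_result <- identical(grow(n), prealloc(n)) &&
  identical(prealloc(n), vectorized(n))
```

All three return identical results, so the choice between them is purely about runtime. Wrapping each call in the `proc.time()` pattern from 2.4.1 is one way to compare them on your own machine.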
2.5. Dataframe subsetting
See this link for basic examples.
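A few common base R subsetting patterns, as a minimal sketch. The data frame `df` and its columns are placeholders invented for this example.

```r
# Hypothetical data frame; column names and values are placeholders.
df <- data.frame(
  id = 1:5,
  group = c("a", "b", "a", "c", "b"),
  value = c(2.5, 0.7, 3.1, 1.2, 4.8),
  stringsAsFactors = FALSE
)

rows_a   <- df[df$group == "a", ]        # rows matching a condition, all columns
two_cols <- df[, c("id", "value")]       # columns selected by name
big_vals <- df[df$value > 2, "value"]    # a single column comes back as a vector
big_df   <- df[df$value > 2, "value", drop = FALSE]  # drop = FALSE keeps a data frame

# subset() lets conditions and column names be written without the df$ prefix.
sub <- subset(df, value > 2 & group != "c", select = c(id, value))
```

Note the difference between `big_vals` and `big_df`: selecting a single column with `[` silently simplifies to a vector unless `drop = FALSE` is given.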
2.6. `tryCatch`
Useful for error handling. See this link for a great primer.
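A minimal sketch of the `tryCatch` pattern, assuming a hypothetical helper `safe_log` that returns `NA` instead of stopping or warning:

```r
# Hypothetical helper: take a logarithm, but never stop or warn.
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x)
    },
    warning = function(w) {
      # Runs when the expression signals a warning, e.g. log(-1).
      message("Warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      # Runs when the expression signals an error, e.g. the stop() above.
      message("Error caught: ", conditionMessage(e))
      NA_real_
    },
    finally = message("safe_log() finished")  # always runs last
  )
}

ok      <- safe_log(exp(1))  # no condition signalled; the value of log(x) is returned
warned  <- safe_log(-1)      # log(-1) signals a warning, so the warning handler runs
errored <- safe_log("a")     # stop() signals an error, so the error handler runs
```

The value of the whole `tryCatch()` call is either the value of the expression or the value returned by whichever handler ran.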
3. Footnotes
1 An environment is essentially a space in which objects such as variables and functions are defined. The global environment is the default environment and the outermost environment we work in. Anything defined or loaded outside of a function call exists in the global environment. Each time a function is called, a new environment called a "frame" is opened. Objects created inside a function call, including other functions, will be defined in the new frame. The environment in which the function was defined is the new frame's "parent environment". Objects defined in the parent environment can be used in their child frames, but ordinary assignment with `<-` in a child frame cannot redefine variables in the parent environment; it creates a new local variable instead. The code below demonstrates what happens when one attempts to redefine a variable in the parent environment.
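A minimal sketch of this behavior follows. The variable `a` and the bodies of `foo` and `foobar` are assumptions made for illustration, chosen so that `foo()` returns 3.

```r
a <- 1  # defined in the global environment

foo <- function() {
  a <- 2  # creates a new local a in foo's frame; the global a is untouched
  b <- 1
  # foobar is defined inside foo, so its frame's parent is foo's frame
  # and it can read foo's local a and b.
  foobar <- function() a + b
  c <- foobar()
  return(c)
}

result <- foo()  # 3
a                # still 1: the assignment inside foo did not reach the global a
```

After the call, `b` and the local `c` no longer exist; only the returned value survives, here stored in `result`.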
If another function is defined and called inside of the first function body, then a second frame is created such that the parent environment of the second frame is the first frame. This implies that the second function has access to any objects created in the first frame or the global environment. Any further nested functions behave similarly.
When the function call terminates, the frame is closed and all the objects defined within it are discarded. Only the output of the function call (the `return()` call in a frame) is passed from the child frame to the parent environment. In the example above, `b` and `c` are discarded after calling `foo`. However, `c` is the output of `foobar()`, so a call to `foo()` would return the value of `c`, or 3.