sachsURAP / LSSR-2019

Data and code for the Life Sciences in Space Research 2019 paper, which concerns modeling murine Harderian gland tumorigenesis induced by mixed radiation fields.
GNU General Public License v3.0

Style and General Tips #4

Open eghuang opened 7 years ago

eghuang commented 7 years ago

0. GitHub & RStudio setup and issues

Most common issues with GitHub and RStudio can be resolved with some googling, but sometimes we are not even aware that there is an issue with our current setup. For the purposes of our project, we should at least be familiar with, and able to access, the following list of GitHub/RStudio functions.

Let me know if any of these are unfamiliar to you.

1. Style guidelines

Here is a well-written guide by Google that accurately reflects the style conventions of the R community. It addresses most of the style "issues" in our script.

1.1. Reminders

2. General programming tips

2.1. Environment Management

To clear the global environment (undefine all variables, functions, data, etc.) you can use:

rm(list=ls())
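One detail worth knowing: ls() skips names that begin with a dot, so rm(list=ls()) leaves such hidden objects in place unless you pass all.names = TRUE. The sketch below demonstrates this on a throwaway environment (the name scratch is hypothetical) so that running it does not wipe your actual workspace.

```r
# A throwaway environment standing in for the global environment
scratch <- new.env()
assign("x", 1, envir = scratch)
assign(".hidden", 2, envir = scratch)  # names starting with "." are hidden from ls()

# Equivalent of rm(list=ls()): removes only the visible object x
rm(list = ls(envir = scratch), envir = scratch)
print(ls(envir = scratch, all.names = TRUE))  # ".hidden" is still there

# With all.names = TRUE, hidden objects are removed as well
rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)
print(ls(envir = scratch, all.names = TRUE))  # character(0)
```

Note that neither call detaches loaded packages or resets options; restarting the R session is the more thorough way to get a clean slate.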

2.2. Debugging with breakpoints

2.2.1. browser() breakpoints

To better understand why your code raises an error message, add browser() on a new line above the code you suspect is buggy. When you run your code, execution pauses at the browser() line and drops you into the current environment (see footnote 1) of the script. Everything that has been created or changed by the code up to that line is available for you to view and call in the console. With browser(), you can easily check or test objects created in function environments. Hint: use str() to inspect the structure of an object (its type, dimensions, and contents).

For example, you can use browser() to check how the value of a variable changes in a loop. Consider the following function:

loop <- function(x) {
  for (i in seq(100)) {
    browser()
    x <- x + 1
  }
  return(x)
}

Let's say you want to examine the behavior of loop. If you run loop without the browser() call on the third line, you simply get its output. With the browser() call, you can closely examine the environment of loop: each time browser() is evaluated, it drops you into the debugger. If we call loop(0) and check the value of x at the first pause, we see that it is 0.

> loop(0)
Called from: loop(0)
Browse[1]> x
[1] 0

If we let the debugger continue to the next browser() call (press the Continue button or type c) and check x again, we see that x is 1 in the second iteration.

Browse[1]> c
Called from: loop(0)
Browse[1]> x
[1] 1

We can continue to run the debugger to see how x changes as loop runs.

Browse[1]> c
Called from: loop(0)
Browse[1]> x
[1] 2

In this particular example loop is simple, but with more complicated functions or loops, browser() can provide much more insight.
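When the interesting behavior only shows up late in a loop, pressing c dozens of times gets tedious. A common workaround is to wrap browser() in a condition. The sketch below is a hypothetical variant of loop (the name loop_until and the stop_at argument are illustrative, not part of our scripts) that pauses only on one chosen iteration; with the default stop_at = NA it never pauses and simply runs to completion.

```r
# Hypothetical variant of loop() that pauses only on a chosen iteration.
# stop_at = NA (the default) means "never pause", so the function runs normally.
loop_until <- function(x, stop_at = NA) {
  for (i in seq(100)) {
    if (!is.na(stop_at) && i == stop_at) browser()  # pause only when i reaches stop_at
    x <- x + 1
  }
  return(x)
}

loop_until(0)                   # runs straight through and returns 100
# loop_until(0, stop_at = 50)   # interactively: pauses once, when x is 49
```

At the pause on iteration 50, x has been incremented 49 times, so checking x in the debugger would show 49.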

2.2.2. Editor breakpoints

RStudio allows users to set breakpoints without changing the existing code by clicking in the margin directly to the left of a line number. A solid red dot should appear. If a hollow red circle appears instead, the breakpoint is deferred. This can happen for a number of reasons, but saving the file and running source(), either in the console or from the editor, should turn the circle into a dot. Unlike browser() breakpoints, editor breakpoints only take effect when the file is run with source(), so they are generally less versatile than browser().

2.2.3. Debugger console

Running or sourcing code with active breakpoints will halt execution at the first encountered breakpoint. At this point, the console will display several new commands:

The console also can run most R code, which is useful for checking the values of variables or writing test functions within the debugger.

2.2.4. Additional resources

RStudio documentation for debugging resources can be found here.

2.3. Locating source code

To view a function's source code, try getAnywhere("name"), which also finds functions that a package does not export. However, it's usually more useful to step into the source code with breakpoints and the debugger, especially for complicated functions.
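A short sketch of a few ways to look at source code, using stats::sd purely as an example function. Typing a function's name without parentheses prints its definition; getAnywhere() additionally reports where the object was found, which helps when the same name exists in several places.

```r
sd                          # a function's name without () prints its definition
body(sd)                    # just the body of the function
found <- getAnywhere("sd")  # searches attached packages and loaded namespaces
found$where                 # where it was found, e.g. "package:stats"
found$objs[[1]]             # the function object itself

# getAnywhere() also finds objects a package does not export, such as many S3 methods
getAnywhere("print.lm")
```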

2.4. Reducing runtime

Sometimes we find that we would like our programs to run faster. Here are various methods to locate and rewrite slow code to be more efficient.

2.4.1. Finding slow code

If we suspect that a certain part of our code runs much slower than the rest, we can use proc.time() to time whole code blocks or individual lines in a function. Calling proc.time() before and after a block of code and taking the difference between the two results gives the block's runtime. As a simple example:

> startTime <- proc.time()
> n <- 0
> for (i in seq(100, .01)) n <- n + i
> endTime <- proc.time()
> endTime - startTime
   user  system elapsed 
  0.005   0.001   0.041 

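The results are given in seconds: elapsed is the wall-clock time, while user and system are the CPU time spent in R itself and in the operating system on R's behalf.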
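Base R also provides system.time(), which does the before/after proc.time() bookkeeping for a single expression. A minimal sketch (the loop size of one million is arbitrary, chosen only so the timing is measurable):

```r
# system.time() evaluates the expression and returns the elapsed proc.time() difference
timing <- system.time({
  n <- 0
  for (i in seq(1e6)) n <- n + i
})
print(timing)
timing[["elapsed"]]  # wall-clock seconds as a plain number
```

Because the expression is evaluated in the calling environment, n is still available afterwards, so you can check the result as well as the time.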

2.4.2. Writing faster code

Most of our inefficient code results from bad design. Make sure that your higher-order functions and algorithms are not making unnecessary calls, and that you thoroughly understand what your code is doing. Preallocate objects such as vectors at their final size instead of growing them one element at a time inside a loop.
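To illustrate preallocation, here is a hedged sketch comparing three ways of building the same vector of squares. The function names are hypothetical and n = 2e4 is an arbitrary size; exact timings will differ by machine, but growing the vector is typically the slowest because c(out, ...) allocates and copies a new vector on every iteration. The vectorized version goes slightly beyond preallocation and is included only for comparison.

```r
n <- 2e4

# Growing: each c(out, ...) creates a new, longer copy of the vector
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating: create the full-length vector once, then fill in each element
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized: no explicit loop at all
vectorized <- function(n) seq_len(n)^2

system.time(a <- grow(n))
system.time(b <- prealloc(n))
system.time(v <- vectorized(n))
```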

For more details and other issues, this Stack Overflow post puts it better than I can.

3. Footnotes

1 An environment is essentially a space in which objects such as variables and functions are defined. The global environment is the default environment and the outermost environment we work in. Anything defined or loaded outside of a function call exists in the global environment. Each time a function is called, a new environment called a "frame" is opened. Objects created inside a function call, including other functions, are defined in the new frame. The new frame's "parent environment" is the environment in which the function was defined (for a function defined at the top level, this is the global environment). Objects defined in the parent environment can be used in their child frames, but an ordinary assignment with <- inside a child frame creates a new local variable rather than redefining the variable in the parent environment. The code below demonstrates what happens when one attempts to redefine a variable in the parent environment.

> a <- 1  #  Not in a function call, so defined in the global environment
> foo <- function() {  #  Each call to foo opens a frame F1 whose parent is the global environment. F1 can use anything defined there.
+   a <- a + 1  #  Defines a new variable a inside F1, not in the parent environment.
+   return(a)
+ }

> foo() 
[1] 2

> a #  Note that a is not changed in the global environment.
[1] 1
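The one exception to the rule that <- only creates local variables is the superassignment operator <<-, which searches the enclosing environments for an existing variable and assigns to it there. A minimal sketch (bar is a hypothetical name); modifying globals this way is usually best avoided, but it is worth recognizing when reading other people's code.

```r
a <- 1
bar <- function() {
  a <<- a + 1  # <<- finds the existing a in the global environment and reassigns it
  return(a)    # no local a exists, so this is the global a
}
bar()  # returns 2
a      # the global a is now 2
```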

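If another function is defined inside the first function body and then called there, its call opens a second frame whose parent environment is the first frame. The second function therefore has access to any objects created in the first frame or in the global environment, and further nested definitions behave similarly. What matters is where a function is defined, not where it is called: R uses lexical scoping, so a function defined in the global environment and merely called inside another function cannot see that function's local variables.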

a <- 1  #  a is defined in the global environment
foo <- function() {  #  Calling foo opens a frame F1 inside the global environment.
  b <- a + 1  #  b is defined in F1. foo can use variables defined in the global environment.
  foobar <- function() {  #  foobar is defined in F1, so calling it opens a frame F2 inside F1.
    c <- a + b  #  c is defined in F2. foobar can use variables in both F1 and the global environment.
    return(c)
  }
  return(foobar())
}

When the function call terminates, the frame is closed and all the objects defined within it are discarded. Only the function's return value (the argument of return(), or otherwise the last evaluated expression) is passed from the child frame back to its caller. In the example above, b and c are discarded after foo() returns. However, c is the output of foobar(), so a call to foo() returns the value of c, which is 3.
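The sketch below checks these claims and contrasts a nested definition with one made in the global environment. The functions outside and foo2 are hypothetical, added only to show lexical scoping; it assumes no variable named b already exists in your global environment.

```r
a <- 1

# foobar is defined inside foo, so its parent environment is foo's frame
foo <- function() {
  b <- a + 1
  foobar <- function() a + b
  foobar()
}

# outside is defined in the global environment, so it cannot see foo2's b
outside <- function() a + b
foo2 <- function() {
  b <- a + 1
  outside()
}

foo()        # 3
exists("b")  # FALSE: b was discarded when foo's frame closed
tryCatch(foo2(), error = function(e) conditionMessage(e))  # reports that b was not found
```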

rainersachs commented 7 years ago

thanks Edward. I read the Google guide and your comments; I started to implement them. But in some cases it didn't work yet.

eghuang commented 7 years ago

September 6, 2017:

Other Updates (most recent at bottom):

May 29, 2019: