swcarpentry / DEPRECATED-bc

DEPRECATED: This repository is now frozen - please see individual lesson repositories.

first pass on novice r reference sheet #672

Closed chendaniely closed 9 years ago

chendaniely commented 9 years ago

addresses #663

First pass on the reference sheet.

I pretty much just typed in any function/code I saw.

Lots of work to do, but it's a start.

gavinsimpson commented 9 years ago

Thanks for taking a lead on this.

You've written this in HTML in a .md document. Looking at the sources for the other reference cards, it appears these should be written in GitHub Markdown, which GitHub will parse to HTML for us. The HTML will need converting to Markdown. You could try running the HTML through Pandoc to convert it to Markdown so you don't have to manually edit the HTML out. For reference, the .md files are [here](https://github.com/swcarpentry/bc/tree/master/novice/ref) so you can see how the others have been set up and formatted. The basic structure for the shell reference sheet is:

---
layout: lesson
root: ../..
title: Shell Reference
---

#### Basic Commands

*   `cat` displays the contents of its inputs.
*   `cd path` changes the current working directory.
*   `cp old new` copies a file.
*   `find` finds files with specific properties that match patterns.
*   `grep` selects lines in files that match patterns.
*   `head` displays the first few lines of its input.
*   `ls path` prints a listing of a specific file or directory; `ls` on its own lists the current working directory.
*   `man command` displays the manual page for a given command.
*   `mkdir path` creates a new directory.
*   `mv old new` moves (renames) a file or directory.
*   `pwd` prints the user's current working directory.
*   `rm path` removes (deletes) a file.
*   `rmdir path` removes (deletes) an empty directory.
*   `sort` sorts its inputs.
*   `tail` displays the last few lines of its input.
*   `touch path` creates an empty file if it doesn't already exist.
*   `wc` counts lines, words, and characters in its inputs.
*   `whoami` shows the user's current identity.

Perhaps base the reference sheet on the Python one and have similar sections in the same order. Then we can locate the R-specific things that we want to cover that are important but which don't have natural counterparts in Python or aren't considered in the Python reference sheet?

I haven't looked closely yet because of the HTML vs MD issue, but I note you are propagating a false assumption that in R i) for() loops are slow, and ii) apply() is faster. If you look at the code for apply() you'll see a for() loop in there, invalidating your observation. Object copying is slow in R, especially as objects grow in size; a for() loop that doesn't fill in a pre-allocated object will be slow, but not because looping per se is slow. apply() takes care of the pre-allocation for you, but it is often not convenient to force a loop into the apply()ed function format.
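
To make the pre-allocation point concrete, here is a minimal sketch. The function names `grow_squares` and `prealloc_squares` are hypothetical, made up for this illustration; both compute the same values, and only the way the result vector is built differs.

```r
# Hypothetical helpers for illustration only.

# Grows the result with c(): R builds a new, longer vector and
# copies the old contents across on every iteration.
grow_squares <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Allocates the result once, then fills it in place inside the loop.
prealloc_squares <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

n <- 1000
same_result <- identical(grow_squares(n), prealloc_squares(n))
```

Wrapping each call in `system.time()` with a larger `n` is one way to compare the two on your own machine; the gap should come from the copying, not from the `for()` loop itself.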

jdblischak commented 9 years ago

Thanks for getting this started, @chendaniely. And also thanks to @gavinsimpson for starting the review.

+1 to writing in Markdown and not HTML. +1 to removing the section on "slow" loops.

@chendaniely, I think you would really benefit from looking at the reference sheets for the shell and Python (links below). First, they will show you how to add the YAML header and to use Markdown syntax. Second, you will see they are much more limited in scope. The purpose of the reference sheet is to have some basics for a novice to glance at as they are trying to complete bootcamp exercises. The goal is not to fully document everything one can do with R.

shell: rendered, source
R: rendered, source

chendaniely commented 9 years ago

oops, wasn't reading carefully... I was following the header from the loop lesson that said

> Loops in R are slow
>
> No, they are not! If you follow some golden rules:
>
> *   Don't use a loop when a vectorised alternative exists
> *   Don't grow objects (via c(), cbind(), etc) during the loop - R has to create a new object and copy across the information just to add a new element or row/column
> *   Allocate an object to hold the results and fill it in during the loop
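
As a rough illustration of the first rule, the sketch below computes the same thing with a pre-allocated loop and with a vectorised expression. The values in `x` are made up for the example.

```r
x <- c(1.5, 2.0, 3.5)  # made-up example values

# Loop version: allocate the result first, then fill it in
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorised version: arithmetic is applied to every element at once
doubled_vec <- x * 2

same_result <- identical(doubled_loop, doubled_vec)
```

When a vectorised form like `x * 2` exists, it is usually both shorter and easier to read than the loop.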

Should I change the line to just say that loops are good and copy those rules into the document?

Thanks for pointing out the github markdown vs markdown. Clearly I was not paying any attention to anything, just looked at the rendered documents and typed away.

gavinsimpson commented 9 years ago

To be honest, I don't think the reference sheet needs notes on how to write a good loop or avoid writing a bad one. Just list the various looping functions; essentially for() and while() (there is a do-like one, repeat, but that is far less used so I don't think we need it).

If in doubt, don't add commentary on good vs bad practice - leave that to the lessons where the instructor can explain things better.

BernhardKonrad commented 9 years ago

I know that means more work, but I think examples for each bullet point would be great (the current version already has some).

chendaniely commented 9 years ago

@BernhardKonrad I'm fine with putting in examples, but that seems like the document would be best written in an Rmd document?

Also, by 'examples' do you mean fully functioning code? The Python ref sheet is mostly pseudo-code.

jdblischak commented 9 years ago

I'd prefer if we could keep this within the same scope and style as the other reference sheets. These are not designed to be that long.

$ wc -l novice/ref/*md
   53 novice/ref/01-shell.md
  146 novice/ref/02-git.md
   61 novice/ref/03-python.md
  114 novice/ref/04-sql.md
  119 novice/ref/05-prompts-exits.md
   17 novice/ref/index.md
  510 total

I think @gavinsimpson suggested a good plan:

> Perhaps base the reference sheet on the Python one and have similar sections in the same order. Then we can locate the R-specific things that we want to cover that are important but which don't have natural counterparts in Python or aren't considered in the Python reference sheet?

BernhardKonrad commented 9 years ago

By examples I mean minimal working code, like

    if (x > 0) {
      print('value is positive')
    } else if (x < 0) {
      print('value is negative')
    } else {
      print('value is neither positive nor negative')
    }

instead of

    if (condition_1) {
      # do this if condition_1 is TRUE
    } else if (condition_2) {
      # do this if condition_1 is FALSE
      # and condition_2 is TRUE
    } else {
      # otherwise do this
    }

Same with an example of ==, apply(), list.files(pattern='_plot.R'). If well chosen, I think this transports more useful information than the generic version, without being longer.
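
For instance, minimal working versions of those three might look like the sketch below. The directory and file names are made up for the illustration, and the files are created under `tempdir()` so nothing in the working directory is touched.

```r
# == compares element by element and returns a logical vector
x <- c(1, 2, 3)
is_two <- x == 2

# apply() calls a function on each row (MARGIN = 1) or column (MARGIN = 2)
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)

# list.files() treats `pattern` as a regular expression.
# The directory and file names here are hypothetical.
demo_dir <- file.path(tempdir(), "ref-sheet-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a_plot.R", "b_plot.R", "notes.txt"))))
plot_scripts <- list.files(demo_dir, pattern = "_plot\\.R$")
```

Because `pattern` is a regular expression, `'_plot.R'` also matches here (the `.` matches any character); the escaped, anchored form is just stricter.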

gvwilson commented 9 years ago

What's the status of this one?

chendaniely commented 9 years ago

@gvwilson I believe I addressed everything above line 75. For the things below that line, we still need to decide whether or not they should be taken out.

@BernhardKonrad @jdblischak @gavinsimpson

jdblischak commented 9 years ago

For multi-line code blocks, you need to indent the code. Run `make site` to see your changes.

Please remove the following sections: R, Working with data, Plotting, Processing multiple files.

jdblischak commented 9 years ago

Thanks for the PR, @chendaniely. Thanks for the reviews, @gavinsimpson and @BernhardKonrad.