programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

review Ticket: Simple Calculations with R #19

Closed acrymble closed 8 years ago

acrymble commented 8 years ago

The Programming Historian has received the following tutorial on 'Simple Calculations with R' by @taryndewar. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/lessons/Simple-Calculations-with-R

I will act as editor for the review process. This is a re-submission. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavour to keep the conversation open here on GitHub. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @ianmilligan1 if you feel there's a need for an ombudsperson to step in.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudsperson (Ian Milligan - http://programminghistorian.org/project-team). Thank you for helping us to create a safe space.


nabsiddiqui commented 8 years ago

I will review this project if needed.

drjwbaker commented 8 years ago

Specific Comments

General Comments

This is much better. The top half is especially strong: I felt the lesson loses momentum as it goes along, both in terms of narrative and presentation. The latter is easy to fix (more careful editing). But on the former I do wonder about the ordering of the lesson. The narrative from the outset seems to be moving from smaller data to bigger data, but then we go to matrices (for small data) after the larger dataset example, followed by uploading a small dataset as a .csv (which, although more manageable in terms of learner interaction, is odd given that in reality if you had a small dataset you'd make a matrix, right?)

For me then, what we have going on in this lesson are two distinct things: a lesson on simple calculation in R and a lesson on input methods for R (crudely speaking, matrices vs import via csv, and why you might choose one over the other). I'd suggest that these two either need to be separated into different lessons or those different learning outcomes reflected in the organisation of the lesson. I have a preference for the latter, given that as stand-alone lessons they wouldn't get a historian from no R and no idea what R can do for them to some R and some idea what R can do for them. Best of luck and happy to discuss if something isn't clear!

acrymble commented 8 years ago

Thanks @drjwbaker. We'll let @taryndewar respond to these once we have the reviews in.

@nabsiddiqui the lesson is currently with another reviewer, but if you'd like to contribute to the open review stage, please feel free to do so. We'll close that phase once we've heard back from the other reviewer.

nabsiddiqui commented 8 years ago

If the reviewers are already chosen, I will just wait until something else is in need of review. If I do end up doing the tutorial, I will provide comments if they are needed. Congratulations on the tutorial, looks great.

histlib commented 8 years ago

Introduction

P 2-5: these paragraphs all deal with, to varying degrees, why to use R rather than compute manually; as such it would be better to merge them to reduce redundancy (especially with the repetitions around the word "manually").

P 2,3-4: these two sentences communicate the same thing. I might approach the introduction more like:

Installing R

P 1: I favor being slightly more complex in this first paragraph: "R is a programming language and environment for working with data. R can be run using the R console, which is what this tutorial will focus on, as well as on the command line or the more user-friendly interface of RStudio." Then you can continue on with "To get started with R..."

Using the R Console

P 1: I'd junk the command line sentences in the first paragraph, especially since you can also write savable scripts in the R console (File > New Script).

P 2: "...or by selecting GUI preferences in the Edit menu" or something like that; the way you have it now is a little unclear.

P 3: This paragraph doesn't fit in here. I think you could cut it entirely.

Using Data Sets

P 1: What if you got rid of the first sentence and started with "Before working with your own data, it helps to practice [or perhaps, "to get a feel for R"] using the datasets included with R."

P 1: Final 3 sentences (starting with "These are great for practicing") should be cut. Introduce importing when it's time to introduce importing.

P 2: the sentence referring to who compiled the data would be better as a parenthetical at the end of the previous sentence.

P 2: why repeat the data(AirPassengers) and AirPassengers commands twice?

P 3: Something more like: "You can now use R to answer a number of questions based on this data, for example, the most popular months to fly or if there was an increase in international travel over time. You could probably find the answers to such questions simply by scanning this table, but not as quickly as the computer. And what if there was a lot more data?"

Basic Functions

In your introduction of variables, it might be useful to link out to a trusted tutorial about naming conventions/best practices for variables in R.

In the solutions for this section, the fourth solution (for: What is the total number of people who flew in 1950?) has fallen out of the formatting for the numbered list/table.
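For readers following the thread without the lesson open, the 1950 question can be answered in a couple of lines. This is only a sketch using the `AirPassengers` dataset that ships with R; the variable names `flights_1950` and `total_1950` are assumptions for illustration and are not taken from the lesson's own solution.

```r
# AirPassengers is a monthly time series bundled with R (datasets package)
data(AirPassengers)

# Keep only the twelve monthly values for 1950
flights_1950 <- window(AirPassengers, start = c(1950, 1), end = c(1950, 12))

# Total for the year (the series is recorded in thousands of passengers)
total_1950 <- sum(flights_1950)
print(total_1950)

# The same basic functions work on the yearly subset
print(mean(flights_1950))
print(max(flights_1950))
```

Using `window()` keeps the selection tied to the calendar year rather than to hard-coded positions in the series, which may be easier for a beginner to read.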

Working with Larger Data Sets

My tendency is to be irritated when I'm made to do tedious things when there are much simpler solutions at hand. So I'm not sure of the value in having all of the examples in the previous sections. I'd much rather get to this section faster, using the previous section as a way to introduce the simple statistical functions plus variables.

There's an error in the code below "To see a column of the data, you could enter" (the return for mtcars[1,2] surely isn't three back-ticks). I suspect this is an error in how the R code is embedded since "This would show you..." is probably not part of the code window, and then there's the "mtcars[1,2][1]6" string after.
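As a point of comparison for the formatting problem described above, here is a minimal sketch of how bracket indexing behaves on the built-in `mtcars` data frame. The variable names are hypothetical and chosen only for this illustration; the lesson's own wording and examples may differ.

```r
# mtcars is a data frame bundled with R (datasets package)
data(mtcars)

# [row, column]: a single cell, here row 1 and column 2 (the cyl column)
one_value <- mtcars[1, 2]
print(one_value)

# Leaving the column blank returns the whole first row
first_row <- mtcars[1, ]
print(first_row)

# Leaving the row blank returns the whole second column as a vector
cyl_column <- mtcars[, 2]
print(cyl_column)

# The same column can also be selected by name
print(identical(cyl_column, mtcars$cyl))
```

Note that `mtcars[1,2]` returns one cell rather than a column, which may be part of why the passage the reviewer quotes reads oddly.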

Matrices

P 1: random "Matrices" inserted after first sentence. The second sentence might be better off with "...knowing how to construct matrices in R..."

P 2: I'd rather "To do this, let's create the variables Theft and ViolentTheft using the totals from each decade as data points:" It also would be useful to use a screenshot (or something else) to show where this data is coming from. The "cbind() combines the data by column" is redundant, just start in with the rbind stuff. Adding the t function here (in passing) might be too much.

P 3: I think the natural question while reading this would be "why can't I just run the matrix function on my two variables?"

P 4: I don't believe this paragraph adds much to the tutorial.

P 5: I'd redo the intro to this paragraph by just launching in: "The apply() function allows you to ..." I also wonder if this discussion - using the car data instead of the crime data - might be better off in the previous section.

Final paragraph for this section: the thing is, matrices can be useful with large amounts of data, too; manually creating matrices is only useful with small data. I would delete this.
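To make the points about `rbind()`, `cbind()`, `t()`, `matrix()` and `apply()` concrete, here is a rough sketch. The numbers are placeholder values invented for this illustration, not the lesson's crime figures, and the names `Crime`, `Crime_by_column` and `Crime_from_matrix` are assumptions.

```r
# Placeholder decade totals (hypothetical values, not the lesson's data)
Theft <- c(2, 30, 38, 13)
ViolentTheft <- c(7, 20, 36, 3)

# rbind() stacks the vectors as rows: one row per crime type
Crime <- rbind(Theft, ViolentTheft)
print(Crime)

# cbind() places them side by side as columns instead,
# which is the same as transposing the rbind() result with t()
Crime_by_column <- cbind(Theft, ViolentTheft)
print(all(t(Crime) == Crime_by_column))

# matrix() can build the same table directly, but the vectors must be
# joined first and the shape and fill order stated explicitly
Crime_from_matrix <- matrix(c(Theft, ViolentTheft), nrow = 2, byrow = TRUE)
print(all(Crime_from_matrix == Crime))

# apply() runs a function over rows (1) or columns (2)
row_totals <- apply(Crime, 1, sum)   # one total per crime type
col_means  <- apply(Crime, 2, mean)  # one mean per decade
print(row_totals)
print(col_means)
```

One difference worth noting for the P 3 question: `rbind()` keeps the variable names as row names, whereas the `matrix()` route drops them unless `dimnames` are supplied.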

Loading Your Own Data Sets into R

Intro into this section with some kind of "Now that you've practiced with simple data, you're probably ready to work with your own. Chances are your data is in a spreadsheet; how can you work with this data in R?" You don't have to convert to CSV since there is a package for importing Excel files (readxl). Perhaps you could introduce the standard read functions (because what if they have tab-delimited rather than comma-delimited data) plus readxl. The working directory issue is important, so good to introduce this here, but maybe give it more attention, such as its own paragraph with example. Could also add in here how to write to file. Like, you've got your crime matrix, here's how to save it.
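A possible shape for the import, working directory, and export material suggested above is sketched here. It writes to a temporary folder so that it runs anywhere; the file names and the small `crime` table are hypothetical, and the Excel step is left as a comment because it assumes the readxl package is installed and that a spreadsheet exists.

```r
# Where R looks for files by default; setwd("path/to/folder") changes it
print(getwd())

# A small placeholder table (hypothetical values) to save and reload
crime <- data.frame(
  Decade = c("1670s", "1680s"),
  Theft = c(2, 30),
  ViolentTheft = c(7, 20)
)

# Use a temporary folder so this sketch does not touch real files
csv_path <- file.path(tempdir(), "crime.csv")
tab_path <- file.path(tempdir(), "crime.txt")

# Writing to file: comma-separated and tab-separated versions
write.csv(crime, csv_path, row.names = FALSE)
write.table(crime, tab_path, sep = "\t", row.names = FALSE)

# Reading back: read.csv() for commas, read.delim() for tabs
crime_csv <- read.csv(csv_path)
crime_tab <- read.delim(tab_path)
print(crime_csv)
print(crime_tab)

# Excel files can be read without converting to CSV, assuming readxl
# is installed and "my-data.xlsx" (a hypothetical file) exists:
# crime_xlsx <- readxl::read_excel("my-data.xlsx")
```

Building full paths with `file.path()` sidesteps the working directory question for the example, but a lesson would probably still want to explain `getwd()` and `setwd()` in their own paragraph, as the reviewer suggests.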

Summary and Next Steps

Erroneous pound sign in section title/formatting off.

I might use "work with research data" in the first sentence. The tutorial is more about manipulating data than analyzing it, isn't it?

P 2: For more information on R, visit the R Manual.

P 3: Be more selective here, I think. Like one online tutorial and then the DataCamp course. And annotate - why do you like this tutorial in particular? Who is it good for? Are these free?

P 4: What's so great about Digital History Methods in R? Is it introductory or for advanced users?

General Issues

Regarding James' general comments: I agree that there is a slight disconnect - there's some basic quant work and then how to enter/manipulate data - but I think the solution is in the packaging. Tweaking the intro and conclusion along the lines I've suggested and then coming up with a different title would do the trick. I'm rubbish at titles or else I'd suggest a few ("R Data Basics"?).

acrymble commented 8 years ago

Thanks to you both. I'll need a few days to summarise this for our author. But I'll try to do that as soon as I can.

acrymble commented 8 years ago

Thanks to our two reviewers. We'll close reviews at this stage so @taryndewar can focus on making updates.

I think the two reviews are fairly self-explanatory, but ask questions if you need to @taryndewar. Both reviewers have focused primarily on helping to clarify concepts and language, and working on reducing some redundancy in the lesson.

There is a fairly substantial list of suggested copy edits, which I’d invite you to consider. Probably easiest to do these first, as they may help rectify some of the other issues. You don’t have to accept everything that was suggested, but you might want to acknowledge that anywhere a reviewer paused and thought: this doesn’t sound right, your readers will pause too, and they might not have as much experience.

A few things I’d particularly like you to respond to:

1) R vs Excel rather than R vs manual. I think James makes a good point here, and that it might be more compelling to compare what R can do to what people tend to use Excel for. Does anyone calculate things with pen and paper anymore?

2) The confusion about the different ways you can use R (eg, via console). This may confuse people, so perhaps best to give them one way and stick to it? They can learn other options later, but you don’t want to overwhelm them.

3) James has suggested reordering sections to make the lesson flow better for a new learner. John has suggested an alternative solution that involves beefing up some sections (which he’s provided very clear suggestions on).

We’ll probably have to think about a title, but let’s do that at the end.

Finally, I can appreciate why someone might be frustrated going step by step through easy examples, but we need to be wary that some users will be starting from zero, so I’d like you to keep the easy early examples in place so that we don’t raise the barrier to entry.

When you've had a chance to make the changes @taryndewar, please post here letting us know what you have/have not done. It looks like a big list, but a lot of it is copy editing, so I don't expect you've got a big job ahead of you. Let me know if you need any support or have any questions.

wcaleb commented 8 years ago

The images and figure syntax for this lesson will need to be updated according to the new guidelines posted here.

acrymble commented 8 years ago

@taryndewar has made edits based on the feedback. I've also done a copyedit and in the process have changed the name and URL:

https://github.com/programminghistorian/ph-submissions/blob/gh-pages/lessons/r-basics-with-tabular-data.md

Just waiting on a code example from @taryndewar for the 'Saving Data in R' section and then ready to go.

acrymble commented 8 years ago

Suggested images for icon:

https://www.flickr.com/photos/britishlibrary/11065618604/in/album-72157638733975756/

https://www.flickr.com/photos/britishlibrary/11054877045/in/album-72157638733975756/

https://www.flickr.com/photos/britishlibrary/11081312545/in/album-72157638733975756/

https://www.flickr.com/photos/britishlibrary/11081699376/in/album-72157638733975756/

acrymble commented 8 years ago

This has been published at: http://programminghistorian.org/lessons/r-basics-with-tabular-data

Thanks @drjwbaker @histlib for your efforts. We appreciate the time you put in to improve lessons.