Review Ticket for 'Basic Text Processing in R'

jerielizabeth commented 7 years ago

The Programming Historian has received the following tutorial on 'Basic Text Processing in R' by @statsmaths and @nolauren. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/lessons/basic-text-processing-in-r

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback which should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @ianmilligan1 or @amandavisconti if you feel there's a need for an ombudsperson to step in.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to requests for clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudspeople (Ian Milligan and Amanda Visconti - http://programminghistorian.org/project-team). Thank you for helping us to create a safe space.

jerielizabeth commented 7 years ago

Thank you, @nolauren and @statsmaths for this lesson! I think the lesson is ready as is for external reviewers.

I am soliciting for reviewers currently, and hope to have the reviews submitted by the end of January (leaving a little extra time for the holidays).

acrymble commented 7 years ago

I'm really pleased to see another R lesson to compliment our current offering.

Just one very minor point from me: You don't mean 'digitised' text. You mean 'machine readable' text. Digitised could merely mean it's been scanned and is in image format.

walshbr commented 7 years ago

Am loving the lesson so far! I ran into a snag in paragraph 37 when using R to read in the text files from where they're hosted on the Programming Historian staging server. This bit:

files <- sprintf("%s/sotu_text/%03d.txt", base_url, 1:236)
text <- c()
for (f in files) {
  text <- c(text, paste(readLines(f), collapse = "\n"))
}

Ran through the lessons a couple times, copy and pasting all the code snippets, etc. @jerielizabeth said she wasn't having this problem, so we thought throwing it up here would be helpful. The output of 'files' looks something like this:

> files
  [1] "http://programminghistorian.github.io/ph-submissions/assets/basic-text-processing-in-r/sotu_text/001.txt"
  [2] "http://programminghistorian.github.io/ph-submissions/assets/basic-text-processing-in-r/sotu_text/002.txt"
  [3] "http://programminghistorian.github.io/ph-submissions/assets/basic-text-processing-in-r/sotu_text/003.txt"
  [4] "http://programminghistorian.github.io/ph-submissions/assets/basic-text-processing-in-r/sotu_text/004.txt"

And each of those links resolves as expected. My suspicion after experimenting more is that github is throttling my connection thinking that I'm a bot. I've been running a counter over the loop to see how far it is getting before it fails, and it's variable each time. Here are the points in the loop at which it has failed the last few times:

106, 47, 49, 100, 62, 51, 70

Adding a random sleep timer between scrapes didn't really seem to help that much, though I'm giving it a few spins before giving up on that entirely. Any thoughts on what might be going on?

jerielizabeth commented 7 years ago

Status update: we have two reviewers in line for this piece with the goal of having the reviews completed by the end of January (allowing for the holidays).

Once both reviews are in, I will summarize and provide any additional commentary for the authors to work from in revising the lesson.

shawngraham commented 7 years ago

Hi folks - just wanted to say that I found this lesson really useful. If I might make a suggestion - while I know that this is meant more for an intermediate user, it'd be useful perhaps to make a gesture somewhere to how a user might ingest their own texts from a local location. It might also be handy in the further directions to point towards some of the vignettes within the tidyverse, eg tidytext https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html

Thanks again,

Shawn

jerielizabeth commented 7 years ago

@bmw9t I am still struggling to duplicate that particular error. Maybe a little more information would help: are you getting any error messages in R that you could share here?

walshbr commented 7 years ago

@jerielizabeth Ah sorry - thought I had pasted the error message in here. The only error message R was giving me was this:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  unable to connect to 'programminghistorian.github.io' on port 80.

Might have been something with the particular settings on our firewall or something? I suggest that because now I'm on a different network in a different state, and those lines were able to read everything in from GitHub without any problems. I'll check when I'm not traveling and back on the home network. Can also try when I'm back on W&L's network.

walshbr commented 7 years ago

Thanks, @jerielizabeth for facilitating and to @statsmaths and @nolauren for writing this lesson.

I really enjoyed the tutorial! I think it's a great piece, and it's a lovely introduction to basic text analysis with R. I have several point-by-point things below that address particular line numbers as relevant. And then at the end of this comment are just a few typos that I noticed going through. I'm looking forward to being able to point students and faculty to this work in the future.

The bulk of the comments below pertain to comment number two and are just asking for glosses on pieces that appear to be assumed knowledge in the text. Nothing especially substantive. I tended to err on the side of glossing a bit more than the tutorial was originally, though obviously it's your call how much you would like to explain things if you're expecting an intermediate user. For me, I suppose it's a question of "intermediate in what skills relevant to the lesson?" since we're talking about more than just R. I can imagine people with varying skill levels in R, programming more generally, and text analysis being interested in the lesson for different reasons. I tended to assume that you wouldn't need to explain the particular syntax of R snippets, but that it could still be a good idea to give really short glosses of other terms or concepts as you go for people with other backgrounds. I made notes as such below, though take or leave them as you see fit.

Notes

1) I think having a few notes or a link about where to go to install R would be good. You don't need to explain the R syntax, since you make it clear they can get that elsewhere. But giving the installation link would be helpful for an ambitious person trying the lesson out as a first attempt at R.

2) I know you’re expecting a bit more advanced reader, someone familiar with R. But I think you could more consistently gloss some of the statistical methods/terms that you mention. I wouldn’t worry about getting lost in the weeds, but I think it might help a novice reader just to give the one sentence, hand-wavey definition of things. I wouldn’t worry about glossing R syntax, and I also think it’s fine to leave out the glosses during the introduction, since you’ll get them later.

3) In line 4 I would give the command to start R on your computer. That way even if people with no background can follow along and then learn the syntax somewhere else. Mostly thinking about myself here since I am more familiar with other programming languages and was learning R through the lesson.

4) In line 5, I got the prompt:

Do you want to install from sources the package which needs compilation?
y/n:

I think the response to that is implied by the text, but I might explicitly say that you’ll be asked to confirm the installations.

5) During line 5 I got this warning while installing tidyverses:

The downloaded source packages are in
    ‘/private/var/folders/jl/l0cs5mls1l1587v859x370r00000gq/T/RtmpRJfKau/downloaded_packages’
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
  dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
  Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
  Reason: image not found

Looks like it’s not a big deal, since I was able to run through the lesson no problem. But might be worth mentioning it or (more likely) pointing to a place (like stack overflow) where people can troubleshoot installation errors.

Similarly I got a warning when loading tidyverse:

Warning message:
package ‘ggplot2’ was built under R version 3.3.2

Might be good to tell people to ignore warnings, or distinguish between them and errors.

6) line 8 might be a good moment to just say something like "…will print out the paragraph of text verbatim because 'text' now stores the document inside it." or something like that.

7) line 9 - I would say something about the output of words: [[1]] [1] "now" "i" "understand" "that" "because" [6] "it's" "an" "election" "season" "expectations" [11] "for" "what" "we" "will" "achieve" [16] "this" "year" "are" "low" "but"

Etc. That's different than how other languages would format the output of something like this, so it might be useful just to say something quick like "you'll get the tokenized output of the text along with the positions of the tokens."

And in general, the tutorial is a bit inconsistent about dealing with console output. For the most part, it doesn't actually display the output beyond the graphs later in the lesson. But I might think about adding in the expected output more often/more consistently. After all, the text of the lesson does reference the console output a fair bit. Not especially necessary, but it could be helpful hand-holding.

7) I think line 12 is a good model for the balance of hand-waving and hand-holding that I've been suggesting elsewhere.

8) data frame in line 14 might be helpfully glossed as "a type of data structure" or something like that. I don't think you need that for 'list' because that's pretty intuitive.

9) Line 16 - what is a tibble?

10) I think I would give the rearranged output as well. Particularly because (slightly out of the scope perhaps), it seems like an opportunity to namecheck the concept of stopwords.

11) line 19 - If you are a little more verbose on the output for number 7 above, I think you don't need to worry about parsing the output again here. So it's fine as is.

12) I would gloss character vector.

13) Probably an obvious point, but maybe want to change the URL for the base_url at line 24 to the production resting place of the text when you switch this off of Programming Historian's staging site. I would also just give a comment-level gloss of what the second and third sentences are doing for those following lines of code.

14) line 27 could also be an opportunity to namecheck stopwords again.

15) I think lines 27-8 could use a few more sentences of explication - I got a little lost when using the Web Trillion Words Corpus. I think a lot of it might be helped by expanding the phrase "these frequencies". These frequencies are the most common frequencies in sites indexed in google, right? So you're using that corpus to generate a list of high-frequency words, then seeing the frequency hits for the tokens of our own corpus? I would just expand on how you're using that WTWC and how it relates to the own example text analysis. Since it seems that you're definitely heading in the direction of filtering stopwords (after reading a little further), I think introducing the concept earlier as well as why you would want to filter them out will help clarify what the section is doing.

16) After "along with the option n set to 15 is used" I would add something like… "to print out more than the default 10 values." For the beginners in the room like me.

17) line 37 - I sometimes get an error saying R can't open the connection:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  unable to connect to 'programminghistorian.github.io' on port 80.

But when I am on a different network the files are able to load fine, so might not be something to worry about.

18) line 46 - "complex dimensionality reduction algorithms" - might be worth giving another phrase to gloss briefly what this means, though you get to it a bit later.

19) line 50 - This paragraph was a bit hard for me to follow. I think it might help to explain what you mean by "summarize all of the lengths within a document." What are different ways to summarize lengths?

20) In line 60 - "applying dimensionality reduction techniques to plot stylistic tendencies over time or across multiple authors." This is the sort of gloss that I was suggesting in point 18 above, so it might be better deployed the first time you mention the concept.

Typos:

line 25 - 'their' instead of 'there'
line 31 - "it is use a lot"
line 42 - "It seems that for the most part addresses steadily from 1790 to around 1850,…" verb missing here? Maybe 'decreasing'
line 50 - "effected by a parsing errors"
line 53 - "in terms of the sentence length style." potentially a typo, but maybe not.
line 46 - "makes extensive used of computational methods describe the style of an author’s writing." verb agreement here / a word missing.
line 47 - " In this case the result is a list for 236 item in it." Maybe 'with 236 items'?
line 58 - "The results yields one row"

That's it for me, but I'm happy to discuss any of these further! Really great work.

jerielizabeth commented 7 years ago

Thank you @bmw9t and @shawngraham!

@statsmaths and @nolauren, once we have the second review, I'll provide some summary comments and give guidance on how to proceed with revisions. You are welcome to respond to the feedback here on GitHub as it arrives as well, though please save updating the tutorial until after the final review is in place.

histlib commented 7 years ago

In general, I found this tutorial to be useful (and helped me a bit since I am new to tidyverse). I did want to see more explanation of all of the code, especially since that would help users of the tutorial have a better sense of how they might apply this in different contexts.

1: "raw, digitized text." "involves only using a tokenizer" - implies that all the tutorial will cover is tokenization 3: thinking about appending clause to "covers all of the assumed knowledge" so that what's assumed is clear (= installing/starting R, packages, importing data, executing programs ...). 4: cite the two packages so that there are links to CRAN perhaps add screenshot and a bit more handholding:

does the tutorial depend on a certain version of R?
installation of tidyverse would probably (for newer users) trigger a lot of installs of other packages, so mention that this will happen (and maybe why? or what these packages are - which gets into what tidyverse is, perhaps, but it's possible to gloss it, e.g., "tidyverse is a library that depends on other libraries that help with importing, working with, and visualizing data" or something similar) while installing packages was done in the R Basics tutorial, it was done in passing and not very thoroughly; I'm on the fence about spending time on install - I do think something more about tidyverse & its dependencies is worth covering - but won't argue too hard for lots of additional words on package installation if you don't think it necessary or if it just gets in the way 5: library(tidyverse) loads other things, which will be less surprising if dependencies are explained in 4, but also generates a "Conflicts with tidy packages" for dplyr, stats; both libraries generate warning messages about being built under R version 3.3.2; might want to note these and that they can be ignored 12: why does words[[1]] work to get number of tokens within words? [[1]] not really a symbol - what is it? 14: relates to above - why is words[[1]] the argument for the function rather than words? should explain more what is going on in these lines of code, especially the second line 15: A tibble? & ? Put in a wee bit more explanation 16: more specifically unpack what arguments for arrange (e.g., The arrange function asks for the data set to be worked on, in our case it's tab, and then you add in the desc function to work on just the count column because you want to sort by count and not by word) 17: re: "We" vs "we" - this is probably fine, but I've always had to explain this to students and so am biased here; do you think you could briefly explain why computers think the two words are different? 19: again, the [[1]] problem (though if explained in 12, can simply gesture to that here) 22: which library does sapply belong to? (I assume it's in tidyverse, but I could be wrong) clarify what sapply is doing: The sapply function applies the function in its second argument to every element of the variable in its first argument 24: "load the data in from an online file" What's going on in this code? There's more to it than readLines and paste 25: "words in total there are" 26: The second sentence here is a little awkward, maybe "Now [or "Next"?], you can combine the table, data-frame, and arrange functions exactly as done above to output the most frequently used words in the entire speech." unpack the code a bit (either as part of para 26 or at the beginning in para 27) 27: "Highly represented"/"relatively rare" coupling muddies the statistical emphasis here; "words that are proportionally more common in this text than in a large external corpus of English"? briefly explain what the Google corpus is (it becomes clear in para 28, but would be good to lead with this info) explain what's going on in the code here (tying it back to para 24 code) 28: "that were equal to the given word"? 29: why "may be" used? are there other options? More direct would be: "To combine the web corpus word frequencies with the State of the Union speech data in tab, use the inner_join function that is part of the tidyverse library" (or dplyr library or ignore this idea; I think it is useful to know which library has which functions - so that if I'm expanding this knowledge to other problems, I'll know which libraries to select - but I could see feeling otherwise) 30: Try this as a positive (filter for) rather than a negative (remove); I think it makes it more clear, but you might feel differently: "we can filter for words that have a frequency below 0.1% (that is, occurring in the web corpus fewer than once every 1000 words)" and then make explicit why you want to do this (and why you don't compare to SOTU frequency, because that's a fairly natural assumption a reader might make) 31: "and we might speculate that it is used a lot in speeches by politicians, but..." 32: Maybe "seem to suggest some of the key themes of the speech..." 33: "Let us read that into R now:" 34: "given as a speech or as a written address" say something about what's going on in this code, need to explain the use of "$" (as it wasn't covered in the Dewar tutorial) 36: I wouldn't call it a summary - gives you an idea about topics, maybe? Might lead to keywords (in addition to, or instead of, an abstract). Also, no comments on "laughter"? 37: you've introduced sapply earlier - why can't you use it do this instead of doing the for loop? what does "these" in the final sentence refer to? Unpack the code a bit It's possible that people will be typing in code line-by-line, which presents some problems for a novice when they get to the for loop 39: Unpack the code, especially setting out what the arguments are for the qplot function 42: "Setting the color of the points to denote whether a speech was written or delivered orally explains a large part of the variation. The code for such a plot requires only a small tweak to the previous qplot code: we add in the color argument and base it on the sotu_type column in our metadata." 45: "The outliers we noticed were all post-World War II written addresses." 46: "...computational methods to describe ..." "formal" twice in consecutive sentences; maybe eliminate the first use? there are phrases in here that would not be understood by most of your readers (such as "complex dimensionality reduction algorithms") 47: Remove initial clause and just start with "The corpus can be split ..."? Or "The corpus can be split into sentences using the tokenize_sentences function that we've used before."? "...is a list with 236 items in it ..." 48: All of this needs more explanation: sapply vs. for (or refer back to para 37 if you explained it there), why you can't tokenize_words(sentences) but you can sapply it? 49: The first sentence is confusing For those with short attention spans, maybe refer back to why you need sentence length output Explain the code (and pay attention to the possibility that people are typing in line-by-line) 50: The median function sentence will be confusing for novices 51: y-axis label error 53: latter-half? Penultimate sentence: "This" refers to ? Last sentence: which pattern? (I think this grammatical problem is related to the use of "this" in previous sentence) What does geom_smooth do? y-axis label error Explain what + does in code Maybe use this as an opportunity to bring up using the up arrow to recover previous commands 55: I'd put this explanation in 53 when smoothing comes up 57: First sentence implies the Joining, by = "word" appears because of the inner_join function, but isn't entirely clear on this point. "...as we wait for it to finish" "...see the output of our script by typing description ..." Explain what cat function does 58: "The result is one row for each State of the Union address." 59: Not a summary - "...serve as useful keywords for each presidency." "flipped in favor of the Republicans" Consider changing final "turn" in Clinton sentence: "...and his attention to Medicare reform at the end of his term." (or something like that) 60: "...analyzed with the R programming language. There are several directions one can pursue to dive further into text analysis." Bullet 1: What is "a full NLP annotation pipeline?" Bullet 2: Refer to Mallet tutorial? Bullet 3: What is "dimensionality reduction?" These interesting examples aren't clear why they are interesting 61: "In the near future, we hope to offer additional Programming Historian tutorials on these advanced text analysis approaches."

jerielizabeth commented 7 years ago

Thank you, @histlib, for your comments and feedback on the lesson! There is a lot here between the two reviews, so I will take a little bit to go through it and offer my own recommendations to the authors. I will post my comments here within a week.

Thank you again to both @bmw9t and @histlib for the reviews! While additional comments are always welcome, the review phase for this lesson is now closed.

jerielizabeth commented 7 years ago

Thank you again, @histlib and @bmw9t for these excellent reviews! and thank you @shawngraham for the suggestion of additional information on how to use a new corpus for analysis.

@nolauren and @statsmaths, it seems that the overall consensus in the reviews is that some more explanatory text, both with regards to R syntax and with regards to statistical concepts, would help people using this lesson better understand not only how to use these libraries, but how those libraries are working. We should probably decide what difficulty to set this lesson at. I think it makes a good intermediate lesson, but since we have very little R-focused materials on the site, I agree with the reviewers that additional explanation would be useful. Screenshots of outputs so that users know that they are on the right track would be a useful addition throughout the lesson. I think it would also be useful to state that this lesson assumes that the reader is working in the R Console (and include a version range that this lesson should work with). I also think the opening sections would benefit from some explanation about the packages — what they do and what they include — especially to help people think critically about these functions and to avoid a reliance on “magic”.

To Shawn’s point that it would be helpful to illustrate how one could use a different dataset, would it be possible to provide the text files for download as part of the lesson and discuss any formatting requirements that might need to be done to do the same process on a different set of texts? I am in favor of having people pull in from the server in the lesson to avoid inevitable problems of people renaming or saving the files to an unexpected location. Perhaps as an optional “if you are interested in examining the files, you can download them here” and then provide a code example of how to load in from a local directory down in paragraph 24. That would also address Brandon’s trouble accessing the GitHub servers while on a campus network. And yes, I’ll make a note that we need to update the urls when we move the lesson over to the main site.

As you work through the rest of the feedback from the reviewers, I will leave this space open for you to ask for clarification from myself or from them about suggested edits or particular areas for additional explanatory text.

Our recommended timeframe for revisions is 4 week, which would put us at March 1. Let me know if you need more time than that.

Thank you again for this lesson, I think it will make an excellent addition to the Programming Historian. And thank you again to the reviewers for the thorough and helpful feedback!

statsmaths commented 7 years ago

Thank you @bmw9t, @histlib, @shawngraham, and @jerielizabeth for all of the very helpful comments and feedback. We have pushed several commits that address the various points made by both reviewers. Details are below:

@bmw9t:

Edits based on your feedback are mostly posted in commit 30be280091d1e65ec7ace139ed8a0041c140ea2d. We added many more output blocks of code in the separate commit a8c4911513d5ea82c0d54fb9f0502fe65b49a21d.
Point 1: a link to R and RStudio have been added into the lesson.
Point 2: We have incorporated short details in the places where you recommended. Your help with this is greatly appreciated because we did not want to give a full re-introduction to R, but are not great at realizing what parts need a bit of explanation. This applies to points 6-12, 14-16, 18-20.
For points 3-5: The reason you are getting these errors is that you are calling R from the terminal. This is fine, and actually how we usually do it, but the vast majority of inexperienced users will be calling R from an application (either the R GUI or RStudio). We have added a note about this and a link to RStudio in the introduction of the lesson.
Points 13 and 17 both mention the location and accessibility of the data. We will of course change the URL when pushing to the live site.

@histlib:

Edits based on your feedback are posted in commit 66e7fd21f1685df3a96352f907157f8f648c89d6. When points are not mentioned explicitly here, they generally pertained to typos or additions that have now been addressed.
Line 2: tokenziation is the only NLP parser used in the entire lesson, so I think this is okay.
Line 4: We've added citations, but inline links to CRAN are probably more confusing than helpful. Interested users can find these pages easily, but I worry that a link suggests users need to go to the page to download the packages, which they do not. I image many people navigating to the website, downloading the source code, and then getting stuck.
Line 12: [[1]] is the notation for accessing the element of a list. There is a description of what it does in this specific case in the lesson, but a more general discussion seems out of scope.
Line 14: I think this is already clear from the text: "we first apply the table function to the words in the first (and here, only) document".
Line 15: already adjusted in the previous commit above
Line 22: sapply is just in base R. The current description of "The sapply function applies its second argument to every element of its first argument" seems at the same level as the other functions that are introduced.
Line 37: It could be possible to use sapply here, but eventually we need for loops and it makes more sense to use them here first in a simple example
Line 39: It should be clear what our inputs for qplot are (both have been seen previously). Discussing qplot in general is far out of scope for the piece; qplot is a complex function and it's inputs can be mean many different things.
Line 48: Re-reading this part, it seems clear what is actually going on in the code and how sapply and the for loop are being use together. It is certainly not clear why you might use one versus the other in a general context, but that is really out of scope of the tutorial.
Lines 53, 57 (and sections of a few others): It should be clear what functions such as the geom_smooth and cat are doing in these particular examples. We admit that it is not giving a complete picture of what the functions do in general or why we choose to use these particular functions for each purpose. These latter two things are really out of the scope of this short intermediate-level tutorial.
Line 60: It is fairly hard to show why they are interesting without showing the examples themselves.

@shawngraham:

A new section titled "Alternative Method for Loading the Corpus (Optional)", has been added showing how to read in the corpus from a local file. This technique can be trivially adapted to load your own corpus. A zip file of all the texts has been added as well. See commit 595dae4c32bafa38c6904f83d8b426c52c55ee9b.

@jerielizabeth:

I think all of the other things we discussed are addressed in the relevant parts above.

jerielizabeth commented 7 years ago

Hi all!

Thank you -- the edits look great and I think address the reviewers' concerns within the constraints of keeping the tutorial narrowly focused and accessible.

I've done some copy editing in preparation for publication and have submitted those to you as a pull request. Let me know if you run into any trouble with that format, and I can send them another way. There were two additional places where I thought the console log would be useful, and one place (para 43) where I think a little more information about how the metadata lines and the text files are related would be useful for the beginning user.

Once you are happy with the edits, I'll give it one more read through to catch any straggling typos and then I think we'll be ready to publish!

statsmaths commented 7 years ago

@jerielizabeth Reviewed and accepted your pull request. Looks great on our end to go forward. Please flag if anything else in needed on our end.

jerielizabeth commented 7 years ago

Excellent! Thank you @statsmaths!

I'm getting started on the final edits and prepping to move this lesson over at the beginning of next week (for maximum eyes.) Any thoughts on this for a cover image: https://flic.kr/p/odcdqu

jerielizabeth commented 7 years ago

Also, @statsmaths and @nolauren -- can you send me a short bio for each of you, either via email or in here. One sentence if possible, two tops. Thanks!

jerielizabeth commented 7 years ago

Hi all!

The lesson is now live on the site at http://programminghistorian.org/lessons/basic-text-processing-in-r. If anyone sees any errors, let me know and I'll fix them!

Once we're ready to promote the lesson, I'll close this issue. That will be the signal to tweet widely!

Thank you all again for your work on this! Thanks to you, we have another excellent addition to the offerings on the Programming Historian.

ianmilligan1 commented 7 years ago

🎉

walshbr commented 7 years ago

All looks good to me! Looking forward to promoting and congratulations!

jerielizabeth commented 7 years ago

I concur! Basic Text Processing in R is official published!

programminghistorian / ph-submissions

Review Ticket for 'Basic Text Processing in R' #34

Anti-Harassment Policy