quanteda / readtext

an R package for reading text files
https://readtext.quanteda.io

Add initial version of readtext vignette #85

Closed stefan-mueller closed 7 years ago

stefan-mueller commented 7 years ago

This adds a vignettes folder containing a readtext vignette. It is based on ?readtext, README.Rmd, and my own amendments. Please check whether there are better stringi solutions for removing page numbers with a regular expression. Excluding page numbers is a common question, so I came up with two typical examples.
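For reference, here is a minimal sketch of the kind of stringi call in question. The sample strings, the helper names strip_bare_numbers and strip_labelled_numbers, and the two page-number layouts are made up for illustration and are not taken from the vignette; only stri_replace_all_regex is an actual stringi function.

```r
library(stringi)

# Made-up sample: a bare page number on its own line after each page.
bare <- "Text of the first page.\n1\nText of the second page.\n2\n"

# Made-up sample: a "Page x of y" footer line after each page.
labelled <- paste0(
  "Text of the first page.\nPage 1 of 2\n",
  "Text of the second page.\nPage 2 of 2\n"
)

# (?m) makes ^ and $ match at line boundaries.
# [ \t]* tolerates padding without spilling over onto neighbouring lines.
# \n? also drops the line break, so no empty line is left behind.
strip_bare_numbers <- function(x) {
  stri_replace_all_regex(x, "(?m)^[ \\t]*\\d+[ \\t]*$\\n?", "")
}

strip_labelled_numbers <- function(x) {
  stri_replace_all_regex(x, "(?m)^[ \\t]*Page \\d+ of \\d+[ \\t]*$\\n?", "")
}

cleaned_bare <- strip_bare_numbers(bare)
cleaned_labelled <- strip_labelled_numbers(labelled)
```

Note that the bare-number pattern also removes any legitimate line that consists only of digits, so the labelled variant is the safer choice whenever the documents carry such a footer.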

stefan-mueller commented 7 years ago

I just read that the antiword package does not seem to support .docx (yet). We might add this as a comment. It would be great if @kbenoit could check whether my information is correct. If you would like me to add further sections or subsections, please let me know.

kbenoit commented 7 years ago

antiword doesn't, but our package does. A .docx file is basically XML, so we are able to import it that way.
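To illustrate the "basically XML" point: a .docx file is a zip archive whose main body is stored in word/document.xml. The sketch below is not readtext's actual import code. It assumes the xml2 package and parses a hand-written stand-in for that XML entry, pulling out one string per paragraph.

```r
library(xml2)

# Hand-written stand-in for the word/document.xml entry of a .docx archive.
document_xml <- paste0(
  '<w:document xmlns:w="http://schemas.openxmlformats.org/',
  'wordprocessingml/2006/main">',
  '<w:body>',
  '<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>',
  '<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>',
  '</w:body></w:document>'
)

doc <- read_xml(document_xml)

# Each w:p element is one paragraph; its text lives in nested w:t elements.
paragraphs <- xml_find_all(doc, "//w:p", ns = xml_ns(doc))

# xml_text() concatenates the text of all runs inside each paragraph.
docx_text <- xml_text(paragraphs)
```

For a real file, the XML could first be extracted with unzip(path, files = "word/document.xml", exdir = tempdir()) and then passed to read_xml(); in practice, calling readtext() on the .docx file is the simpler route.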