@jeroen the striprtf package seems to handle encoding issues more robustly than unrtf, so I went with that.
```r
library("readtext")
library("quanteda")
## Package version: 1.4.3
readtext("https://jeroen.github.io/files/sample.rtf") %>%
texts() %>%
cat()
## It is an example test rtf-file to RTF2XML bean for testing
##
## Font size 10, plain text;
## Font size 12, bold text. Underline,bold text.
## Underline,italic,bold text.
## Font size 22, plain text.
## Bold text.
## Italic text.
##
## Simple table :
##
##
## *| 1st column | 2nd column | 3rd column | 4th column | 5th column |
## *| 1.1 item | 1.2 item | 1.3 item | 1.4 item | 1.5 item |
## *| 2.1 item | 2.2 item | 2.3 item | 2.4 item | 2.5 item |
## *| 3.1 item | 3.2 item | 3.3 item | 3.4 item | 3.5 item |
## *| 4.1 item | 4.2 item | 4.3 item | 4.4 item | 4.5 item |
## *| 5.1 item | 5.2 item | 5.3 item | 5.4 item | 5.5 item |
## *| Empty |
## *| …
## *| |
## *| …
## *| |
## *| …
## *| | Empty |
## *| Last items |
## *| …
## *| |
## *| …
## *| |
## *| …
## *| | Last items |
##
##
## List :
##
## It is the 1st row of the list
## It is the 2nd row of the list
## …
##
## …
##
## …
##
## It is the last row of the list
##
## Here is a brief Courier text.
## Here is a brief MS Sans - Serif text.
## Here is a brief MS Serif text.
## Here is a brief Times New Roman text.
##
##
##
## Some paragraphs :
##
## Align left :
##
## The text you are reading is aligned left. It is an align – left text. It is also an align – left sentence.
##
## Align right:
##
## The text you are reading is aligned right. It is an align – right text. It is also an align – right sentence.
##
## Align centered:
##
## The text you are reading is aligned center. It is an align – centered text. It is also an align – centered sentence.
##
## Align justified:
##
## The text you are reading is aligned justify. It is an align – justified text. It is also an align – justified sentence.
##
## Here are some special characters:
## ö
## t
## á
## rv
## í
## zt
## û
## r
## õ
##
## ü
## tvef
## ú
## r
## ó
## g
## é
## p, which means “five flood resistant hammer drills” () in Hungarian.
##
## At last you can see an image :
```
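To try the same RTF path without fetching the remote sample file, here is a minimal sketch that writes a tiny RTF document to a temporary file and reads it back with `readtext()`. The RTF string below is made up for illustration and is not from the sample file, and the exact line breaks and whitespace in the extracted text may differ from what you expect.

```r
library("readtext")

# Hypothetical minimal RTF document (illustration only):
# one plain run, one bold group, and two \par paragraph breaks
rtf_src <- "{\\rtf1\\ansi\\deff0 Plain text, {\\b bold text}.\\par Second paragraph.\\par}"

# Write it to a temporary file with an .rtf extension,
# since readtext picks the reader based on the file extension
tmp <- tempfile(fileext = ".rtf")
writeLines(rtf_src, tmp)

# readtext returns a data.frame with doc_id and text columns;
# the RTF control words should be stripped from the text column
rt <- readtext(tmp)
cat(rt$text)
```

Checking that the bold group's text survives while control words such as `\b` and `\par` do not is a quick way to confirm that the RTF markup is being stripped rather than passed through.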
Add support for RTF files. Resolves #90.