quanteda / readtext

an R package for reading text files
https://readtext.quanteda.io

Fix handling of archive files with upper-case extensions (e.g. .DOCX) #165

pmyteh closed 4 years ago

pmyteh commented 4 years ago

When identifying the correct filetype to read, readtext() applies tolower() to the file extension, but extract_archive() does not. So a file with the extension ".DOCX" is correctly sent to extract_archive(), which then fails with "Archive extension DOCX unrecognised." and the file cannot be read. This trivial PR fixes that by lower-casing the extension in extract_archive() as well.
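For illustration, here is a minimal sketch of the idea: normalise the extension with tolower() before matching it. This is not the actual readtext source. `archive_type()` and the handler names it returns are hypothetical, and only the wording of the error message is modelled on the one quoted above.

```r
# Hypothetical helper (NOT the real extract_archive()): pick a handler
# for an archive file, ignoring the case of its extension.
archive_type <- function(path) {
  # tools::file_ext("report.DOCX") returns "DOCX"; tolower() normalises it
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    zip  = "zip",
    docx = "zip",   # assumption: a .docx file is treated as a zip container
    gz   = "gzip",
    tar  = "tar",
    stop("Archive extension ", ext, " unrecognised.")
  )
}

archive_type("report.DOCX")  # same handler as for "report.docx"
archive_type("report.docx")
```

Without the tolower() call, "DOCX" would match none of the lower-case branches and fall through to stop(), which is the failure described above.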

kbenoit commented 4 years ago

Thanks @pmyteh!