Closed by stefan-mueller 7 years ago
That's an interesting edge case.
If you load it without the docvarsfrom, you can always parse the docnames manually:
do.call(rbind, strsplit(docnames(text_example), "."))
or something like that
Thanks for the reply and solution. This works, but only if we use "[.]" instead of ".".
Working example:
text_example <- readtext(file = "var1_var2.var3.var4.txt")
docvars_text <- do.call(rbind, strsplit(docnames(text_example), "[.]"))
corpus_example <- corpus(text_example)
docvars(corpus_example) <- docvars_text
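For anyone wondering why "[.]" is needed: base R's strsplit() treats its split argument as a regular expression by default, and in a regular expression "." matches any character. Below is a minimal sketch in base R only, using the file name from the example above as a plain string (no readtext call involved). The variable names are made up for illustration.

```r
# File name from the example above, used here as a plain string
doc_name <- "var1_var2.var3.var4.txt"

# "." is a regex wildcard, so every character counts as a separator
# and only empty strings come back
parts_wildcard <- strsplit(doc_name, ".")[[1]]

# Three ways to split on a literal dot instead
parts_class   <- strsplit(doc_name, "[.]")[[1]]              # character class
parts_escaped <- strsplit(doc_name, "\\.")[[1]]              # escaped dot
parts_fixed   <- strsplit(doc_name, ".", fixed = TRUE)[[1]]  # no regex at all

# Drop the trailing "txt" extension to keep only the variable parts
vars_only <- parts_class[-length(parts_class)]
```

The fixed = TRUE variant is an argument of strsplit() itself; whether dvsep accepts anything comparable is not something this sketch assumes.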
I think do.call is not necessary, as long as you specify dvsep as a character class or an escaped character:
> readtext::readtext('/tmp/var1_var2.var3.var4.txt', docvarsfrom='filenames', dvsep='\\.')
text docvar1 docvar2 docvar3
var1_var2.var3.var4.txt var1_var2 var3 var4
> readtext::readtext('/tmp/var1_var2.var3.var4.txt', docvarsfrom='filenames', dvsep='[.]')
text docvar1 docvar2 docvar3
var1_var2.var3.var4.txt var1_var2 var3 var4
I am aware that one should not include "." in filenames. However, I downloaded a large number of txt files with names such as xxxx.yyy.zzz.txt, where each part (xxxx, zzz, etc.) contains information that should become a docvar in the corpus. I tried to use the following code to create docvars from the filename, but simply using dvsep = "." does not create the docvars.
text_example <- readtext(file = "xxx.yyy.zzz.txt", docvarsfrom = "filenames", dvsep = ".")
Which regular expression do I need to insert so that the information in the file names is used as docvars? If we have a solution, I can amend the vignette and/or the manual to describe this special case.