Closed MarianoRico closed 3 years ago
Yes, I'm seeing this too, on Ubuntu 16.04 LTS:
> library("quanteda")
Package version: 3.0.0
Unicode version: 7.0
ICU version: 55.1
Parallel computing: 2 of 2 threads used.
See https://quanteda.io for tutorials and examples.
> library("quanteda.textstats")
> textstat_summary(data_corpus_inaugural[10:15])
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.list': error in evaluating the argument 'x' in selecting a method for function 'which': Illegal argument. (U_ILLEGAL_ARGUMENT_ERROR, context=`^\p{emoji_presentation}+$`)
The problem is that the Unicode library is really old on 16.04. On macOS 10.15.7, for instance, it's:
Loading required package: quanteda
Package version: 3.0.0
Unicode version: 10.0
ICU version: 61.1
I'd suggest you update to Ubuntu 20.04. But we can also issue a patch that skips this emoji pattern on older versions of Unicode / ICU, once I find out which version introduced it.
Dear Kenneth,
if the problem is the emoticons, what about an argument to select which stats are returned? I propose an argument skip. By default, skip=NULL returns all stats, but we could specify something like skip=c(emoticons, exclamations) to omit the stats for those types.
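To make the proposal concrete, here is a minimal sketch of what a skip argument could do. The helper drop_stats and the toy column names are hypothetical and only for illustration; nothing here is part of quanteda.textstats.

```r
# Hypothetical helper illustrating the proposed `skip` argument;
# it is NOT part of quanteda.textstats.
drop_stats <- function(stats, skip = NULL) {
  # Keep every column whose name is not listed in `skip`;
  # with skip = NULL all columns are kept.
  stats[, setdiff(names(stats), skip), drop = FALSE]
}

# Toy summary table with made-up values and assumed column names
stats <- data.frame(
  document = c("d1", "d2"),
  tokens   = c(10L, 12L),
  emojis   = c(0L, 1L)
)

names(drop_stats(stats))                   # all columns kept
names(drop_stats(stats, skip = "emojis"))  # emoji column dropped
```

Note that dropping columns changes the shape of the result depending on the arguments passed.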
This should be fixed on master now, and we will update CRAN today too. We did not implement the skipping of fields, since we want the return value to keep the same shape. But on systems with older ICU versions, it now returns NA for the emoji counts.
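For anyone curious how an NA fallback of this kind can work, here is a rough sketch using stringi directly. It is not the actual patch: the helper count_emoji_tokens is a made-up name, and it assumes that an ICU too old to know the emoji property raises an R error when the pattern is compiled, as in the traceback above.

```r
library(stringi)

# Hypothetical helper, NOT the actual quanteda.textstats fix:
# count tokens made up entirely of emoji, or return NA when the
# installed ICU cannot compile the \p{Emoji_Presentation} property.
count_emoji_tokens <- function(tokens) {
  pattern <- "^\\p{Emoji_Presentation}+$"
  tryCatch(
    sum(stri_detect_regex(tokens, pattern)),
    # Older ICU builds raise U_ILLEGAL_ARGUMENT_ERROR here
    error = function(e) NA_integer_
  )
}

# Check which ICU version stringi is linked against
stri_info()$ICU.version

count_emoji_tokens(c("hello", "\U0001F600", "world"))
```

The result is always a single value, either a count or NA, so a caller can keep the same columns on every system.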
just try this:
I get this error:
I have used CRAN version 3.0.0 as well as the latest development version. Same error in both.