cnlp_annotate confusing strings with UTF-8 BOM

I'm seeing a very strange issue with cleanNLP (v2.0.3), using R 3.4.4 in RStudio 1.1.456 on Windows.

I've loaded a custom spaCy NER model trained to detect a new entity category. The model works fine with spaCy in Python, and also works fine in R for the vast majority of the text strings I'm testing it on. However, on strings that contain the word "first", the cnlp_annotate() call in R fails with the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) : UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 18: character maps to <undefined>
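
For reference, the same kind of message can be produced in plain Python without cleanNLP or spaCy. The sketch below assumes the failing step encodes text with a Windows code page such as cp1252 (an assumption, not something the traceback confirms); the helper `try_encode` and the sample string are hypothetical and only illustrate the error, not where it arises in cleanNLP.

```python
# Minimal sketch: trigger a 'charmap' UnicodeEncodeError on U+FEFF.
# Assumption: the failing code path encodes text with cp1252 (a common
# Windows default); the real encoding used by the R/Python bridge may differ.

def try_encode(s, encoding="cp1252"):
    """Return None if s encodes cleanly, otherwise the error message."""
    try:
        s.encode(encoding)
        return None
    except UnicodeEncodeError as exc:
        return str(exc)

# Hypothetical input: an ordinary sentence followed by U+FEFF.
text = "The first.\ufeff"

message = try_encode(text)           # fails: cp1252 has no mapping for U+FEFF
utf8_message = try_encode(text, "utf-8")  # succeeds: UTF-8 can encode it
print(message)
```

If the assumption holds, the problem is less about the input text being malformed and more about some step writing a string that contains U+FEFF through a non-UTF-8 encoding.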

Google told me that '\ufeff' is the Unicode byte order mark (BOM), so I tried iconv() and various conversion tools in stringr and stringi to see if I could detect or filter out the offending characters. While searching for a common feature, I eventually discovered that all of the failing strings contain the word "first". Furthermore, even strings typed manually into RStudio (no read-in or copy-paste) fail to annotate with the same error if and only if they contain the word "first":
Manually typed string, "The first." fails to annotate with the '\ufeff' error.
Manually typed string, "The firs." works just fine.
Manually typed string, "The irst." works just fine.
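
A check of the kind described above can be sketched in Python as follows. The names `find_bom` and `strip_bom` are hypothetical helpers, not part of cleanNLP or spaCy. If hand-typed input comes back clean, the same check could be pointed at the text returned by the model rather than the text passed in.

```python
# Minimal sketch: locate and remove U+FEFF in a string.
BOM = "\ufeff"  # the byte order mark code point

def find_bom(s):
    """Return the indices at which U+FEFF occurs in s."""
    return [i for i, ch in enumerate(s) if ch == BOM]

def strip_bom(s):
    """Return s with every U+FEFF removed."""
    return s.replace(BOM, "")

clean = "The first."
dirty = "The\ufeff first."   # hypothetical string with an embedded BOM

print(find_bom(clean))       # no occurrences expected in hand-typed text
print(find_bom(dirty))
print(strip_bom(dirty) == clean)
```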

Needless to say, I am baffled by this. The issue does not occur with spaCy's own language model (en_core_web_sm-2.1.0), only with my custom model. That would suggest a problem with the model, except that I can't reproduce the error when running the same model with spaCy directly in Python, only with cnlp_annotate() in R.

Any thoughts or advice would be appreciated. I can always work around the issue by working in Python or, I suppose, by temporarily replacing all instances of "first" with "ferst" (which causes no errors), but this was so odd that I had to at least ask about it. Thanks.

P.S. I'm not at liberty to share the specific spaCy model that is failing, but if it helps, I used code pretty much identical to the spaCy code here: https://github.com/explosion/spaCy/blob/master/examples/training/train_new_entity_type.py (run to train a blank NER model, not to update the existing model, and of course with different training data).

statsmaths/cleanNLP, issue #50