I'm using cleanNLP with the spaCy backend to process a set of about 13k documents. Most of the documents are short, but some are quite long. I received this error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: [E088] Text of length 1142787 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the `nlp.max_length` limit. The limit is in number of characters, so you can check whether your inputs are too long by checking `len(text)`.
It looks like I need to raise spaCy's `nlp.max_length` limit to accommodate the very long texts. But I don't see any way to do that — the cleanNLP and reticulate functions I'm using don't seem to take arguments that get passed through to the backend. Any suggestions would be appreciated.
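For reference, here is a sketch of the kind of workaround I've been considering: loading a spaCy pipeline directly through reticulate and raising `max_length` on the Python `nlp` object (this is documented spaCy behavior). What I can't see is how to apply the same setting to the `nlp` object that cleanNLP creates internally — that part is version-dependent and is the piece I'm asking about.

```r
library(reticulate)

# Load a spaCy pipeline directly and raise its character limit.
# 2,000,000 comfortably covers the 1,142,787-character document from
# the error above; scale this to your longest text.
spacy <- import("spacy")
nlp <- spacy$load("en_core_web_sm")
nlp$max_length <- 2000000L
```

An alternative I'm also weighing is splitting the very long documents into chunks before annotation, which would sidestep the limit and keep the parser/NER memory use bounded, at the cost of losing sentence context at the chunk boundaries.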