Trying to vectorize this took 57 s in my test run. I'm not sure whether it would have taken even longer with a longer timeout. Either way, this is far too long for a single string, and it made my Batch request time out.
I think we need to:
1. Limit the maximum length of an expected compound word
2. Limit the number of recursive splits (possibly we already get this by doing 1)
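
To make the two limits concrete, here is a rough Python sketch of what I have in mind. It is not the actual splitter code: `split_compound`, `MAX_COMPOUND_LENGTH`, `MAX_SPLIT_DEPTH`, and the default values are all made up for illustration, and the vocabulary is assumed to be a plain set of known words.

```python
from typing import FrozenSet, List, Optional

# Hypothetical defaults; the right values would need to be measured.
MAX_COMPOUND_LENGTH = 30  # limit 1: longest input we try to split at all
MAX_SPLIT_DEPTH = 4       # limit 2: maximum number of recursive splits


def split_compound(
    word: str,
    vocabulary: FrozenSet[str],
    max_length: int = MAX_COMPOUND_LENGTH,
    max_depth: int = MAX_SPLIT_DEPTH,
) -> Optional[List[str]]:
    """Return one split of `word` into known words, or None if none is
    found within the limits."""
    # Limit 1: refuse overly long inputs before doing any work.
    if len(word) > max_length:
        return None
    return _split(word, vocabulary, max_depth)


def _split(
    word: str, vocabulary: FrozenSet[str], depth_left: int
) -> Optional[List[str]]:
    if word in vocabulary:
        return [word]
    # Limit 2: stop once the budget of recursive splits is used up.
    if depth_left == 0:
        return None
    # Try longer prefixes first so that splits with fewer parts win.
    for cut in range(len(word) - 1, 0, -1):
        prefix = word[:cut]
        if prefix not in vocabulary:
            continue
        rest = _split(word[cut:], vocabulary, depth_left - 1)
        if rest is not None:
            return [prefix] + rest
    return None
```

With both limits in place, the number of prefix checks is bounded by roughly `max_length ** max_depth` instead of growing with the input, so a very long token fails fast rather than stalling the whole batch. As noted in point 2, the length cap alone may already bound the recursion in practice, but an explicit depth limit makes the worst case easy to reason about.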
The following is a post from the 20 Newsgroups dataset: