grantjenks / python-wordsegment

English word segmentation, written in pure-Python, and based on a trillion-word corpus.
http://www.grantjenks.com/docs/wordsegment/

RecursionError on segment call #33

Open irmo322 opened 3 years ago

irmo322 commented 3 years ago

Hi,

I'm having trouble with the following code:

import wordsegment

wordsegment.load()
text = "The article went on to say, “For in the pizza shops rich and poor harmoniously congregate; they are the only places where the members of Neapolitan aristocracy—far haughtier than those of any other part of Italy—may be seen (eating) their favorite delicacy side by side with their own coachmen and valets and barbers.”"
wordsegment.segment(text)

It fails with a RecursionError:

RecursionError: maximum recursion depth exceeded while calling a Python object

I'm using Python 3.8 on Ubuntu 20.04.

grantjenks commented 3 years ago

Can you share more of the traceback?

irmo322 commented 3 years ago

Traceback: error.txt

grantjenks commented 3 years ago

Works for me:

$ ipython
Python 3.9.1 (v3.9.1:1e5d33e9b9, Dec  7 2020, 12:10:52) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.24.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import wordsegment

In [2]: wordsegment.load()

In [3]: text = "The article went on to say, “For in the pizza shops rich and poor harmoniously congregate; they are the only places where the members of Neapolitan aristocracy—far haughtier than those of any other part of Italy—may be seen (eating) their favorite delicacy side by side with their own coachmen and valets and barbers.”"

In [4]: wordsegment.segment(text)
Out[4]: 
['the',
 'article',
 'went',
 'on',
 'to',
 'say',
 'for',
 'in',
 'the',
 'pizza',
 'shops',
 'rich',
 'and',
 'poor',
 'harmoniously',
 'congregate',
 'they',
 'are',
 'the',
 'only',
 'places',
 'where',
 'the',
 'members',
 'of',
 'neapolitan',
 'aristocracy',
 'far',
 'haugh',
 'tier',
 'than',
 'those',
 'of',
 'any',
 'other',
 'part',
 'of',
 'italy',
 'may',
 'be',
 'seen',
 'eating',
 'their',
 'favorite',
 'delicacy',
 'side',
 'by',
 'side',
 'with',
 'their',
 'own',
 'coachmen',
 'and',
 'valets',
 'and',
 'barbers']

grantjenks commented 3 years ago

Weird, it doesn't work for me in Python 3.8.
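
One way to check whether a failure like this is purely a matter of stack depth is to retry the call with a temporarily higher recursion limit. The sketch below is only an illustration: `recursion_limit` and `depth` are hypothetical names, and `depth` is a stand-in for a deeply recursive call so that the snippet runs without wordsegment installed. In practice the `with` block would wrap the `wordsegment.segment(text)` call from the report.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter recursion limit (hypothetical helper)."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        # Always restore the old limit, even if the wrapped call fails.
        sys.setrecursionlimit(previous)


def depth(n):
    """Stand-in for a deeply recursive call: uses one stack frame per step."""
    return 0 if n == 0 else 1 + depth(n - 1)


original_limit = sys.getrecursionlimit()
target_depth = original_limit * 2  # deeper than the current limit allows

# At the current limit the call fails, as in the report.
try:
    depth(target_depth)
except RecursionError:
    failed_at_default = True
else:
    failed_at_default = False

# With a temporarily higher limit the same call completes.
with recursion_limit(target_depth * 2):
    reached = depth(target_depth)
```

Raising the limit is a diagnostic more than a fix: it confirms the cause but leaves the depth of the recursion tied to the length of the input.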

grantjenks commented 3 years ago

See the PR. If I set CHUNK_SIZE to 200, it works for me in Python 3.8.
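
To illustrate why a smaller chunk size helps, here is a toy sketch of chunked segmentation. It is not wordsegment's implementation: `WORDS`, `score`, `search`, `segment_chunked` and `CARRY_WORDS` are all made up for the example, and `CHUNK_SIZE` only borrows its name from the comment above. The assumption is that the recursive search goes one level deeper per character, so capping the length of each chunk caps the recursion depth.

```python
import sys

WORDS = {"side", "by", "with", "their", "own", "coachmen"}  # toy vocabulary
MAX_WORD_LEN = max(len(word) for word in WORDS)
CHUNK_SIZE = 200   # characters searched per call (illustrative value)
CARRY_WORDS = 5    # trailing words re-segmented together with the next chunk


def score(word):
    """Reward known words; penalize unknown ones, with a fixed cost per word."""
    if word in WORDS:
        return len(word) ** 2
    return -(10 * len(word) + 5)


def search(text):
    """Best-scoring split of text. Recursion depth grows with len(text)."""
    memo = {}

    def best_from(start):
        if start == len(text):
            return 0, []
        if start not in memo:
            candidates = []
            for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
                word = text[start:end]
                tail_score, tail_words = best_from(end)
                candidates.append((score(word) + tail_score, [word] + tail_words))
            memo[start] = max(candidates, key=lambda candidate: candidate[0])
        return memo[start]

    return best_from(0)[1]


def segment_chunked(text, chunk_size=CHUNK_SIZE):
    """Segment text chunk by chunk so each search stays shallow."""
    words, prefix = [], ""
    for offset in range(0, len(text), chunk_size):
        chunk_words = search(prefix + text[offset:offset + chunk_size])
        # A chunk boundary may cut a word in half, so hold back the last few
        # words and search them again together with the next chunk.
        prefix = "".join(chunk_words[-CARRY_WORDS:])
        words.extend(chunk_words[:-CARRY_WORDS])
    words.extend(search(prefix))
    return words


# Build an input roughly three times longer than the recursion limit.
REPEATS = sys.getrecursionlimit() // 10
expected = ["side", "by", "side", "with", "their", "own", "coachmen"] * REPEATS
text = "".join(expected)

try:
    search(text)  # one call over the whole text recurses too deeply
    unchunked_failed = False
except RecursionError:
    unchunked_failed = True

chunked = segment_chunked(text)  # same search, bounded depth per chunk
```

The trade-off in this sketch is that each chunk is searched without seeing the text that follows it, which is why the last few words of every chunk are carried over instead of being emitted right away.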

irmo322 commented 3 years ago

Thank you for the support :)