This looks good! Thanks for contributing.
One last thing: the Travis build is failing because not all the corpora are being downloaded. You can add the following lines to .travis.yml to get all the necessary corpora:
before_install:
  - "wget https://s3.amazonaws.com/textblob/nltk_data.tar.gz"
  - "tar -xzvf nltk_data.tar.gz -C ~"
Ok, then. I'll put this in .travis.yml and try again.
Yes, sure. But would you mind enlightening me on the reasons? Just for learning purposes. :)
It's not working. Two tests are failing when I use what you suggested. See:
======================================================================
FAIL: test_tag (tests.test_taggers.TestPerceptronTagger)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/matheuscas/Development/python/textblob-aptagger/tests/test_taggers.py", line 44, in test_tag
    'better', 'than', 'complicated', '.'])
AssertionError: Lists differ: [] != [u'Simple', u'is', u'better', ...
Second list contains 12 additional elements.
First extra element 0:
Simple
- []
+ [u'Simple',
+ u'is',
+ u'better',
+ u'than',
+ u'complex',
+ u'.',
+ u'Complex',
+ u'is',
+ u'better',
+ u'than',
+ u'complicated',
+ u'.']
======================================================================
FAIL: test_tag_textblob (tests.test_taggers.TestPerceptronTagger)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/matheuscas/Development/python/textblob-aptagger/tests/test_taggers.py", line 53, in test_tag_textblob
    'better', 'than', 'complicated'])
AssertionError: Lists differ: [] != [u'Simple', u'is', u'better', ...
Second list contains 10 additional elements.
First extra element 0:
Simple
- []
+ [u'Simple',
+ u'is',
+ u'better',
+ u'than',
+ u'complex',
+ u'Complex',
+ u'is',
+ u'better',
+ u'than',
+ u'complicated']
----------------------------------------------------------------------
Ran 5 tests in 5.480s
FAILED (failures=2)
Oh I see. word_tokenize and sent_tokenize both return a generator rather than a list, which is more memory-efficient than keeping all tokens in memory. However, the list comprehension on line 50 exhausts the generator of words, which is why the for loop on line 51 never iterates.
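To make the failure mode concrete, here is a minimal sketch of the generator-exhaustion pitfall. The word_tokenize below is a hypothetical stand-in for a generator-based tokenizer, not TextBlob's actual implementation:

def word_tokenize(text):
    # Hypothetical stand-in: yields tokens lazily instead of
    # returning a list, like a generator-based tokenizer would.
    for token in text.split():
        yield token

words = word_tokenize("Simple is better than complex .")

# The list comprehension consumes every token the generator yields...
lowered = [w.lower() for w in words]

# ...so a second pass over the same generator gets nothing back.
tags = [(w, "NN") for w in words]

print(lowered)  # ['simple', 'is', 'better', 'than', 'complex', '.']
print(tags)     # [] -- the same empty result the failing tests report

# The fix: materialize the generator once, then reuse the list freely.
words = list(word_tokenize("Simple is better than complex ."))
tags = [(w, "NN") for w in words]
print(tags)     # [('Simple', 'NN'), ('is', 'NN'), ...]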
I think what you have is fine. Thanks for the contribution.
You're welcome. Keep up the good work.
This pull request is supposed to close issue 3, which I opened. I've updated the imports, code, tests, changelog, and minimum requirements (TextBlob 0.9).