japerk / nltk3-cookbook

Code for NLTK3 Cookbook

Error encountered with Python 3.4 and Training a Brill Tagger #1

Closed. nthrodeo closed this issue 10 years ago.

nthrodeo commented 10 years ago

Hi. I am getting an error with Python 3.4.1 when trying to complete the Chapter 4 example called Training a Brill Tagger. Could the error below be related to the book's use of version 3.3.5? Thanks in advance for any assistance.

I did not submit this via Packt's Errata form because it may just be a Python version issue, and on the errata form at https://www.packtpub.com/books/content/errata only the NLTK 2.0 book seems to be available in the dropdown. My environment and session follow:

$ uname -a
Darwin MacBook-Pro.local 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64
$ git pull origin master
From https://github.com/japerk/nltk3-cookbook
 * branch            master     -> FETCH_HEAD
Already up-to-date.
$ python3
Python 3.4.1 (default, Aug 24 2014, 21:32:40)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.__version__
'3.0.0'
>>> import enchant
>>> enchant.__version__
'1.6.6'
>>> import numpy
>>> numpy.__version__
'1.9.0'
>>> import scipy
>>> scipy.__version__
'0.14.0'
>>> import sklearn
>>> sklearn.__version__
'0.15.2'
>>> import execnet
>>> execnet.__version__
'1.2.0'
>>> import pymongo
>>> pymongo.version
'2.7.2'
>>> import redis
>>> redis.__version__
'2.10.3'
>>> from lxml import etree
>>> etree.LXML_VERSION
(3, 4, 0, 0)
>>> import bs4
>>> bs4.__version__
'4.3.2'
>>> import dateutil
>>> dateutil.__version__
'2.2'
>>> import charade
>>> charade.__version__
'1.0.3'
>>> from nltk.corpus import treebank
>>> from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
>>> from tag_util import backoff_tagger, train_brill_tagger
>>> test_sents  = treebank.tagged_sents()[3000:]
>>> train_sents = treebank.tagged_sents()[:3000]
>>> default_tagger = DefaultTagger('NN')
>>> initial_tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff=default_tagger)
>>> initial_tagger.evaluate(test_sents)
0.8808115691776387
>>> brill_tagger = train_brill_tagger(initial_tagger, train_sents)
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/nltk/tbl/rule.py", line 191, in __hash__
    return self.__hash
AttributeError: 'Rule' object has no attribute '_Rule__hash'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/nltk/tbl/rule.py", line 200, in __repr__
    return self.__repr
AttributeError: 'Rule' object has no attribute '_Rule__repr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/petethomas/git-projects/nltk3-cookbook/tag_util.py", line 48, in train_brill_tagger
    return trainer.train(train_sents, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/nltk/tag/brill_trainer.py", line 288, in train
    self._init_mappings(test_sents, train_sents)
  File "/usr/local/lib/python3.4/site-packages/nltk/tag/brill_trainer.py", line 359, in _init_mappings
    train_sents)
  File "/usr/local/lib/python3.4/site-packages/nltk/tag/brill_trainer.py", line 387, in _update_rule_applies
    if pos in self._positions_by_rule[rule]:
  File "/usr/local/lib/python3.4/site-packages/nltk/tbl/rule.py", line 193, in __hash__
    self.__hash = hash(repr(self))
  File "/usr/local/lib/python3.4/site-packages/nltk/tbl/rule.py", line 210, in __repr__
    ", ".join("({0:s},{1:s})".format(f,unicode_repr(v)) for (f,v) in self._conditions)))
  File "/usr/local/lib/python3.4/site-packages/nltk/tbl/rule.py", line 210, in <genexpr>
    ", ".join("({0:s},{1:s})".format(f,unicode_repr(v)) for (f,v) in self._conditions)))
TypeError: non-empty format string passed to object.__format__
>>>

japerk commented 10 years ago

Thanks for the very detailed error report. It looks like it may be an issue with NLTK 3, not Python, but I will investigate more & let you know.

japerk commented 10 years ago

I've verified that the same code works in Python 3.3.5 but not in Python 3.4.2, at least on my Mac. It appears that Python 3.4 is not handling exceptions as expected, or handles __attrs differently. I'll report this to NLTK and see if there's a way to handle this. In the meantime, it looks like you'll need to use Python 3.3 instead of 3.4.
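
One possible reading of the traceback, offered as a sketch and not a confirmed diagnosis: the two AttributeErrors look like the ordinary first-call path of a lazy cache. Inside the class, self.__hash and self.__repr are name-mangled to _Rule__hash and _Rule__repr, which do not exist until they are first computed. The final TypeError matches a documented Python 3.4 change: object.__format__ raises TypeError when it is given a non-empty format spec such as "s". The Feature and Rule classes below are hypothetical stand-ins written for illustration, not NLTK's actual classes, and the "{0!s}" conversion is only one assumed way to sidestep the error.

```python
class Feature:
    """Hypothetical stand-in: defines __str__ but inherits object.__format__."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class Rule:
    """Hypothetical stand-in that caches its repr in a name-mangled attribute."""

    def __init__(self, feature, value):
        self._conditions = [(feature, value)]

    def __repr__(self):
        try:
            # Inside the class body, self.__repr is stored as self._Rule__repr.
            return self.__repr
        except AttributeError:
            # First call: the cache is empty, so build the string now.
            # "{0!s}" calls str() on the argument before formatting, so
            # object.__format__ never receives a non-empty format spec.
            self.__repr = ", ".join(
                "({0!s},{1!r})".format(f, v) for (f, v) in self._conditions
            )
            return self.__repr


feature = Feature("Pos([-1])")

# On Python 3.4 and later, the "s" spec reaches object.__format__ and fails.
try:
    "{0:s}".format(feature)
except TypeError as exc:
    print("format spec 's' failed:", exc)

# The explicit !s conversion works because str() runs first.
print(repr(Rule(feature, "NN")))
```

If this reading is right, the AttributeErrors in the report are only the chained context of the cache miss, and the line that actually fails is the "{0:s}" format call in rule.py.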