nltk / nltk_book

NLTK Book
http://www.nltk.org/book
Chapter 6 example 2.3 #189

Open emesterhazy opened 7 years ago

emesterhazy commented 7 years ago

Chapter 6, Example 2.3 of the book lists the following:

Example 2.3: Recognizing Textual Entailment

>>> rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
>>> extractor = nltk.RTEFeatureExtractor(rtepair)
>>> print(extractor.text_words)
{'Russia', 'Organisation', 'Shanghai', 'Asia', 'four', 'at',
'operation', 'SCO', ...}
>>> print(extractor.hyp_words)
{'member', 'SCO', 'China'}

However, when I run these lines in Python, I get the following output instead:

Python 3.6

>>> import nltk
>>> rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
>>> extractor = nltk.RTEFeatureExtractor(rtepair)
>>> print(extractor.text_words)
{'n', 'z', 'o', 'f', 't', 'g', 's', 'r', 'l', 'e', 'O', 'd', 'i'}
>>> print(extractor.hyp_words)
{'f', 's', 'r'}

Python 2.7

>>> import nltk
>>> rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
>>> extractor = nltk.RTEFeatureExtractor(rtepair)
>>> print(extractor.text_words)
set([''])
>>> print(extractor.hyp_words)
set([''])

I am unsure whether this is simple user error on my end or whether something has changed in NLTK that breaks this example. Any thoughts are appreciated.
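
One possible lead, offered as an assumption rather than a confirmed diagnosis: a set containing only the empty string, like the Python 2.7 output above, is what the standard library's `re.findall` returns when a pattern contains a capturing group that does not take part in the match. The sketch below uses only `re`. The two patterns and the `tokenize` helper are hypothetical and exist only for illustration; I have not verified that `RTEFeatureExtractor` tokenizes this way.

```python
import re

# Hypothetical patterns for illustration only; not confirmed to be
# what NLTK's RTEFeatureExtractor uses internally.
CAPTURING = r"([A-Z]\.)+|\w+"        # contains one capturing group
NON_CAPTURING = r"(?:[A-Z]\.)+|\w+"  # same pattern, group made non-capturing


def tokenize(pattern, text):
    """Return the set of strings that re.findall yields for the pattern."""
    # With exactly one capturing group, findall returns that group's text
    # for every match, and '' when the group did not participate.
    return set(re.findall(pattern, text))


sample = "SCO member China"
broken = tokenize(CAPTURING, sample)
fixed = tokenize(NON_CAPTURING, sample)

print(broken)
print(fixed)
```

If the extractor's tokenizer pattern does contain a capturing group, switching it to a non-capturing group `(?:...)` would be one thing to try. Running `print(nltk.__version__)` alongside the example would also help narrow down which release changed the behavior.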