Chapter 6 example 2.3

In Chapter 6, Example 2.3 lists the following:

Example 2.3: Recognizing Textual Entailment

>>> rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
>>> extractor = nltk.RTEFeatureExtractor(rtepair)
>>> print(extractor.text_words)
{'Russia', 'Organisation', 'Shanghai', 'Asia', 'four', 'at',
'operation', 'SCO', ...}
>>> print(extractor.hyp_words)
{'member', 'SCO', 'China'}

However, when I run these lines in Python, I get the following output instead:

Python 3.6

>>> import nltk
>>> rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
>>> extractor = nltk.RTEFeatureExtractor(rtepair)
>>> print(extractor.text_words)
{'n', 'z', 'o', 'f', 't', 'g', 's', 'r', 'l', 'e', 'O', 'd', 'i'}
>>> print(extractor.hyp_words)
{'f', 's', 'r'}

Python 2.7

>>> import nltk
>>> rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
>>> extractor = nltk.RTEFeatureExtractor(rtepair)
>>> print(extractor.text_words)
set([''])
>>> print(extractor.hyp_words)
set([''])

I am unsure whether this is simple user error on my end or whether something has changed in NLTK that breaks this example. Any thoughts are appreciated.
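One possible line of investigation, offered as a guess rather than a confirmed diagnosis: both outputs look like what a tokenizing step produces when it goes wrong before the words are collected into a set. A set containing only the empty string is what `re.findall` returns when the pattern contains a capturing group that did not take part in the match, and a set of single characters is what you get from calling `set()` on a string instead of a list of tokens. The sketch below reproduces both symptoms using only the standard `re` module. The sample text and the patterns are made up for illustration and are not taken from NLTK's source.

```python
import re

# Hypothetical sample text, loosely based on the hypothesis words above.
text = "SCO member China"

# With a capturing group, findall returns the group's contents rather than
# the whole match. The group is empty whenever the \w+ alternative matched.
capturing = re.findall(r"([A-Z]\.)+|\w+", text)

# The same pattern with a non-capturing group returns the whole matches.
non_capturing = re.findall(r"(?:[A-Z]\.)+|\w+", text)

# Building a set from a string iterates over its characters, not its words.
chars = set("SCO")

print(set(capturing))      # only the empty string
print(set(non_capturing))  # the three words
print(chars)               # the three letters
```

If the extractor's tokenizer behaves like the first pattern in the installed NLTK version, that might explain the Python 2.7 output. Checking `nltk.__version__` in both environments would show whether the two interpreters are running different releases.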

nltk / nltk_book

Chapter 6 example 2.3 #189