nltk / nltk_data

NLTK Data

Verbnet identifier in index.xml mismatch #124

Open alvations opened 5 years ago

alvations commented 5 years ago

When recompiling nltk_data, make fails with this error:

nltk_data$ make 
python tools/build_pkg_index.py . https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages index.xml
Traceback (most recent call last):
  File "tools/build_pkg_index.py", line 24, in <module>
    index = build_index(ROOT, BASE_URL)
  File "/Users/liling.tan/Library/Python/2.7/lib/python/site-packages/nltk/downloader.py", line 2088, in build_index
    for pkg_xml, zf, subdir in _find_packages(os.path.join(root, 'packages')):
  File "/Users/liling.tan/Library/Python/2.7/lib/python/site-packages/nltk/downloader.py", line 2216, in _find_packages
    'vs %s)' % (pkg_xml.get('id'), uid))
ValueError: package identifier mismatch (verbnet vs verbnet3)
make: *** [pkg_index] Error 1

alvations commented 5 years ago

This is because verbnet.xml and verbnet3.xml both declare the same id, verbnet:

nltk_data/packages/corpora$ cat verbnet.xml 
<package id="verbnet"
         name="VerbNet Lexicon, Version 2.1"
         version="2.1"
         author="Karin Kipper-Schuler"
         webpage="https://verbs.colorado.edu/verbnet/"
         license="Distributed with permission of the author."
         unzip="1"
         />

nltk_data/packages/corpora$ cat verbnet3.xml 
<package id="verbnet"
         name="VerbNet Lexicon, Version 3.3"
         version="3.3"
         author="Karin Kipper-Schuler"
         webpage="https://verbs.colorado.edu/verbnet/"
         license="Distributed with permission of the author."
         unzip="1"
         />

The same identifier is also causing a mismatch in the nltk code, cf. https://github.com/nltk/nltk/issues/2015
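
A quick way to spot this kind of problem before running make is to compare each package XML's id attribute with its filename. The sketch below is not part of nltk or of tools/build_pkg_index.py. It assumes, based on the error message above (verbnet vs verbnet3), that the index build expects the id to equal the filename without the .xml extension. The function name find_id_mismatches is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_id_mismatches(packages_dir):
    """Return (path, declared_id, expected_id) for every package XML file
    whose id attribute differs from its filename stem.

    Assumption: the expected id is the filename without ".xml", which is
    what the "package identifier mismatch" error above suggests.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(packages_dir):
        for filename in sorted(filenames):
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(dirpath, filename)
            expected_id = os.path.splitext(filename)[0]
            # The <package> element is the root of each descriptor file.
            declared_id = ET.parse(path).getroot().get("id")
            if declared_id != expected_id:
                mismatches.append((path, declared_id, expected_id))
    return mismatches


if __name__ == "__main__":
    # Run from the root of an nltk_data checkout.
    for path, declared, expected in find_id_mismatches("packages"):
        print("%s: id=%r, expected %r" % (path, declared, expected))
```

On the two files shown above, this should flag only verbnet3.xml, since verbnet.xml's id already matches its filename.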