petermr / amilib

Python library of `ami` software especially NLP, HTML, downloading and related convenience utilities
Apache License 2.0

[LookupError] Branch `pmr_dict` fail #21

Open nitikabaghel opened 1 month ago

nitikabaghel commented 1 month ago

System: Windows 11, Python 3.12.3

C:\Users\User\Desktop\Semantics\amilib>pytest

================================================= test session starts =================================================
platform win32 -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0
rootdir: C:\Users\User\Desktop\Semantics\amilib
collected 307 items

test\test_dict.py ...........s...........ss.............s...........................s.ssss.......ss.....                                             [ 28%]
test\test_file.py ss                                                                                                                                 [ 28%]
test\test_headless.py s..sssssss.....s                                                                                                               [ 33%]
test\test_html.py ...s.s......s..s.....ssss...s.........s..ssssss..ss.s...ss................................s... [ 64%]
.s                                                                                                                                                   [ 65%]
test\test_misc.py s.                                                                                                                                 [ 65%]
test\test_nlp.py F                                                                                                                                   [ 66%]
test\test_pdf.py Fss........sFs.s.sssssss.s....ss.ss..s....ssssssssss..s....ss                                                                       [ 85%]
test\test_pytest.py .                                                                                                                                [ 86%]
test\test_stat.py .                                                                                                                                  [ 86%]
test\test_svg.py ...                                                                                                                                 [ 87%]
test\test_util.py ss.....s...s...                                                                                                                    [ 92%]
test\test_wikidata.py .s...........s.......                                                                                                          [ 99%]
test\test_xml.py ..                                                                                                                                  [100%]

======================================================================== FAILURES =========================================================================
________________________________________________________ NLPTest.test_compute_text_similarity_STAT ________________________________________________________

self = <WordListCorpusReader in '.../corpora/stopwords' (not loaded yet)>

    def __load(self):
        # Find the corpus root directory.
        zip_name = re.sub(r"(([^/]+)(/.*)?)", r"\2.zip/\1/", self.__name)
        if TRY_ZIPFILE_FIRST:
            try:
                root = nltk.data.find(f"{self.subdir}/{zip_name}")
            except LookupError as e:
                try:
                    root = nltk.data.find(f"{self.subdir}/{self.__name}")
                except LookupError:
                    raise e
        else:
            try:
                root = nltk.data.find(f"{self.subdir}/{self.__name}")
            except LookupError as e:
                try:
>                   root = nltk.data.find(f"{self.subdir}/{zip_name}")

..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\util.py:84:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

resource_name = 'corpora/stopwords.zip/stopwords/'
paths = ['C:\\Users\\User/nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__q..._3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data', 'C:\\Users\\User\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        """
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ``nltk.data.path`` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with *p.zip/p*.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/cities.pl`` to a zip file path pointer to
            ``corpora/chat80.zip/chat80/cities.pl``.

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the
            directory.

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        """
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        # nltk.data.path
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                try:
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile
                    continue

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                        else:
                            return FileSystemPathPointer(p)
                else:
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                        try:
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile
                            continue

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                try:
                    return find(modified_name, paths)
                except LookupError:
                    pass

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
            ">>> nltk.download('{resource}')\n"
            "\033[0m"
        ).format(resource=resource_zipname)
        msg = textwrap_indent(msg)

        msg += "\n  For more information see: https://www.nltk.org/data.html\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(
            resource_name=resource_name
        )

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError:
E       **********************************************************************
E         Resource stopwords not found.
E         Please use the NLTK Downloader to obtain the resource:
E
E         >>> import nltk
E         >>> nltk.download('stopwords')
E
E         For more information see: https://www.nltk.org/data.html
E
E         Attempted to load corpora/stopwords.zip/stopwords/
E
E         Searched in:
E           - 'C:\\Users\\User/nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\share\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data'
E           - 'C:\\Users\\User\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E       **********************************************************************

..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\data.py:583: LookupError

During handling of the above exception, another exception occurred:

self = <test.test_nlp.NLPTest testMethod=test_compute_text_similarity_STAT>

>   ???

C:\Users\User\Desktop\sciCli\amilib\test\test_nlp.py:27:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
amilib\ami_nlp.py:44: in __init__
    stop_words = stopwords.words('english')
..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\util.py:121: in __getattr__
    self.__load()
..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\util.py:86: in __load
    raise e
..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\util.py:81: in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

resource_name = 'corpora/stopwords'
paths = ['C:\\Users\\User/nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__q..._3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data', 'C:\\Users\\User\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        """
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ``nltk.data.path`` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with *p.zip/p*.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/cities.pl`` to a zip file path pointer to
            ``corpora/chat80.zip/chat80/cities.pl``.

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the
            directory.

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        """
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        # nltk.data.path
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                try:
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile
                    continue

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                        else:
                            return FileSystemPathPointer(p)
                else:
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                        try:
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile
                            continue

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                try:
                    return find(modified_name, paths)
                except LookupError:
                    pass

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
            ">>> nltk.download('{resource}')\n"
            "\033[0m"
        ).format(resource=resource_zipname)
        msg = textwrap_indent(msg)

        msg += "\n  For more information see: https://www.nltk.org/data.html\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(
            resource_name=resource_name
        )

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError:
E       **********************************************************************
E         Resource stopwords not found.
E         Please use the NLTK Downloader to obtain the resource:
E
E         >>> import nltk
E         >>> nltk.download('stopwords')
E
E         For more information see: https://www.nltk.org/data.html
E
E         Attempted to load corpora/stopwords
E
E         Searched in:
E           - 'C:\\Users\\User/nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\share\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data'
E           - 'C:\\Users\\User\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E       **********************************************************************

..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\data.py:583: LookupError

================================================================= short test summary info =================================================================

FAILED test/test_nlp.py::NLPTest::test_compute_text_similarity_STAT - LookupError:

============================================ 3 failed, 221 passed, 83 skipped, 4 warnings in 182.20s (0:03:02) ============================================

petermr commented 1 month ago

Good report @nitika

It's missing the stopwords corpus from NLTK. I think you have to install it.

We probably have to add

import nltk
nltk.download('stopwords')

Let me check

nitikabaghel commented 1 month ago

Thank you PMR, and noted!

petermr commented 1 month ago

I have added this to the code and committed it, and (I think) it now works.
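
For anyone hitting the same `LookupError`, here is a minimal sketch of how the download could be guarded so that it only runs when the corpus is actually missing. This is not the committed change. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical and exist only so the logic can be tried without NLTK or network access. By default it falls back to `nltk.data.find` and `nltk.download`, the two calls that appear in the traceback and in the suggested fix.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` cannot be found.

    Hypothetical helper: `find` and `download` default to
    nltk.data.find and nltk.download. They are parameters only so the
    logic can be exercised with stubs.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the real functions are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)   # raises LookupError when the resource is absent
        return False          # already installed, nothing to do
    except LookupError:
        download(package)     # e.g. nltk.download('stopwords')
        return True


# Intended call, placed before stopwords.words('english') is first used:
# ensure_nltk_resource("corpora/stopwords", "stopwords")
```

Checking first avoids a download attempt on every import or test run. An unconditional `nltk.download('stopwords')` should also work, at the cost of a network check each time.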