petermr / amilib

Python library of `ami` software especially NLP, HTML, downloading and related convenience utilities
Apache License 2.0

[LookupError] Branch `pmr_dict` fail #21

Open nitikabaghel opened 1 month ago

nitikabaghel commented 1 month ago

System: Windows 11, Python 3.12.3


================================================= test session starts =================================================
platform win32 -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0
rootdir: C:\Users\User\Desktop\Semantics\amilib
collected 307 items

test\                                             [ 28%]
test\ ss                                                                                                                                 [ 28%]
test\ s..sssssss.....s                                                                                                               [ 33%]
test\ [ 64%]
.s                                                                                                                                                   [ 65%]
test\ s.                                                                                                                                 [ 65%]
test\ F                                                                                                                                   [ 66%]
test\                                                                       [ 85%]
test\ .                                                                                                                                [ 86%]
test\ .                                                                                                                                  [ 86%]
test\ ...                                                                                                                                 [ 87%]
test\ ss.....s...s...                                                                                                                    [ 92%]
test\ .s...........s.......                                                                                                          [ 99%]
test\ ..                                                                                                                                  [100%]

======================================================================== FAILURES =========================================================================
________________________________________________________ NLPTest.test_compute_text_similarity_STAT ________________________________________________________

self = <WordListCorpusReader in '.../corpora/stopwords' (not loaded yet)>

    def __load(self):
        # Find the corpus root directory.
        zip_name = re.sub(r"(([^/]+)(/.*)?)", r"\2.zip/\1/", self.__name)
        if TRY_ZIPFILE_FIRST:
            try:
                root = nltk.data.find(f"{self.subdir}/{zip_name}")
            except LookupError as e:
                try:
                    root = nltk.data.find(f"{self.subdir}/{self.__name}")
                except LookupError:
                    raise e
        else:
            try:
                root = nltk.data.find(f"{self.subdir}/{self.__name}")
            except LookupError as e:
                try:
>                   root = nltk.data.find(f"{self.subdir}/{zip_name}")


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

resource_name = 'corpora/'
paths = ['C:\\Users\\User/nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__q..._3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data', 'C:\\Users\\User\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ```` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with **.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/`` to a zip file path pointer to

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                            return FileSystemPathPointer(p)
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                    return find(modified_name, paths)
                except LookupError:

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
        msg = textwrap_indent(msg)

        msg += "\n  For more information see:\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError:
E       **********************************************************************
E         Resource stopwords not found.
E         Please use the NLTK Downloader to obtain the resource:
E         >>> import nltk
E         >>> nltk.download('stopwords')
E         For more information see:
E         Attempted to load corpora/
E         Searched in:
E           - 'C:\\Users\\User/nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\share\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data'
E           - 'C:\\Users\\User\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E       **********************************************************************

..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\ LookupError

During handling of the above exception, another exception occurred:

self = <test.test_nlp.NLPTest testMethod=test_compute_text_similarity_STAT>

>   ???

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
amilib\ in __init__
    stop_words = stopwords.words('english')
..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\ in __getattr__
..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\ in __load
    raise e
..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\corpus\ in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

resource_name = 'corpora/stopwords'
paths = ['C:\\Users\\User/nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__q..._3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data', 'C:\\Users\\User\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', ...]

    def find(resource_name, paths=None):
        Find the given resource by searching through the directories and
        zip files in paths, where a None or empty string specifies an absolute path.
        Returns a corresponding path name.  If the given resource is not
        found, raise a ``LookupError``, whose message gives a pointer to
        the installation instructions for the NLTK downloader.

        Zip File Handling:

          - If ``resource_name`` contains a component with a ``.zip``
            extension, then it is assumed to be a zipfile; and the
            remaining path components are used to look inside the zipfile.

          - If any element of ```` has a ``.zip`` extension,
            then it is assumed to be a zipfile.

          - If a given resource name that does not contain any zipfile
            component is not found initially, then ``find()`` will make a
            second attempt to find that resource, by replacing each
            component *p* in the path with **.  For example, this
            allows ``find()`` to map the resource name
            ``corpora/chat80/`` to a zip file path pointer to

          - When using ``find()`` to locate a directory contained in a
            zipfile, the resource name must end with the forward slash
            character.  Otherwise, ``find()`` will not locate the

        :type resource_name: str or unicode
        :param resource_name: The name of the resource to search for.
            Resource names are posix-style relative path names, such as
            ``corpora/brown``.  Directory names will be
            automatically converted to a platform-appropriate path separator.
        :rtype: str
        resource_name = normalize_resource_name(resource_name, True)

        # Resolve default paths at runtime in-case the user overrides
        if paths is None:
            paths = path

        # Check if the resource name includes a zipfile name
        m = re.match(r"(.*\.zip)/?(.*)$|", resource_name)
        zipfile, zipentry = m.groups()

        # Check each item in our path
        for path_ in paths:
            # Is the path item a zipfile?
            if path_ and (os.path.isfile(path_) and path_.endswith(".zip")):
                    return ZipFilePathPointer(path_, resource_name)
                except OSError:
                    # resource not in zipfile

            # Is the path item a directory or is resource_name an absolute path?
            elif not path_ or os.path.isdir(path_):
                if zipfile is None:
                    p = os.path.join(path_, url2pathname(resource_name))
                    if os.path.exists(p):
                        if p.endswith(".gz"):
                            return GzipFileSystemPathPointer(p)
                            return FileSystemPathPointer(p)
                    p = os.path.join(path_, url2pathname(zipfile))
                    if os.path.exists(p):
                            return ZipFilePathPointer(p, zipentry)
                        except OSError:
                            # resource not in zipfile

        # Fallback: if the path doesn't include a zip file, then try
        # again, assuming that one of the path components is inside a
        # zipfile of the same name.
        if zipfile is None:
            pieces = resource_name.split("/")
            for i in range(len(pieces)):
                modified_name = "/".join(pieces[:i] + [pieces[i] + ".zip"] + pieces[i:])
                    return find(modified_name, paths)
                except LookupError:

        # Identify the package (i.e. the .zip file) to download.
        resource_zipname = resource_name.split("/")[1]
        if resource_zipname.endswith(".zip"):
            resource_zipname = resource_zipname.rpartition(".")[0]
        # Display a friendly error message if the resource wasn't found:
        msg = str(
            "Resource \33[93m{resource}\033[0m not found.\n"
            "Please use the NLTK Downloader to obtain the resource:\n\n"
            "\33[31m"  # To display red text in terminal.
            ">>> import nltk\n"
        msg = textwrap_indent(msg)

        msg += "\n  For more information see:\n"

        msg += "\n  Attempted to load \33[93m{resource_name}\033[0m\n".format(

        msg += "\n  Searched in:" + "".join("\n    - %r" % d for d in paths)
        sep = "*" * 70
        resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
>       raise LookupError(resource_not_found)
E       LookupError:
E       **********************************************************************
E         Resource stopwords not found.
E         Please use the NLTK Downloader to obtain the resource:
E         >>> import nltk
E         >>> nltk.download('stopwords')
E         For more information see:
E         Attempted to load corpora/stopwords
E         Searched in:
E           - 'C:\\Users\\User/nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\share\\nltk_data'
E           - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\\lib\\nltk_data'
E           - 'C:\\Users\\User\\AppData\\Roaming\\nltk_data'
E           - 'C:\\nltk_data'
E           - 'D:\\nltk_data'
E           - 'E:\\nltk_data'
E       **********************************************************************

..\..\..\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\nltk\ LookupError

================================================================= short test summary info =================================================================

FAILED test/ - LookupError:

============================================ 3 failed, 221 passed, 83 skipped, 4 warnings in 182.20s (0:03:02) ============================================

petermr commented 1 month ago

Good report @nitika

It's missing the stopwords from NLTK. I think you have to install them.

We probably have to add

import nltk
nltk.download('stopwords')
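
One possible way to guard that call, so the corpus is only fetched when it is actually missing, is sketched below. This is an illustration, not the code that was committed to amilib: `ensure_nltk_resource` is a hypothetical helper name, and the `find` and `download` parameters are only there so the logic can be exercised with stand-ins instead of the real `nltk.data.find` and `nltk.download`.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download an NLTK package only if its resource cannot be found.

    Hypothetical helper (not part of amilib or NLTK).
    Returns True if a download was triggered, False if the resource
    was already available.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be tested without NLTK installed.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # nltk.data.find raises LookupError when the resource is absent
        # from every directory on the NLTK search path.
        find(resource_path)
        return False
    except LookupError:
        # Fetch the missing package; quiet=True suppresses progress output.
        download(package, quiet=True)
        return True
```

Calling `ensure_nltk_resource("corpora/stopwords", "stopwords")` before `stopwords.words('english')` would then avoid the `LookupError` on a fresh machine, at the cost of a network fetch the first time.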

Let me check

nitikabaghel commented 1 month ago

Thank you PMR, and noted!

petermr commented 1 month ago

I have added this to the code and committed it, and (I think) it now works.