miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

Lowercase of all languages needed in utils.py #206

Closed: Manamama closed this issue 5 months ago

Manamama commented 5 months ago

In utils.py, I had to change normalize_language to return language.lower():

def normalize_language(language):
    for lookup_key in ("alpha_2", "alpha_3"):
        try:
            lang = languages.get(**{lookup_key: language})

            if lang:
                language = lang.name.lower()
        except KeyError:
            pass

    # Lowercase the result so that full names such as "Polish" are normalized too.
    return language.lower()

so as to avoid cryptic errors when the language name was capitalized:

sumy text-rank --format=html --language=Polish
sumy text-rank --format=html --language=French

etc., which results in:

>  LookupError: NLTK tokenizers are missing or the language is not supported.
> Download them by following command: python -c "import nltk; nltk.download('punkt')"
> Original error was:
> 
> **********************************************************************
>   Resource punkt not found.
>   Please use the NLTK Downloader to obtain the resource:
> 
>   >>> import nltk
>   >>> nltk.download('punkt')
>   
>   For more information see: https://www.nltk.org/data.html
> 
>   Attempted to load tokenizers/punkt/PY3/Polish.pickle
> 
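
The traceback shows the capitalized name being used in the resource path (Polish.pickle), which suggests the lookup is case-sensitive. Below is a minimal, self-contained sketch of the lowercasing idea. It is not the actual sumy code: LANGUAGE_NAMES is a hypothetical stand-in for the pycountry lookup, and punkt_resource_path is a hypothetical helper that only imitates the path shape seen in the error message.

```python
# Hypothetical stand-in for the pycountry language database, so that the
# sketch runs without third-party packages. Only a few codes are listed.
LANGUAGE_NAMES = {
    "pl": "Polish",
    "pol": "Polish",
    "fr": "French",
    "fra": "French",
}


def normalize_language(language):
    """Map an ISO code or a language name, in any case, to a lowercase name."""
    # Look the code up case-insensitively; fall back to the input itself
    # when it is not a known code (for example, when it is already a name).
    name = LANGUAGE_NAMES.get(language.lower(), language)
    # Lowercasing the final result is the fix proposed in this issue.
    return name.lower()


def punkt_resource_path(language):
    """Build a path shaped like the one in the traceback (assumed layout)."""
    return "tokenizers/punkt/PY3/{}.pickle".format(normalize_language(language))


if __name__ == "__main__":
    for raw in ("Polish", "polish", "pl", "FRA"):
        print(raw, "->", punkt_resource_path(raw))
```

With this normalization, --language=Polish and --language=polish would resolve to the same lowercase resource name.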

Otherwise the package is great.

miso-belica commented 5 months ago

Thank you, should be fixed in main now.