oobabooga / text-generation-webui


Suspect Superbooga V2 missing 'import pytextrank' in data_preprocessor #5860

Closed CharlieCarisiLiu closed 5 months ago

CharlieCarisiLiu commented 7 months ago

Describe the bug

Sorry if this is the wrong place to share a bug for the Superbooga v2 extension.

With Superbooga V2 enabled in the session, loading a .txt file and then starting generation from the Default tab produces the following error:

 File "/home/sysadmin/text-generation-webui/extensions/superboogav2/data_preprocessor.py", line 160, in _load_nlp_pipeline
    TextSummarizer._nlp_pipeline.add_pipe("textrank", last=True)
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/spacy/language.py", line 821, in add_pipe
    pipe_component = self.create_pipe(
                     ^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/spacy/language.py", line 690, in create_pipe
    raise ValueError(err)
ValueError: [E002] Can't find factory for 'textrank' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, entity_ruler, tagger, morphologizer, ner, beam_ner, senter, sentencizer, spancat, spancat_singlelabel, span_finder, future_entity_ruler, span_ruler, textcat, textcat_multilabel, en.lemmatizer

Clicking Generate a second time produces a different error:

File "/home/sysadmin/text-generation-webui/extensions/superboogav2/data_processor.py", line 51, in preprocess_text
    important_sentences = TextSummarizer.process_long_text(text, parameters.get_min_num_sentences())
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/extensions/superboogav2/data_preprocessor.py", line 191, in process_long_text
    result = [str(sent) for sent in doc.textrank.summary(limit_phrases=limit_phrases, limit_sentences=limit_sentences)]
                                    ^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/spacy/tokens/underscore.py", line 48, in __getattr__
    raise AttributeError(Errors.E046.format(name=name))
AttributeError: [E046] Can't retrieve unregistered extension attribute 'textrank'. Did you forget to call the `set_extension` method?
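
One possible reading of why the second click fails differently (an assumption drawn from the two tracebacks, not verified against the extension source): the pipeline object may be stored in a class-level cache before the add_pipe call that raises, so the next call reuses a pipeline that never received the 'textrank' component. The sketch below reproduces that pattern with hypothetical stand-in classes (FakePipeline and Summarizer are illustrative names, not spaCy or superboogav2 APIs):

```python
class FakePipeline:
    """Hypothetical stand-in for an NLP pipeline; not the real spaCy API."""

    def __init__(self):
        self.pipes = []

    def add_pipe(self, name, registered):
        # Mimic a lookup in a factory registry that fails for unknown names.
        if name not in registered:
            raise ValueError(f"Can't find factory for '{name}'")
        self.pipes.append(name)


class Summarizer:
    """Hypothetical class that caches its pipeline at class level."""

    _pipeline = None

    @classmethod
    def load(cls, registered):
        if cls._pipeline is None:
            # The cache is filled BEFORE the step that can fail.
            cls._pipeline = FakePipeline()
            cls._pipeline.add_pipe("textrank", registered)
        return cls._pipeline


registered_factories = set()  # "textrank" was never registered

# First call: add_pipe raises, but the half-built pipeline stays cached.
try:
    Summarizer.load(registered_factories)
except ValueError as err:
    first_error = str(err)

# Second call: the cache is populated, so add_pipe is skipped entirely
# and the caller receives a pipeline without the "textrank" component.
pipeline = Summarizer.load(registered_factories)
second_call_pipes = pipeline.pipes
```

If this is what happens, any later access that depends on the component would fail with a different error than the first call, which matches the sequence reported here.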

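I tried adding import pytextrank to data_preprocessor.py, and Superbooga works again! Importing pytextrank is what registers the 'textrank' factory with spaCy, so the import appears to be missing from that file.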

Maybe this is something to investigate.
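
For reference, a minimal sketch of the workaround described above. The function name load_textrank_pipeline is hypothetical and is not the extension's actual code; the model name en_core_web_sm is taken from the System Info section. The sketch returns None when spaCy, pytextrank, or the model is not installed, so it can be run safely in any environment:

```python
import importlib.util


def load_textrank_pipeline(model_name="en_core_web_sm"):
    """Return a spaCy pipeline with the 'textrank' component added.

    Returns None if spaCy, pytextrank, or the model is not installed.
    Hypothetical helper for illustration only.
    """
    for package in ("spacy", "pytextrank"):
        if importlib.util.find_spec(package) is None:
            return None

    import spacy
    # Imported only for its side effect: importing pytextrank registers
    # the "textrank" factory that add_pipe() looks up by name.
    import pytextrank  # noqa: F401

    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spaCy raises OSError when the named model cannot be found.
        return None

    if "textrank" not in nlp.pipe_names:
        nlp.add_pipe("textrank", last=True)
    return nlp


nlp = load_textrank_pipeline()
if nlp is None:
    print("spaCy, pytextrank, or the model is not installed")
else:
    print(nlp.pipe_names)
```

Without the pytextrank import, the add_pipe call is the line that would raise the E002 error shown in the first traceback.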

Is there an existing issue for this?

Reproduction

  1. Install Superbooga V2 requirements using update_wizard_linux.sh

  2. Enable the extension after starting text-generation-webui

  3. Upload and submit txt file in the superbooga v2 section

  4. Disable the "Is manual" option in the general settings

  5. Click Generate on the Default tab with <|injection-point|> and user input tags in the input. The following error appears: ValueError: [E002] Can't find factory for 'textrank' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

  6. Click Generate again. The following error appears:

AttributeError: [E046] Can't retrieve unregistered extension attribute 'textrank'. Did you forget to call the `set_extension` method?

Screenshot

No response

Logs

File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/queueing.py", line 561, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/route_utils.py", line 260, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1741, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1308, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 575, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 568, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 551, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 734, in gen_wrapper
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/modules/text_generation.py", line 181, in generate_reply_wrapper
    for reply in generate_reply(question, state, stopping_strings, is_chat=False, escape_html=True, for_ui=True):
  File "/home/sysadmin/text-generation-webui/modules/text_generation.py", line 33, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "/home/sysadmin/text-generation-webui/modules/text_generation.py", line 63, in _generate_reply
    question = apply_extensions('input', question, state)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/modules/extensions.py", line 231, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/modules/extensions.py", line 89, in _apply_string_extensions
    text = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/extensions/superboogav2/script.py", line 173, in input_modifier
    return input_modifier_internal(string, collector, is_chat)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/extensions/superboogav2/notebook_handler.py", line 29, in input_modifier_internal
    user_input = preprocess_text(user_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/extensions/superboogav2/data_processor.py", line 51, in preprocess_text
    important_sentences = TextSummarizer.process_long_text(text, parameters.get_min_num_sentences())
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/extensions/superboogav2/data_preprocessor.py", line 180, in process_long_text
    nlp_pipeline = TextSummarizer._load_nlp_pipeline()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/extensions/superboogav2/data_preprocessor.py", line 160, in _load_nlp_pipeline
    TextSummarizer._nlp_pipeline.add_pipe("textrank", last=True)
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/spacy/language.py", line 821, in add_pipe
    pipe_component = self.create_pipe(
                     ^^^^^^^^^^^^^^^^^
  File "/home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/spacy/language.py", line 690, in create_pipe
    raise ValueError(err)
ValueError: [E002] Can't find factory for 'textrank' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, entity_ruler, tagger, morphologizer, ner, beam_ner, senter, sentencizer, spancat, spancat_singlelabel, span_finder, future_entity_ruler, span_ruler, textcat, textcat_multilabel, en.lemmatizer

System Info

Azure: Standard NC8as T4 v3 (8 vcpu,56 GiB RAM)

spaCy version    3.7.4
Location         /home/sysadmin/text-generation-webui/installer_files/env/lib/python3.11/site-packages/spacy
Platform         Linux-6.5.0-1018-azure-x86_64-with-glibc2.35
Python version   3.11.8
Pipelines        en_core_web_sm (3.7.1)

MB7979 commented 7 months ago

Yes, I had the same issue and reported it here: https://github.com/oobabooga/text-generation-webui/issues/5704

There is also an issue (which I haven’t been able to resolve) where previously loaded embeddings can no longer be cleared. Both of these issues arose after the recent superboogav2 overhaul.

github-actions[bot] commented 5 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.