langflow-ai / langflow

⛓️ Langflow is a visual framework for building multi-agent and RAG applications. It's open-source, Python-powered, fully customizable, LLM and vector store agnostic.
http://www.langflow.org
MIT License

Python crashes while building a flow that includes Hugging Face Embeddings sentence_transformers #1869

Open AlastairJarvis opened 3 weeks ago

AlastairJarvis commented 3 weeks ago

Describe the bug
While building a flow that uses the Hugging Face Embeddings component (model: sentence-transformers/all-mpnet-base-v2), Python crashes.

Browser and Version

To Reproduce
Steps to reproduce the behavior:

  1. Go to My Collection
  2. Select Vector Store RAG Flow added from Store
  3. Replace OpenAI Embeddings with Hugging Face Embeddings
  4. Build
  5. See error in Langflow interface
  Error Building Component
       Error building vertex Hugging Face Embeddings: Could not import sentence_transformers python package. Please install it with `pip install sentence-transformers`.
  6. Install sentence-transformers
  7. Attempt to build again.
  8. See error in Terminal window

    /usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '
    objc[8054]: +[NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
    objc[8054]: +[NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
    [05/09/24 15:24:22] ERROR 2024-05-09 15:24:22 - ERROR - glogging.py:272 glogging - Worker (pid:8054) was sent SIGABRT!

dosubot[bot] commented 3 weeks ago

Hey there, @AlastairJarvis! 👋 I'm Dosu, a friendly bot here to lend a hand with bugs, answer your queries, or guide you on contributing while we await a human maintainer. Just wanted you to know I'm diving into your issue and will get back to you with a comprehensive response soon! Stay tuned.

AlastairJarvis commented 3 weeks ago

Let me know if you can't reproduce and I can send through the Problem Report for Python from the OS.

I hit this 100% of the time, and I'm happy to help debug if you're unable to reproduce it there.

dosubot[bot] commented 3 weeks ago

To address the crash issue you're encountering with Python when integrating Hugging Face Embeddings into your Vector Store RAG Flow, consider the following targeted steps:

  1. Ensure Initialization Before Forking: Load all Hugging Face models before any multiprocessing occurs to avoid semaphore and objc fork safety issues.

  2. Adjust Langflow Component Configuration: In the HuggingFaceEmbeddingsComponent, set multi_process to False to prevent multiprocessing complications.

  3. Set Environment Variable on macOS: Use OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES to bypass objc fork safety checks, though proceed with caution as it may affect application stability.

  4. Change Multiprocessing Start Method: Switch Python's multiprocessing start method to "spawn" (or "forkserver") instead of the default "fork", which is the usual fix for fork()-related crashes on macOS.

  5. Update Libraries: Keep Hugging Face Transformers, Sentence Transformers, and Langflow libraries up to date to benefit from the latest fixes and improvements.

  6. Enhance Debugging in Langflow: Implement detailed logging around the initialization and operation of the HuggingFaceEmbeddingsComponent to identify the crash source more effectively.

Here's an example of how to configure the HuggingFaceEmbeddingsComponent to avoid multiprocessing issues:

hugging_face_component = HuggingFaceEmbeddingsComponent(
    model_name="sentence-transformers/all-mpnet-base-v2",
    multi_process=False  # Avoids multiprocessing issues
)
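As a concrete illustration of step 4, here is a minimal sketch of forcing the "spawn" start method. The entry-point structure is illustrative, not Langflow's actual startup code; the key point is that `set_start_method` must run before any worker processes are created:

```python
import multiprocessing as mp

def main():
    # "spawn" starts workers as fresh interpreters instead of fork()ing,
    # which sidesteps the objc "+[... initialize] may have been in progress
    # ... when fork() was called" abort seen on macOS.
    # This must be called before any Pool or Process is created.
    mp.set_start_method("spawn", force=True)
    print(mp.get_start_method())  # prints: spawn

if __name__ == "__main__":
    main()
```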

Additionally, the introduction of the CTransformersComponent in Langflow, with its config parameter for detailed configuration, including multiprocessing settings, suggests enhanced support for Hugging Face model integration. This component is designed to improve the handling of language models by providing customizable deployment and execution options.

If these steps do not resolve the issue, please share any additional error messages or information for further assistance.

#### Details

**Sources**
- [src/backend/base/langflow/initial_setup/starter_projects/VectorStore-RAG-Flows.json](https://github.com/logspace-ai/langflow/blob/dev/src/backend/base/langflow/initial_setup/starter_projects/VectorStore-RAG-Flows.json)
- [src/backend/langflow/components/embeddings/HuggingFaceEmbeddings.py](https://github.com/logspace-ai/langflow/blob/dev/src/backend/langflow/components/embeddings/HuggingFaceEmbeddings.py)
- [src/backend/base/langflow/components/embeddings/HuggingFaceEmbeddings.py](https://github.com/logspace-ai/langflow/blob/dev/src/backend/base/langflow/components/embeddings/HuggingFaceEmbeddings.py)
- [src/backend/langflow/components/llms/CTransformers.py](https://github.com/logspace-ai/langflow/blob/dev/src/backend/langflow/components/llms/CTransformers.py)
**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

ogabrielluiz commented 3 weeks ago

Hey @AlastairJarvis

Do you know if this component accesses the HuggingFace API or if it loads the model locally?

Running it locally will not work on the 1.0 alpha because it will load the model on every run.

AlastairJarvis commented 3 weeks ago

Aha.

When I first tried this component (HuggingFaceEmbeddings), LangFlow prompted me to download and install it, which leads me to believe it's trying to run it locally.

I see HuggingFaceAPI Embeddings is a different component, but it looks like this is also pointing to localhost - so pointing to an API being served locally?
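For what it's worth, the distinction can be sketched like this: an "API" embeddings component does not load weights in-process, it POSTs text to an inference server. The URL and payload shape below are placeholders for whatever server is actually running on localhost, not Langflow's real defaults:

```python
import json
import urllib.request

def build_embed_request(texts, url="http://localhost:8080/embed"):
    # Serialize the texts and build an HTTP request for a locally served
    # embeddings endpoint. Endpoint path and JSON schema are assumptions.
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_embed_request(["hello world"])
print(req.full_url)  # prints: http://localhost:8080/embed
```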