joaomdmoura / crewAI

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
https://crewai.com
MIT License

ollama not in embedder provider list #439

Open punitchauhan771 opened 3 months ago

punitchauhan771 commented 3 months ago

I am trying to write a simple PDF agent that answers questions based on the contents of a PDF.

app.py

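from crewai import Agent, Task
from crewai_tools import PDFSearchTool
from langchain_community.llms import Ollama  # assumed import path for the Ollama LLM wrapper

# url and model are defined elsewhere in the script
llm = Ollama(base_url=url, model=model, num_gpu=2)
rag_tool = PDFSearchTool(
    pdf=r'pdf_path',
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="gemma",
                # temperature=0.5,
                # top_p=1,
                # stream=true,
            ),
        ),
        embedder=dict(
            provider="ollama",  # this value triggers the SchemaError below
            config=dict(
                model="nomic-embed-text",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)

auditor_agent = Agent(
    role='Data Analyst',
    goal='You perfectly know how to analyze any data using provided txt file and searching info via RAG tool',
    backstory='You are a data expert',  # Agent expects `backstory`, not `Background`
    verbose=True,
    allow_delegation=False,
    tools=[rag_tool],
    llm=llm,  # pass the Ollama LLM defined above to the agent
)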

task = Task(
    description = "what is the latest status of ₹2000 bank notes",
    tools = [rag_tool],
    agent = auditor_agent,
    expected_output = '''The output should be in following format :
    Format:
    Word Limit : 25
    Writing style :  simple and logical
    '''
)
task1 = task.execute()
print(task1.output())

error:

schema.SchemaError: Key 'embedder' error:
Key 'provider' error:
Or('openai', 'gpt4all', 'huggingface', 'vertexai', 'azure_openai', 'google', 'mistralai', 'nvidia') did not validate 'ollama'
'openai' does not match 'ollama'
'gpt4all' does not match 'ollama'
'huggingface' does not match 'ollama'
'vertexai' does not match 'ollama'
'azure_openai' does not match 'ollama'
'google' does not match 'ollama'
'mistralai' does not match 'ollama'
'nvidia' does not match 'ollama'
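
For readers unfamiliar with this kind of error: the message reads like a whitelist check on the embedder provider name. Below is a minimal pure-Python sketch of such a check. It is not embedchain's actual validation code; the function name `check_embedder_provider` is hypothetical, and the provider list is simply copied from the error message above.

```python
# Hypothetical whitelist, copied from the SchemaError message above.
SUPPORTED_EMBEDDER_PROVIDERS = (
    "openai", "gpt4all", "huggingface", "vertexai",
    "azure_openai", "google", "mistralai", "nvidia",
)


def check_embedder_provider(config: dict) -> str:
    """Return the embedder provider name, or raise ValueError if it is not whitelisted."""
    provider = config.get("embedder", {}).get("provider")
    if provider not in SUPPORTED_EMBEDDER_PROVIDERS:
        raise ValueError(
            f"embedder provider {provider!r} is not one of {SUPPORTED_EMBEDDER_PROVIDERS}"
        )
    return provider


if __name__ == "__main__":
    # A whitelisted provider passes through unchanged.
    print(check_embedder_provider({"embedder": {"provider": "huggingface"}}))
    # "ollama" is rejected, which mirrors the failure reported above.
    try:
        check_embedder_provider({"embedder": {"provider": "ollama"}})
    except ValueError as exc:
        print("rejected:", exc)
```

Under this reading, the tool config itself is fine; the installed version simply does not list "ollama" as an accepted embedder provider.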
fubz commented 2 months ago

While it's not Ollama, you can run a local embedder by using the Hugging Face provider. Here is an example.

        test_crew = Crew(
            agents=[reader, writer],
            tasks=[read_book, write_report],
            process=Process.sequential,
            cache=True,
            verbose=2,
            memory=True,
            embedder={
                "provider": "huggingface",
                "config": {
                    "model": "mixedbread-ai/mxbai-embed-large-v1", # https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
                }
            }
        )
piovis2023 commented 2 months ago

@fubz Good idea. Would this route still be private, meaning no one else could access the data? Would I need to clone my own instance under my own Hugging Face account?

Guerdal commented 2 months ago

Ollama can be used for PdfSearchTool with

from crewai_tools import PDFSearchTool

# Use a separate variable name so the PDFSearchTool class is not shadowed.
pdf_search_tool = PDFSearchTool(
    pdf=pdf_file_path,
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama3:8b-instruct-q6_K",
                base_url="http://ollama_server_ip:11434",
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="mxbai-embed-large:latest",
                base_url="http://192.168.42.173:11434",
            ),
        ),
    )
)

This also requires pip install -U embedchain==0.1.103. However, installing this newer version of embedchain produces an error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.

@joaomdmoura maybe a small update to crewai-tools to allow chromadb 0.5.0? :-)
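
To make the conflict concrete, here is a small sketch of the range check that the pin chromadb<0.5.0,>=0.4.22 implies. It is only an illustration, not pip's resolver: the helper names `version_tuple` and `satisfies` are made up, and it only handles plain numeric versions such as 0.4.23 (no rc or dev suffixes).

```python
def version_tuple(version: str) -> tuple:
    """Turn a plain numeric version like '0.4.23' into (0, 4, 23)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, comparing numeric components."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


if __name__ == "__main__":
    # Bounds taken from the crewai-tools 0.2.3 requirement quoted above.
    for candidate in ("0.4.23", "0.5.0"):
        ok = satisfies(candidate, lower="0.4.22", upper="0.5.0")
        print(f"chromadb {candidate}: {'ok' if ok else 'conflict'}")
```

With these bounds, 0.4.23 falls inside the range and 0.5.0 falls outside, which is the conflict pip reports.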

Timilla commented 2 months ago

Hello, I'm curious if anyone has experience using the PdfSearchTool alongside Groq as both the provider and embedder. I'm exploring this combination and would appreciate any insights or tips anyone might have.

yuriwa commented 2 months ago

@punitchauhan771, LangChain currently does not support Ollama as an embedding provider. The reason, probably, is that Ollama currently does not have an OpenAI-compatible (/v1) embedding endpoint.

yuriwa commented 2 months ago

@Timilla, Langchain currently does not support groq as an embedding provider. The reason, probably, is that groq does not host embedding models.

punitchauhan771 commented 2 months ago

@yuriwa Hi, I think LangChain does support Ollama as an embeddings provider:

https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.ollama.OllamaEmbeddings.html

Also, I think this issue is likely due to the embedchain module that crewAI uses for embeddings; previous versions of it did not have ollama as a provider.

Guerdal commented 2 months ago

As I wrote earlier, the latest version of embedchain (0.1.103) is compatible with Ollama but requires upgrading chromadb to 0.5.0. However, crewai-tools requires chromadb < 0.5.0, so we must wait for @joaomdmoura to update the requirements of crewai-tools :-)

joaomdmoura commented 2 months ago

Already updated in the new RC, 0.30.0rc5. I will probably push it live over the weekend or on Monday.

Guerdal commented 2 months ago

I must have missed something :-) @joaomdmoura

pip install -U crewai[tools]==0.30.0rc5
.....
Requirement already satisfied: pycparser in ./.local/lib/python3.10/site-packages (from cffi>=1.4.1->pynacl>=1.4.0->PyGithub<2.0.0,>=1.59.1->embedchain<0.2.0,>=0.1.98->crewai[tools]==0.30.0rc5) (2.21)
Installing collected packages: crewai, crewai-tools
  Attempting uninstall: crewai
    Found existing installation: crewai 0.28.8
    Uninstalling crewai-0.28.8:
      Successfully uninstalled crewai-0.28.8
  Attempting uninstall: crewai-tools
    Found existing installation: crewai-tools 0.1.7
    Uninstalling crewai-tools-0.1.7:
      Successfully uninstalled crewai-tools-0.1.7
Successfully installed **crewai-0.30.0rc5 crewai-tools-0.2.3**

After that => pip install -U embedchain==0.1.103 (to get ollama in the embedders)

....
Installing collected packages: pypdf, chromadb, embedchain
  Attempting uninstall: pypdf
    Found existing installation: pypdf 3.17.4
    Uninstalling pypdf-3.17.4:
      Successfully uninstalled pypdf-3.17.4
  Attempting uninstall: chromadb
    Found existing installation: chromadb 0.4.23
    Uninstalling chromadb-0.4.23:
      Successfully uninstalled chromadb-0.4.23
  Attempting uninstall: embedchain
    Found existing installation: embedchain 0.1.102
    Uninstalling embedchain-0.1.102:
      Successfully uninstalled embedchain-0.1.102
**ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.**
Successfully installed chromadb-0.5.0 embedchain-0.1.103 pypdf-4.2.0
yuriwa commented 2 months ago

Same here: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.

joaomdmoura commented 2 months ago

oh ! sorry, looking into that

puffo commented 2 months ago

If we are able to get Ollama supported in the embeddings provider, it might also help solve some general failures with local LLM tool usage and memory.

There might be a need for clearer exception messages when embedding requests fail while using tools. I noticed this when using the WebsiteSearchTool via openhermes; it requests embeddings from the Ollama server using the unsupported OpenAI endpoint at /v1/embeddings/. The Ollama server returns a 404 and I guess an error is shown, but I thought it was related to the website, not the request for embeddings :)

The action then reliably results in a loop of errors:

I encountered an error while trying to use the tool. This was the error: 404 page not found.
 Tool Search in a specific website accepts these inputs: Search in a specific website(search_query: 'string', website: 'string')

Switching to "gpt4all" as the embedder provider stops the requests to ollama and fixes tool usage locally.

Could it be that the embedding mismatches/failures might also explain some broader problems with tool usage?

Anyway, I just thought I'd try to connect some of the dots possibly related to this. Thanks for all the amazing effort on this project @joaomdmoura!
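
For anyone who wants to check the embedding request in isolation, here is a minimal sketch that talks to Ollama's native embeddings endpoint (/api/embeddings, which takes "model" and "prompt") instead of /v1/embeddings/. The helper names, the localhost URL, and the model name in the usage note are assumptions for illustration, and this is not what crewAI or embedchain do internally.

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST request for Ollama's native embeddings endpoint (no /v1 prefix)."""
    url = base_url.rstrip("/") + "/api/embeddings"
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(base_url: str, model: str, text: str, timeout: float = 30.0) -> list:
    """Send the request and return the embedding vector from the JSON response."""
    request = build_embedding_request(base_url, model, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embedding"]
```

With an Ollama server running and an embedding model pulled, something like fetch_embedding("http://localhost:11434", "nomic-embed-text", "hello") should return a list of floats. A 404 here would point at the server or the URL rather than at the website being searched.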

swayson commented 1 month ago

Just boosting signal that ollama support would be great!

SumaiyaSultan2002 commented 1 month ago

@yuriwa In that case, what embedder could be used alongside Groq? Any ideas?

jcoombes commented 1 month ago

It's a stopgap, but I've naively updated chromadb and embedchain, and memory seems to work with the ollama provider now. I'm currently looking at making the MDXSearchTool work without an OPENAI_API_KEY.

(just bear in mind the base_url for embeddings lacks the /v1 that the other endpoints have.)

My pyproject.toml looks like this

[tool.poetry.dependencies]
python = "^3.12.1,<=3.13"
crewai-tools = { git = "https://github.com/jcoombes/crewai-tools.git", rev = "63d3ae1" }
crewai = { version = "^0.30.11" }
...etc

PR Here. https://github.com/joaomdmoura/crewAI-tools/pull/36

Orwlit commented 1 month ago

@fubz Thanks, the Hugging Face embedder example helps!

shivpatil1901 commented 1 month ago

File ~\AppData\Roaming\Python\Python311\site-packages\sentence_transformers\SentenceTransformer.py:1296, in SentenceTransformer._load_sbert_model(self, model_name_or_path, token, cache_folder, revision, trust_remote_code) 1294 else: ... 241 Dict[str, int]: The added tokens. 242 """ --> 243 return self._tokenizer.get_added_tokens_decoder()

AttributeError: 'tokenizers.Tokenizer' object has no attribute 'get_added_tokens_decoder'

I am getting this error when I run a local embedder (mixedbread-ai/mxbai-embed-large-v1) using the Hugging Face provider. Could someone please help me?

cblaison commented 1 month ago

I still get this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crewai-tools 0.2.3 requires chromadb<0.5.0,>=0.4.22, but you have chromadb 0.5.0 which is incompatible.

Has it been resolved?

deshraj commented 4 weeks ago

Hi, I am the co-founder and CTO of Embedchain. We have fixed the issue on our side and the ollama embedder should work now. Please use embedchain>=0.1.107 and it should fix the issue.

Here is a test script that worked for me:

from crewai_tools import PDFSearchTool
import embedchain

print("embedchain version:", embedchain.__version__)

tool = PDFSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",
            config=dict(
                model="gemma",
            ),
        ),
        embedder=dict(
            provider="ollama",
            config=dict(
                model="nomic-embed-text",
            ),
        ),
    )
)

print("tool config:", tool.config)
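
Before running a script like the one above, it may help to confirm which versions are actually installed, since the thread shows several packages drifting out of sync. This is a small sketch using only the standard library; the helper name `installed_versions` is made up, and the package list is just the set of packages mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages discussed in this thread.
    names = ["crewai", "crewai-tools", "embedchain", "chromadb"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```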

@joaomdmoura please feel free to test and close the issue accordingly.

leobilocastro commented 3 weeks ago

embedder={
    "provider": "huggingface",
    "config": {
        "model": "mixedbread-ai/mxbai-embed-large-v1", # https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
    }
}

I got this error: TypeError: Pooling.__init__() got an unexpected keyword argument 'include_prompt'. Does anyone know what is causing this?

Traceback (most recent call last):
  File "/home/bil/ollacrew/insta.py", line 145, in <module>
    crew = Crew(
  File "/home/bil/.local/lib/python3.10/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/home/bil/.local/lib/python3.10/site-packages/crewai/crew.py", line 167, in create_crew_memory
    self._short_term_memory = ShortTermMemory(crew=self, embedder_config=self.embedder)
  File "/home/bil/.local/lib/python3.10/site-packages/crewai/memory/short_term/short_term_memory.py", line 16, in __init__
    storage = RAGStorage(type="short_term", embedder_config=embedder_config, crew=crew)
  File "/home/bil/.local/lib/python3.10/site-packages/crewai/memory/storage/rag_storage.py", line 75, in __init__
    self.app = App.from_config(config=config)
  File "/home/bil/.local/lib/python3.10/site-packages/embedchain/app.py", line 388, in from_config
    embedding_model = EmbedderFactory.create(
  File "/home/bil/.local/lib/python3.10/site-packages/embedchain/factory.py", line 79, in create
    return embedder_class(config=embedder_config_class(config_data))
  File "/home/bil/.local/lib/python3.10/site-packages/embedchain/embedder/huggingface.py", line 14, in __init__
    embeddings = HuggingFaceEmbeddings(model_name=self.config.model)
  File "/home/bil/.local/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 72, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "/home/bil/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 194, in __init__
    modules = self._load_sbert_model(
  File "/home/bil/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1073, in _load_sbert_model
    module = module_class.load(module_path)
  File "/home/bil/.local/lib/python3.10/site-packages/sentence_transformers/models/Pooling.py", line 198, in load
    return Pooling(config)

crisschan commented 3 weeks ago

@puffo I got the same error. Did you manage to fix it?