truefoundry / cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
https://cognita.truefoundry.com
Apache License 2.0

Invalid URL '/embeddings': No scheme supplied. Perhaps you meant https:///embeddings? #174

Closed j-pielen closed 5 months ago

j-pielen commented 5 months ago

I don't know what's going on. I have configured everything so that it should work locally with Ollama, but this error comes up when running python3 -m local.ingest:

  return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/hans/cognita/local/ingest.py", line 6, in <module>
    from backend.indexer.indexer import ingest_data as ingest_data_to_collection
  File "/home/hans/cognita/backend/indexer/indexer.py", line 14, in <module>
    from backend.modules.metadata_store.client import METADATA_STORE_CLIENT
  File "/home/hans/cognita/backend/modules/metadata_store/client.py", line 4, in <module>
    METADATA_STORE_CLIENT = get_metadata_store_client(config=settings.METADATA_STORE_CONFIG)
  File "/home/hans/cognita/backend/modules/metadata_store/base.py", line 212, in get_metadata_store_client
    return METADATA_STORE_REGISTRY[config.provider](config=config.config)
  File "/home/hans/cognita/backend/modules/metadata_store/local.py", line 69, in __init__
    VECTOR_STORE_CLIENT.create_collection(
  File "/home/hans/cognita/backend/modules/vector_db/qdrant.py", line 46, in create_collection
    partial_embeddings = embeddings.embed_documents(["Initial document"])
  File "/home/hans/cognita/backend/modules/embedder/embedding_svc.py", line 44, in embed_documents
    return self.call_embedding_service(texts, "documents")
  File "/home/hans/cognita/backend/modules/embedder/embedding_svc.py", line 38, in call_embedding_service
    response = requests.post(self.url.rstrip("/") + "/embeddings", json=payload)
  File "/home/hans/.local/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/hans/.local/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/hans/.local/lib/python3.10/site-packages/requests/sessions.py", line 575, in request
    prep = self.prepare_request(req)
  File "/home/hans/.local/lib/python3.10/site-packages/requests/sessions.py", line 486, in prepare_request
    p.prepare(
  File "/home/hans/.local/lib/python3.10/site-packages/requests/models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "/home/hans/.local/lib/python3.10/site-packages/requests/models.py", line 439, in prepare_url
    raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL '/embeddings': No scheme supplied. Perhaps you meant https:///embeddings?

My .env

METADATA_STORE_CONFIG='{"provider":"local","config":{"path":"local.metadata.yaml"}}'
VECTOR_DB_CONFIG='{"provider":"qdrant","local":"true"}'

DEBUG_MODE=true
LOG_LEVEL="DEBUG"
LOCAL=true

# If Ollama is installed in the system
OLLAMA_URL="http://localhost:11434"

local.metadata.yaml

collection_name: creditcard
data_source:
    type: localdir
    uri: sample-data/creditcards
parser_config:
    chunk_size: 512
    chunk_overlap: 40
    parser_map:
        ".pdf": PdfTableParser
embedder_config:
    provider: embedding-svc
    config:
        model: "mixedbread-ai/mxbai-embed-large-v1"
S1LV3RJ1NX commented 5 months ago

To use embedding-svc, you need a running instance of the Infinity API: https://github.com/michaelfeil/infinity

Otherwise, you can install the additional requirements for the embedder and use mixed-bread as the provider.

Make sure to enable the provider in backend/embedder/__init__.py.
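
For context on the error itself: the traceback shows the request URL being built as `self.url.rstrip("/") + "/embeddings"`, so an empty base URL yields the bare path `/embeddings`, which `requests` rejects with `MissingSchema`. The sketch below is not Cognita's code; `build_embeddings_url` and the host `embedder.example` are hypothetical names used only to illustrate checking the base URL before sending the request.

```python
from urllib.parse import urlparse


def build_embeddings_url(base_url: str) -> str:
    """Join a base URL with /embeddings, rejecting bases without a scheme or host.

    Hypothetical helper for illustration; not part of Cognita.
    """
    parsed = urlparse(base_url or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"Embedding service URL {base_url!r} is missing or has no scheme; "
            "expected something like 'http://host:port'"
        )
    return base_url.rstrip("/") + "/embeddings"


# An empty base reproduces the path seen in the traceback.
print("".rstrip("/") + "/embeddings")  # prints /embeddings

# A base with a scheme and host produces a usable URL (placeholder host).
print(build_embeddings_url("http://embedder.example:8080/"))
```

Failing early with a message that names the missing setting is usually easier to diagnose than the `MissingSchema` error raised deep inside `requests`.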

j-pielen commented 5 months ago

Thanks for your response, @S1LV3RJ1NX! If embedding-svc is the default and the Infinity API is mandatory for it, is there a part of the README that I missed?

Also, when I wanted to use mixedbread as a provider, I went to backend/embedder/__init__.py and adjusted the code as you mentioned, like this:

if settings.OPENAI_API_KEY:
    from langchain.embeddings.openai import OpenAIEmbeddings

    register_embedder("openai", OpenAIEmbeddings)

register_embedder("truefoundry", TrueFoundryEmbeddings)

# Using embeddings through a deployed service such as Infinity API
register_embedder("embedding-svc", InfinityEmbeddingSvc)

# Register the MixBreadEmbeddings class if required
from backend.modules.embedder.mixbread_embedder import MixBreadEmbeddings
register_embedder("mixbread", MixBreadEmbeddings)

Then I changed local.metadata.yaml like this:

collection_name: creditcard
data_source:
    type: localdir
    uri: sample-data/creditcards
parser_config:
    chunk_size: 512
    chunk_overlap: 40
    parser_map:
        ".pdf": PdfTableParser
embedder_config:
    provider: mixedbread
    config:
        model: "mixedbread-ai/mxbai-embed-large-v1"

But I still get this error: ValueError: No embedder registered with provider mixedbread
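
One detail visible in the two snippets above: the Python code registers the provider under the name "mixbread", while the YAML asks for "mixedbread". If the registry matches names exactly, that mismatch alone would produce this error. The sketch below is a stand-in registry written for illustration, not Cognita's actual implementation; `EMBEDDER_REGISTRY`, `get_embedder`, and `FakeEmbeddings` are hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a provider registry keyed by exact name.
EMBEDDER_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_embedder(provider: str, cls: Callable[..., Any]) -> None:
    """Store an embedder factory under the given provider name."""
    EMBEDDER_REGISTRY[provider] = cls


def get_embedder(provider: str, **config: Any) -> Any:
    """Look up a provider by exact name and build it from its config."""
    if provider not in EMBEDDER_REGISTRY:
        known = ", ".join(sorted(EMBEDDER_REGISTRY)) or "<none>"
        raise ValueError(
            f"No embedder registered with provider {provider} (registered: {known})"
        )
    return EMBEDDER_REGISTRY[provider](**config)


class FakeEmbeddings:
    """Placeholder embedder so the sketch runs without model downloads."""

    def __init__(self, model: str) -> None:
        self.model = model


# Same name as in the registration snippet above.
register_embedder("mixbread", FakeEmbeddings)

# Same name as in the YAML above: the lookup fails because the strings differ.
try:
    get_embedder("mixedbread", model="mixedbread-ai/mxbai-embed-large-v1")
except ValueError as err:
    print(err)

# Using the registered name succeeds.
embedder = get_embedder("mixbread", model="mixedbread-ai/mxbai-embed-large-v1")
print(embedder.model)
```

Listing the registered names in the error message makes this kind of spelling mismatch visible right away.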

S1LV3RJ1NX commented 5 months ago

The docs were in progress at the time and have now been updated to use the Infinity service. You might have to delete the old collection from the qdrant folder before proceeding.