langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Pinecone.from_documents() does not create the index if it does not exist #5748

Open jeromemassot opened 1 year ago

jeromemassot commented 1 year ago

System Info

langchain == 0.0.190 Google Colab

Reproduction

docsearch = Pinecone.from_documents(chunks, embeddings, index_name=index_name) calls from_texts, which raises the following error if the index does not exist:

ValueError: No active indexes found in your Pinecone project, are you sure you're using the right API key and environment?

The check that raises it:

if index_name in indexes:
    index = pinecone.Index(index_name)
elif len(indexes) == 0:
    raise ValueError(
        "No active indexes found in your Pinecone project, "
        "are you sure you're using the right API key and environment?"
    )

Expected behavior

The method should create the index if it does not exist. This is the expected behavior from the documentation and the official examples.

Creating a new index if it does not exist is the only recommended behavior, as it ensures that the dimension of the index is always consistent with the dimension of the embedding model used.
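To illustrate the dimension point, here is a minimal sketch of how the index dimension could be derived from the embedding model itself rather than hard-coded. It assumes only that the embeddings object exposes a LangChain-style embed_query(text) method; embedding_dimension and FakeEmbeddings are hypothetical names used for illustration and are not part of LangChain or Pinecone.

```python
from typing import List, Protocol


class SupportsEmbedQuery(Protocol):
    """Anything with a LangChain-style embed_query method."""

    def embed_query(self, text: str) -> List[float]: ...


def embedding_dimension(embeddings: SupportsEmbedQuery, probe: str = "dimension probe") -> int:
    """Embed a short probe string and return the length of the resulting vector."""
    return len(embeddings.embed_query(probe))


class FakeEmbeddings:
    """Hypothetical stand-in for a real embedding model (illustration only)."""

    def __init__(self, size: int) -> None:
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * self.size


# With a real model, pass the same `embeddings` object that is given to from_documents.
dimension = embedding_dimension(FakeEmbeddings(1536))
print(dimension)
```

The value returned this way could then be used as the dimension when the index is created, so the two cannot drift apart.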

Thanks

Best regards

Jerome

taimurshaikh commented 1 year ago

From docs/modules/indexes/vectorstores/examples/pinecone.ipynb:

docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
# if you already have an index, you can load it like this
# docsearch = Pinecone.from_existing_index(index_name, embeddings)

From my understanding, the example here is highlighting that, if you have an existing index, you should use from_existing_index() rather than from_documents(). This might be due to concerns over accidentally overwriting existing indexes. Hope this helps!

jeromemassot commented 1 year ago

I do not have an existing index, I need to create a new one. So nope, I do not want to use from_existing_index() but instead from_documents() and I am expecting this method to create the index to be populated.

taimurshaikh commented 1 year ago

Ah, my apologies. I misunderstood.

In a project of mine, my control flow is looking something like this and it is working well, although the init process is unbearably slow:

import pinecone
from langchain.vectorstores import Pinecone

# Initialize Pinecone (PINECONE_API_KEY and PINECONE_API_ENV are defined elsewhere)
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_API_ENV,
)

# Create and configure the index if it doesn't already exist,
# then populate it; otherwise load the existing index.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        metric="cosine",
        dimension=DIMENSIONS,
    )
    docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
else:
    docsearch = Pinecone.from_existing_index(index_name, embeddings)
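The create-if-missing step can also be factored into a small helper. The sketch below assumes a client object exposing list_indexes() and create_index(name=..., dimension=..., metric=...), the same calls used in the snippet above; newer Pinecone client versions may use different signatures. ensure_index and FakeClient are hypothetical names for illustration, and FakeClient only stands in for the real client so the example runs without network access.

```python
from typing import Any, List


def ensure_index(client: Any, index_name: str, dimension: int, metric: str = "cosine") -> bool:
    """Create the index only when it is missing; return True if it was created."""
    if index_name in client.list_indexes():
        return False
    client.create_index(name=index_name, dimension=dimension, metric=metric)
    return True


class FakeClient:
    """Hypothetical in-memory stand-in for the Pinecone client (illustration only)."""

    def __init__(self) -> None:
        self.indexes: dict = {}

    def list_indexes(self) -> List[str]:
        return list(self.indexes)

    def create_index(self, name: str, dimension: int, metric: str) -> None:
        self.indexes[name] = {"dimension": dimension, "metric": metric}


client = FakeClient()
created_first = ensure_index(client, "demo-index", dimension=1536)
created_again = ensure_index(client, "demo-index", dimension=1536)
print(created_first, created_again)
```

Calling the helper twice is safe: the second call finds the index and does nothing, so the same code path works for both the first run and later runs.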

Abe410 commented 1 year ago

Has Pinecone.from_documents been deprecated now?

TanmayAT commented 9 months ago

pinecone.from_documents() doesn't exist.

ashwinivader commented 8 months ago

Add this import:

from langchain.vectorstores import Pinecone

It resolved my issue.

mekhiya commented 7 months ago

WORKS!!!!

mdsimarspan commented 7 months ago

It has been deprecated. To add vectors to Pinecone, use the index.upsert() command from the Pinecone docs. But I am not sure how we can integrate this into LangChain.

takashi57 commented 7 months ago

I spent the past 2 days trying to figure this out, so hopefully this helps people. Anyone recently trying to run functions such as "from_documents" or "from_texts" will likely run into an error caused by a name collision: both LangChain and the Pinecone client expose a class called Pinecone. To get around this, rename the LangChain class on import:

from langchain_community.vectorstores import Pinecone as PineconeStore

You should then be able to call:

docsearch = PineconeStore.from_documents(text, embedding, index_name)

PrabhuRajendhran commented 7 months ago

@takashi57 As the Pinecone vector store in langchain_community is deprecated, you can follow the steps below:

!pip install langchain-pinecone

import os
from langchain_pinecone import Pinecone

# The Pinecone client reads the API key from this environment variable
os.environ['PINECONE_API_KEY'] = pinecone_api_key

vector_database_index = Pinecone.from_documents(
    index_name=index_name,
    documents=chunks,
    embedding=embedding)

This will help to resolve the issue.

sushantsk1 commented 7 months ago

I'm getting this error when I use your snippet: type object 'Pinecone' has no attribute 'from_documents'

takashi57 commented 7 months ago

@sushantsk1 did you try the import line?

from langchain_community.vectorstores import Pinecone as PineconeStore

When I had that error, the cause was the name collision I described above. If you still get the error with this import, it may be a different issue.

GauravYS commented 6 months ago

Helpful !

sriramancr commented 6 months ago

import os
os.environ['PINECONE_API_KEY'] = pinecone_api_key

What is this, and how do you set it in Colab?

This is what I tried:

%pip install langchain-pinecone --q
from langchain_pinecone import Pinecone as LCPC
ndx = LCPC.from_documents(index_name=index_name, documents=docs, embedding=emb1)

Error: PineconeConfigurationError: You haven't specified an Api-Key.

harsh-vardhan7695 commented 5 months ago

Either you can pass the key directly to the Pinecone client:

from pinecone import Pinecone

pc = Pinecone(api_key="a1794dab---****-1fb4ba4cde61")  # your API key
index_name = pc.Index("langchainvector")

Or

store the key in an environment variable and call os.getenv('api_key') in the statement above instead of the literal key.
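For the environment-variable route, a small helper can make a missing key fail with a readable message instead of a configuration error deep inside the client. This is a sketch only: get_pinecone_api_key is a hypothetical name, and PINECONE_API_KEY is assumed to be the variable name because it is the one used earlier in this thread.

```python
import os


def get_pinecone_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Return the API key from the environment, or fail with a readable message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. In a notebook, set it first, for example with "
            f"os.environ['{env_var}'] = getpass.getpass('Pinecone API key: ')"
        )
    return key
```

In Colab, the variable has to be set in the same runtime before the vector store is created, for example by prompting for it with getpass from the standard library so the key is not saved in the notebook.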

KBaheti commented 5 months ago

Thanks, This will help to resolve the issue.

YvesLoic5 commented 4 months ago

@sriramancr Did you resolve the "PineconeConfigurationError: You haven't specified an Api-Key" issue? If so, how?

yashrooprai commented 2 months ago

@YvesLoic5 A little late, but if you are still facing the issue, do this in Colab:

  1. Create a .env file containing the key PINECONE_API_KEY, with your Pinecone API key as its value.
  2. Load the file and run this code. Note that os.environ does not read .env files on its own; load_dotenv comes from the python-dotenv package, which is assumed to be installed.

import os
from dotenv import load_dotenv

load_dotenv()  # copies the entries of .env into os.environ
pinecone_api_key = os.environ['PINECONE_API_KEY']

vector_database_index = Pinecone.from_documents(
    index_name=index_name,
    documents=doc,
    embedding=embeddings)

Change doc and embeddings to suit your requirements; this should work.

YvesLoic5 commented 2 months ago

It doesn't matter, thank you!