langflow-ai / langflow

⛓️ Langflow is a visual framework for building multi-agent and RAG applications. It's open-source, Python-powered, fully customizable, model and vector store agnostic.
http://www.langflow.org
MIT License

AstraDB component creates collection in Astra, but doesn't upsert data to collection #2427

Open V1999B opened 4 days ago

V1999B commented 4 days ago

image

A simple upsert flow runs successfully and the AstraDB component builds without errors. It creates the new collection in the target DB but does not upload any data: the collection stays empty, and no errors are reported at any stage of the process.


dosubot[bot] commented 4 days ago

Hey @V1999B! I'm here to help with any bugs, questions, or feedback you have. I can also assist with becoming a contributor. Let's tackle this issue together!

The issue with the AstraDB component not upserting data into the collection, despite successfully creating the collection, could be due to several factors. Here are some steps to troubleshoot and resolve this issue:

  1. Check Data Type: Ensure that the data being ingested is of type Data. The _add_documents_to_vector_store method requires inputs to be Data objects. If the input data is not of this type, it will not be processed correctly [1].

  2. Verify Configuration: Double-check that all necessary fields (e.g., collection_name, token, api_endpoint, embedding) are correctly set and valid. Incorrect configuration can prevent data from being uploaded [2].

  3. Inspect Logs: Look at the logs for any hidden errors or warnings that might indicate what is going wrong during the data upsertion process. Even if no errors are reported in the UI, there might be useful information in the logs.

  4. Embedding Check: Ensure that the embedding field is correctly set. If the embedding is not properly configured, the documents will not be added to the vector store [1].

Here is a snippet of the relevant code that handles the data upload:

def _add_documents_to_vector_store(self, vector_store):
    documents = []
    for _input in self.ingest_data or []:
        if isinstance(_input, Data):
            documents.append(_input.to_lc_document())
        else:
            raise ValueError("Vector Store Inputs must be Data objects.")

    if documents and self.embedding is not None:
        logger.debug(f"Adding {len(documents)} documents to the Vector Store.")
        try:
            vector_store.add_documents(documents)
        except Exception as e:
            raise ValueError(f"Error adding documents to AstraDBVectorStore: {str(e)}") from e
    else:
        logger.debug("No documents to add to the Vector Store.")

By ensuring that the data is correctly formatted and all configurations are properly set, you should be able to resolve the issue with data not being uploaded to the Astra DB collection [1][2].
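To see why the method above can finish without reporting anything, here is a minimal, self-contained sketch of its two silent paths. `Data`, `FakeVectorStore` and `add_documents_to_vector_store` are hypothetical stand-ins written for this illustration, not Langflow's classes; only the control flow follows the snippet.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("astra_debug_sketch")


@dataclass
class Data:
    """Hypothetical stand-in for Langflow's Data object."""
    text: str

    def to_lc_document(self) -> dict:
        return {"page_content": self.text}


@dataclass
class FakeVectorStore:
    """Hypothetical in-memory store that records what it receives."""
    stored: list = field(default_factory=list)

    def add_documents(self, documents: list) -> None:
        self.stored.extend(documents)


def add_documents_to_vector_store(ingest_data, embedding, vector_store) -> int:
    """Same control flow as the snippet; returns how many documents were written."""
    documents = []
    for item in ingest_data or []:
        if not isinstance(item, Data):
            raise ValueError("Vector Store Inputs must be Data objects.")
        documents.append(item.to_lc_document())

    if documents and embedding is not None:
        vector_store.add_documents(documents)
        return len(documents)

    # Silent path: only a debug-level message, no exception.
    logger.debug("No documents to add to the Vector Store.")
    return 0


store = FakeVectorStore()
chunks = [Data("chunk one"), Data("chunk two")]

written_ok = add_documents_to_vector_store(chunks, embedding=object(), vector_store=store)
written_no_input = add_documents_to_vector_store(None, embedding=object(), vector_store=store)
written_no_embedding = add_documents_to_vector_store(chunks, embedding=None, vector_store=store)
```

In this sketch, both the missing-input case and the missing-embedding case return 0 without raising, so they only become visible when debug-level logging is enabled.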


nicoloboschi commented 4 days ago

Hi @V1999B, I just tried exactly the same flow on Docker 1.0.5 and it works for me.

The fact that the collection exists means the Astra configuration is correct. My suspicion is that the file is empty. Could you add "parse data" + "chat output" to debug? (see the screenshot)

After that, you should be able to see what is being parsed inside the "Playground".

V1999B commented 4 days ago

Thank you very much for your prompt reply!

File is not empty for sure. I've checked the output from File component and from Splitter - they show the content I'm uploading. I'm on my phone now, so cannot send you the screenshot, but can send later.

The file I tried to upload is just an Nvidia research article, a PDF of about 30 pages, for testing.

I used a modified embedding component: I added an option to set the dimensions parameter, which was removed in v1.0.5 for some reason (why, by the way? It is very useful). Everything else is the same as in the standard component. I set the dimensions to 3072, as shown in the screenshot. It creates the collection in Astra with 3072 dimensions but doesn't upload the data; the collection remains empty.

But I also tried to do it with the standard Embeddings component, same situation.

Maybe there are some minor parameters that do not match? Don't see any errors in logs, but the data is not uploaded.

Volodymyr Bandura

CEO

http://www.innolyticsgroup.com/

This email and the information contained in it and in any attachments are confidential and may be privileged. If you have received this email in error please notify us immediately. You are not authorized to and must disclose, copy, distribute or retain this email or any part of it.



PlusA2M commented 2 days ago

Hi @V1999B, I ended up here because I ran into the very same issue. After looking at the component code, I found that some of it is missing in my Langflow v1.0.5 install, which is very weird.

You can add the following code to the AstraDB component, or just reinstall Langflow so that you get the latest version. The issue should be fixed afterwards.

https://github.com/langflow-ai/langflow/blob/bcf5807262fd20e83d17b91c1d066a24a2fda4b4/src/backend/base/langflow/components/vectorstores/AstraDB.py#L199-L201

if hasattr(self, "ingest_data") and self.ingest_data:
    logger.debug("Ingesting data into the Vector Store.")
    self._add_documents_to_vector_store(vector_store)
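
A minimal sketch of why a missing ingest call produces exactly the reported symptom (collection created, nothing stored). `FakeVectorStore`, `SketchComponent` and the `run_ingest_step` flag are hypothetical names invented for this illustration; this is not the real AstraDB component, and only the guard inside `build_vector_store` is taken from the lines quoted above.

```python
class FakeVectorStore:
    """Hypothetical store: constructing it stands in for creating the collection."""

    def __init__(self, collection_name: str):
        self.collection_name = collection_name
        self.documents = []

    def add_documents(self, documents):
        self.documents.extend(documents)


class SketchComponent:
    """Hypothetical component used only to illustrate the missing call."""

    def __init__(self, ingest_data=None, run_ingest_step=True):
        self.ingest_data = ingest_data
        # Hypothetical switch: False simulates a build that lacks the ingest lines.
        self.run_ingest_step = run_ingest_step

    def _add_documents_to_vector_store(self, vector_store):
        vector_store.add_documents(list(self.ingest_data))

    def build_vector_store(self):
        # The "collection" exists from this point on, with or without data.
        vector_store = FakeVectorStore("demo_collection")
        if self.run_ingest_step:
            # The guard quoted above: ingest only when data is attached.
            if hasattr(self, "ingest_data") and self.ingest_data:
                self._add_documents_to_vector_store(vector_store)
        return vector_store


chunks = ["chunk one", "chunk two", "chunk three"]
without_step = SketchComponent(chunks, run_ingest_step=False).build_vector_store()
with_step = SketchComponent(chunks, run_ingest_step=True).build_vector_store()
```

Both builds succeed and both create the collection; only `with_step` ends up holding the three chunks, which matches an empty collection with no error reported.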
image
V1999B commented 2 days ago

Many thanks, glad we solved it, will try to test it later today.

Actually, yesterday I tried to do the same in Langflow on Astra cloud.

First, it is very strange to see a really old version there: I have 0.11 or something in the cloud, and it looks quite different from the 1.0.5 installed with Docker. Given that it runs in Astra cloud, I expected an exemplary installation of the latest version with the best possible setup, but it doesn't look that way. The store is also not available.

The good thing is that the AstraDB component worked even so, and the test file was vectorized and upserted to the Astra vector DB. However, not the whole file was upserted: for some reason only 30 vectors. I played with different batch settings; it worked reliably with a batch size of 10 and had problems with larger values. The strange thing is that if I set the batch processing parameter to 10, it upserts only 10 vectors and stops. Maybe it is a problem with the old version.

I will try the latest version as you proposed and, as a first step, update the code of this specific component, since moving everything to a new image takes time. I want to try the simpler solution first.


V1999B commented 2 days ago

Thank you. I tested the code you offered, replacing the previous code of the Astra component.

Now it works with the same credentials, which is certainly good.

However, when I run this Astra component, it has the same problem as the one in Astra cloud. Without a Batch Size defined, the request fails with “413 Request Entity Too Large”. When I set a Batch Size value, such as 10, it works but upserts only the first 20-30 vectors (out of 170 chunks from the previous component, the Recursive Text Splitter in this case, for a sample Nvidia PDF article).

I don’t see any documentation on how to deal with this situation. Would you have any suggestions?
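
For context, 413 is the HTTP status a server returns when the request body is larger than it accepts, so a batch size helps by splitting one large insert into several smaller requests. The sketch below shows only that splitting arithmetic for the 170 chunks mentioned above; `iter_batches` is a hypothetical helper written for this illustration, not how Langflow or the Astra client batches internally.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 170 chunks, as produced by the text splitter in the report above.
chunks = [f"chunk-{i}" for i in range(170)]
batches = list(iter_batches(chunks, batch_size=10))
total_sent = sum(len(batch) for batch in batches)
```

With a batch size of 10, the 170 chunks become 17 requests of 10 documents each, so a complete run should account for all 170 documents rather than only the first batch or two.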


V1999B commented 2 days ago

OK, an update: it seems to work well. It just takes time for the number of vectors/records in the collection to update after upserting; once I waited, I found that everything works. I have already tested it on several docs and the problem seems to be solved. Let’s see how it behaves in more systematic use.
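
Since the reported count lags behind the upsert, one way to avoid a false alarm is to poll the count for a while before concluding that data is missing. This is a minimal sketch under that assumption: `wait_for_count` and `fake_count` are hypothetical names, and the simulated readings stand in for whatever call or dashboard reports the collection size in a real setup.

```python
import time
from typing import Callable


def wait_for_count(
    get_count: Callable[[], int],
    expected: int,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll get_count until it reaches expected or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated readings of a count that catches up over time (illustrative values).
readings = iter([20, 30, 90, 170])


def fake_count() -> int:
    return next(readings)


reached = wait_for_count(fake_count, expected=170, timeout_s=5.0, interval_s=0.0)
timed_out = wait_for_count(lambda: 0, expected=1, timeout_s=0.0, interval_s=0.0)
```

Here `reached` becomes true on the fourth reading, while `timed_out` is false because the count never reaches the target before the deadline.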
