Status: Closed (Moshie1112 closed this 1 month ago)
Hi @Moshie1112, could you share the document you are trying to upload, along with the chunking settings you are using, so that I can try to replicate the issue? I tried with this YouTube video transcript, How_to_grow_your_SRE_practice.txt, and the following configuration works, at least on my machine: the SimpleReader (since it's a text file), the TokenChunker set to 750 units with a 250 overlap, and the MiniLmEmbedder.
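For anyone comparing settings in this thread, here is a rough sketch of what a chunk size of 750 with a 250 overlap means in practice. This is not Verba's TokenChunker code; `chunk_tokens` is a hypothetical helper, and the synthetic token list stands in for whatever tokenizer the real chunker uses.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of `size` tokens; consecutive windows share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Synthetic tokens stand in for real tokenizer output (an assumption).
tokens = [f"t{i}" for i in range(2000)]
chunks = chunk_tokens(tokens, size=750, overlap=250)
print(len(chunks), [len(c) for c in chunks])  # 4 [750, 750, 750, 500]
```

With these settings the window advances 500 tokens at a time, so a smaller size and overlap (such as 250 with 50) produces many more chunks for the same document.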
I get the same issue, but a different chunk mismatch. This is the document I was using with the PDFReader.
My TokenChunker is set to 250 with a 50 overlap and I'm using the ADAEmbedder.
I'm running Verba using Docker Compose.
Edit: @cam-barts, it turns out that if I use the .txt file from your message, I get the same error that @Moshie1112 and I have been getting.
I am facing exactly the same issue, "Chunk mismatch for 1f6e1308-08a3-4f98-b52c-424fe71a39c0 0 != 2", with the ADAEmbedder. Any help would be appreciated.
Thanks
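If it helps when comparing reports, the error message quoted above can be pulled apart with a small regular expression. This is only a sketch based on the message format seen in this thread; `parse_chunk_mismatch` and the field names `left` and `right` are hypothetical, and what the two numbers mean inside Verba is an assumption (they look like a found count and an expected count).

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the messages posted in this thread, not from Verba's source.
MISMATCH = re.compile(
    r"Chunk mismatch for (?P<doc_id>[0-9a-fA-F-]{36}) (?P<left>\d+) != (?P<right>\d+)"
)


class ChunkMismatch(NamedTuple):
    doc_id: str
    left: int   # number before "!="
    right: int  # number after "!="


def parse_chunk_mismatch(line: str) -> Optional[ChunkMismatch]:
    """Return the document id and both counts, or None if the line has no mismatch."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    return ChunkMismatch(match["doc_id"], int(match["left"]), int(match["right"]))


line = "Chunk mismatch for 1f6e1308-08a3-4f98-b52c-424fe71a39c0 0 != 2"
print(parse_chunk_mismatch(line))
```

A left-hand value of 0 in every report would suggest that no chunks were stored at all, rather than the chunker miscounting.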
We improved the Reader functionality in the newest release; it should now support all basic file types! Let me know if the error still persists.
I'm facing the same issue in the latest release, v1.0.2. The error message is "Chunk mismatch for e1831290-33d6-4724-9661-64245306bf53 0 != 168" when I upload the file README.md. My TokenChunker is set to 50 with a 20 overlap and I'm using the ADAEmbedder.
Are you encountering any errors in the CLI? Did you verify that your OpenAI key is working?
Yes, the Azure OpenAI key is correct. The error occurred while running verba start in a Python venv.
Just in case anyone else has this issue: I had the same problem, but found it was because I had run out of credits on my OpenAI account :)
If you check the console output, you may see:
✘ {'errors': {'error': [{'message': 'update vector: connection to: OpenAI API failed with status: 429 error: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs:
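To rule out the key or quota before re-importing, one option is to send a single embedding request yourself and look at the HTTP status. This is a sketch, not part of Verba: `check_embeddings` and `describe_status` are hypothetical helpers, it assumes the public OpenAI endpoint and the text-embedding-ada-002 model (Azure OpenAI uses a different URL and header), and the `opener` parameter exists only so the function can be tried without network access.

```python
import json
import urllib.error
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def describe_status(status: int) -> str:
    """Map an HTTP status from the embeddings endpoint to a short diagnosis."""
    if status == 200:
        return "ok: the key can create embeddings"
    if status == 401:
        return "invalid or missing API key"
    if status == 429:
        return "rate limit or quota exceeded: check plan and billing"
    return f"unexpected HTTP status {status}"


def check_embeddings(api_key: str, model: str = "text-embedding-ada-002",
                     opener=urllib.request.urlopen) -> str:
    """Send one tiny embedding request and describe the outcome."""
    body = json.dumps({"model": model, "input": "ping"}).encode("utf-8")
    request = urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with opener(request, timeout=30) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status code is on the exception.
        return describe_status(err.code)
```

Calling check_embeddings with the same key Verba uses (for example the value of the OPENAI_API_KEY environment variable) should report the quota problem directly if a 429 is what is behind the mismatch.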
My documents are .txt files. Uploading them either loads documents with no chunks, loads 0 documents with no chunks, or fails with "Chunk mismatch for 1fa2a323-d32c-4a87-89fc-4566c56d30fd 0 != 37".
I do not know what to do. I am trying to load YouTube transcripts, if that matters... help