Open prdgymx opened 1 year ago
Is it happening for all prompts or just this one?
I seem to have the same problem with an indexed website. Apparently I can only retrieve information from `<rootURL>/index.html`, but not from subpages such as `<rootURL>/FolderName/index.html`, which is referenced as `href="FolderName/index.html"`.
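For reference, here is a minimal sketch of how a relative `href` like the one above resolves against different forms of the root URL, using Python's standard `urllib.parse.urljoin`. The `example.org` URLs and the `resolve_link` helper are hypothetical and only illustrate standard URL resolution; this is not OpenChat's crawler code, and a missing trailing slash is just one possible cause worth ruling out.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL."""
    return urljoin(page_url, href)


# Hypothetical root URL, used only for illustration.
href = "FolderName/index.html"

# Root given with a trailing slash: the subfolder stays under the root.
with_slash = resolve_link("https://example.org/course/", href)

# Root given without a trailing slash: the last path segment is replaced,
# so the resolved URL ends up outside the intended root.
without_slash = resolve_link("https://example.org/course", href)

# Link resolved relative to the start page itself.
from_index = resolve_link("https://example.org/course/index.html", href)

print(with_slash)
print(without_slash)
print(from_index)
```

If a crawler stores the root without a trailing slash and resolves links against it, subpages can resolve to URLs outside the root and be skipped by a same-root filter.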
I also noticed that my Pinecone index reports 0 vectors, even though OpenChat says it has successfully indexed the website.
PS: I'd have a couple of other questions, in particular on how to adapt the bot to my requirements. Would you mind setting up a kind of community support forum?
@ingodahn I have seen this before with large PDF uploads; the cause was that the connection was lost while the file was still uploading. To avoid this, we should ensure the file finishes uploading before moving to the next page.
Is there a timeline for correcting this bug? Let me know if I can help with testing.
@ingodahn I plan to have this resolved by the end of day Monday. It would help me out if you could provide a link to the PDF file or upload it. That way I can review it and try to fix the issue. Please let me know if you can send the file my way.
Great. My use case requires crawling a website, not a PDF document. The website I am working with is https://netmath.vcrp.de/downloads/Skripte/Vorkurs/HSWildau/. You can download it zipped here. Only linked pages whose URLs extend the root URL need to be indexed.
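To make the rule "only linked pages whose URLs extend the root URL" concrete, here is a rough sketch using only the Python standard library. The names `LinkCollector` and `links_under_root` and the sample HTML are made up for illustration; this shows one way to express the filter, not how OpenChat actually crawls.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the raw href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_under_root(html, page_url, root_url):
    """Return absolute, de-duplicated links from html that extend root_url."""
    # Normalise the root so the prefix check cannot match sibling folders.
    root = root_url if root_url.endswith("/") else root_url + "/"
    parser = LinkCollector()
    parser.feed(html)
    result = []
    for href in parser.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(root) and absolute not in result:
            result.append(absolute)
    return result


ROOT = "https://netmath.vcrp.de/downloads/Skripte/Vorkurs/HSWildau/"
# Made-up sample markup, not the real start page.
SAMPLE_HTML = """
<a href="FolderName/index.html">inside the root</a>
<a href="FolderName/index.html#top">same page, with fragment</a>
<a href="../elsewhere.html">above the root</a>
<a href="https://example.org/other">external site</a>
"""

to_index = links_under_root(SAMPLE_HTML, ROOT + "index.html", ROOT)
print(to_index)
```

With this filter, links that point above the root or to other hosts are dropped before indexing, and the relative subfolder link is kept once.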
Hi @ingodahn,
The first 10 pages of your website were indeed scanned, but I do see the problem here.
Perhaps it's related to the fact that the website is in German, but I'm not certain at this time. I will look into this further.
Hi, the URLs listed in your first screenshot under URL are outside the root and don't need to be tracked. Moreover, they contain dynamically generated content, which cannot be indexed anyway.
Here is the German summary of the complete start page generated by Bing. Only the first paragraph of the summary is correct. I don't know why it hallucinates about LK-99; this doesn't occur in the source code.
When I talked back, it confirmed that the page says nothing about LK-99 and gave a correct short summary.
When I asked for an English summary of the complete start page, it gave a correct and detailed summary.
Hope this helps.
Hello!
We are running the bot via the website for now to test things out before deploying our own. However, we are encountering an issue: the uploaded PDF files show as successful, but the specific information inside them is not being referenced by the bot.
For example, one section of a PDF has an "answering machine script" - but if I ask the bot what the answering machine script is, it tells me it cannot answer questions out of context.
I am wondering if the initial prompt is preventing the bot from parsing or using the data held within the PDFs. Our initial prompt is as follows:
Is there something I am missing here? Thanks for the great product so far! Looking forward to growing and expanding along with its progress.