Closed MercureTony closed 3 months ago
Hi @MercureTony,
Can you also share the degreegurucrawl.log so that I can check where you got this error?
I tried to recreate the error and was able to reproduce it when I ran the crawler without setting the environment variables. Are the UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN environment variables set as explained in our guide?
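As a quick way to confirm this before starting a crawl, here is a minimal sketch of a pre-flight check. It is not part of the project; the function name check_upstash_env is hypothetical. It assumes the httpx.UnsupportedProtocol error comes from an empty or scheme-less UPSTASH_VECTOR_REST_URL, which matches what I saw when the variables were unset.

```python
import os
from urllib.parse import urlparse

# The two variables the crawler expects, as named in this thread.
REQUIRED_VARS = ("UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def check_upstash_env(env=None):
    """Return a list of problems found; an empty list means the check passed.

    Hypothetical helper for illustration, not part of the crawler.
    """
    env = os.environ if env is None else env
    problems = []

    # Report every variable that is missing or blank.
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set")

    # A URL without http:// or https:// is what httpx rejects.
    url = env.get("UPSTASH_VECTOR_REST_URL", "").strip()
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append(
            "UPSTASH_VECTOR_REST_URL must start with http:// or https://"
        )
    return problems


if __name__ == "__main__":
    for problem in check_upstash_env():
        print(problem)
```

Running this in the same shell you use for the crawler should print nothing if both variables are visible to the process.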
Another thing I want to note: your website makes calls to populate the page with content after it loads, so our crawler will not work that well for it. In this case, we need to apply methods like pre-rendering the JavaScript using scrapy-splash. We can try to integrate this into our crawler if you are interested.
Hi @CahidArda !
Thank you for getting back to me.
Hi @MercureTony,
It looks like you set the variables in the .env.local file. That file is for the chatbot, and the crawler won't be able to read the environment variables from there. To set the environment variables for the crawler, you need to export them in the shell where you run it. If you are using a Mac, you can run export UPSTASH_VECTOR_REST_URL=**** (and do the same for UPSTASH_VECTOR_REST_TOKEN). Then you will be able to run the crawler.
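If you would rather reuse the values already in .env.local than export them by hand, something like the following sketch could work. It is an assumption on my side, not something the crawler does today: load_env_file is a hypothetical helper, and it only handles simple KEY=VALUE lines (optionally prefixed with export, optionally quoted).

```python
import os


def load_env_file(path, environ=None):
    """Copy KEY=VALUE pairs from a dotenv-style file into `environ`.

    Hypothetical helper for illustration. Variables that are already
    set are left untouched. Returns the pairs that were added.
    """
    environ = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Accept shell-style "export KEY=VALUE" lines as well.
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            key = key.strip()
            # Drop surrounding double or single quotes, if any.
            value = value.strip().strip('"').strip("'")
            if key not in environ:
                environ[key] = value
                loaded[key] = value
    return loaded
```

Note that this only affects the Python process that calls it, so it would have to run inside the crawler process before the Upstash client is created. Exporting the variables in your shell remains the simplest option.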
I will look into adding the feature to our crawler so that your website can be crawled. In the meantime, I can suggest crawling the content on your webpage manually to create a vector store. If you populate an Upstash vector store, you can still use our app to run the chatbot.
Definitely! Thank you for letting me know, I'll test it. And I'll try to implement the feature too in the meantime. Let's keep in touch. 🕺 😃
Hey @MercureTony, please feel free to contribute to this project, it would be appreciated 👍
I have followed all the steps in the README, but I end up with this error every time I run this command from the README:
scrapy crawl configurable --logfile degreegurucrawl.log
Do you know why? I haven't found a solution yet.
httpx.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.
An example of the crawl.yaml: