upstash / degree-guru

AI chatbot for expert answers on university degrees
https://degreeguru.vercel.app/
119 stars, 32 forks

httpx.UnsupportedProtocol #7

Closed MercureTony closed 3 months ago

MercureTony commented 3 months ago

I have followed all the steps in the README, but I get the error below every time I run this command from the README:

scrapy crawl configurable --logfile degreegurucrawl.log

Do you know why? I haven't found a solution yet.

httpx.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.

An example of the crawl.yaml:

[Screenshot, 2024-03-15 at 21:34:34]

CahidArda commented 3 months ago

Hi @MercureTony,

Can you also share the degreegurucrawl.log so that I can check where you got this error?

I was able to reproduce the error by running the crawler without the environment variables set. Are the UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN environment variables set as explained in our guide?

One more thing: your website makes calls to populate the page with content after it loads, so our crawler will not work well for it. In this case, we need to pre-render the pages with JavaScript, for example using scrapy-splash. We can try to integrate this into our crawler if you are interested.

MercureTony commented 3 months ago

Hi @CahidArda !

Thank you for getting back to me.

  1. I did set up the environment variables. See here:
[Screenshot, 2024-03-16 at 07:41:06]
  2. Noted, I didn't know that. I'd love that if you can 😄 !

CahidArda commented 3 months ago

Hi @MercureTony,

It looks like you set the variables in the .env.local file. That file is for the chatbot; the crawler can't read environment variables from there. To set the environment variables for the crawler, you need to export them in the shell where you run it. If you are on macOS, you can run export UPSTASH_VECTOR_REST_URL=**** (and the same for UPSTASH_VECTOR_REST_TOKEN). Then you will be able to run the crawler.
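
As a rough illustration of the point above, here is a minimal pre-flight check that could be run in the same shell before starting the crawler. It is only a sketch: the function name check_crawler_env is hypothetical and not part of the degree-guru code. It only assumes that a missing or scheme-less UPSTASH_VECTOR_REST_URL is what leads to the httpx.UnsupportedProtocol error reported in this issue.

```python
import os
from urllib.parse import urlparse

# The two variables named in this thread; the crawler reads them from the
# process environment, not from .env.local.
REQUIRED_VARS = ("UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def check_crawler_env(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    (Hypothetical helper, not part of the degree-guru repository.)
    """
    env = os.environ if env is None else env
    problems = []

    # A variable that is unset or blank is not visible to the crawler.
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not exported in this shell")

    # A URL without http:// or https:// is rejected by httpx.
    url = env.get("UPSTASH_VECTOR_REST_URL", "").strip()
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append(
            "UPSTASH_VECTOR_REST_URL must start with http:// or https://"
        )

    return problems


if __name__ == "__main__":
    for problem in check_crawler_env():
        print(problem)
```

If the script prints nothing, both variables are visible to processes started from that shell, and the crawl command can be run from there.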

I will look into adding the feature to our crawler so that your website can be crawled. In the meantime, I can suggest crawling the content on your webpage manually to create a vector store. If you populate an Upstash vector store, you can still use our app to run the chatbot.

MercureTony commented 3 months ago

Definitely! Thank you for letting me know, I'll test it. And I'll try to implement the feature too in the meantime. Let's keep in touch. 🕺 😃

buggyhunter commented 3 months ago

Hey @MercureTony, please feel free to contribute to this project. It would be much appreciated 👍