Fix 403 Forbidden error while crawling

openai / web-crawl-q-and-a-example

Learn how to crawl your website and build a Q/A bot with the OpenAI API

https://platform.openai.com/docs/tutorials/web-qa-embeddings


Fix 403 Forbidden error while crawling #2

Open tylerkennedy opened 11 months ago

tylerkennedy commented 11 months ago

I changed the domain and root URL to crawl a different website. After changing the domain to something other than openai.com, I started getting a 403 error on the web requests made by the crawler:

HTTP Error 403: Forbidden

The simple fix was to add a User-Agent header to the web requests in the crawler. This should prevent the issue for anyone else who tries to run the crawler on a different domain.
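For readers who hit the same error, here is a minimal sketch of the kind of change described above. It assumes the crawler fetches pages with Python's `urllib.request` (the `HTTP Error 403: Forbidden` message matches that library's error format). The `USER_AGENT` value and the helper names `build_request` and `fetch_html` are hypothetical and not taken from the repository. By default `urllib` identifies itself as `Python-urllib/<version>`, which some sites reject.

```python
import urllib.request

# Hypothetical User-Agent string (an assumption, not from the repository);
# use a value that identifies your own crawler.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its body as text (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Wherever the crawler currently passes a bare URL to `urllib.request.urlopen`, passing the `Request` object instead sends the header. A site may still return 403 for other reasons, such as rate limiting or bot protection, so this is not a guaranteed fix.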

ItamarRocha commented 10 months ago

The change makes sense to me. I am not one of the maintainers of the repository, but I would suggest making the same change in the Jupyter notebook.

tylerkennedy commented 10 months ago

@ItamarRocha Thanks, good call. I updated the Jupyter notebook.