What would be the best way to implement web scraping into this solution?

marella / chatdocs

Chat with your documents offline using AI.

MIT License

684 stars, 99 forks

What would be the best way to implement web scraping into this solution? #23

Closed: esskeey closed this issue 1 year ago

esskeey commented 1 year ago

Hi again,

I am looking into LangChain, and there are so many tools available (document loaders). I would like this solution to go online and scrape websites as an alternative to loading local documents. What would you suggest as the easiest way to do that?

marella commented 1 year ago

Hi, you can look at tools like Scrapy, but you should also check each website's copyright/license, robots.txt, etc. to see whether you are allowed to scrape it.
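
For anyone landing here later, below is a minimal sketch of the robots.txt check mentioned above, plus a simple way to turn a fetched page into plain text that could then be saved as a document. It uses only the Python standard library. The helper names (`is_allowed`, `TextExtractor`, `html_to_text`) and the `example.com` URLs are assumptions made for illustration; they are not part of chatdocs or Scrapy, and fetching the page and the robots.txt file is left out.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper: robots_txt is the already-downloaded file content.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML page, one chunk per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = "<html><body><h1>Title</h1><script>x=1</script><p>Hello</p></body></html>"
    if is_allowed(robots, "https://example.com/docs/page.html"):
        print(html_to_text(page))
```

A robots.txt check like this only covers the crawling rules a site publishes; the copyright and license terms still have to be checked separately.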