run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

feat: Add full site BFS scraping loader #827

Closed an-bluecat closed 7 months ago

an-bluecat commented 8 months ago

Description

There is currently no loader that scrapes an entire site; users have to collect and scrape every link themselves. With this loader, users specify an entry-point URL and a URL prefix, and the loader crawls the site breadth-first, storing the text of every page whose URL falls under that prefix.
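
To illustrate the idea, here is a minimal sketch of a breadth-first crawl restricted to a URL prefix. It is not the code in this PR: the names `bfs_crawl`, `_PageParser`, `fetch`, and `max_pages` are hypothetical, and the page fetcher is passed in as a plain callable so the traversal can be tried without a browser or network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _PageParser(HTMLParser):
    """Collects href targets and text nodes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Note: this also picks up script/style text; a real loader would filter it.
        if data.strip():
            self.text_parts.append(data.strip())


def bfs_crawl(entry_url, prefix, fetch, max_pages=100):
    """Breadth-first crawl from entry_url, following only URLs that start with prefix.

    fetch is any callable mapping a URL to its HTML string (a hypothetical hook,
    not part of any library). Returns a dict of URL -> text, in visit order.
    """
    queue = deque([entry_url])
    seen = {entry_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        parser = _PageParser()
        parser.feed(fetch(url))
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" so each page is visited once.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In practice `fetch` would wrap whatever HTTP client or headless browser the loader uses; the `seen` set and the prefix check are what keep the crawl from revisiting pages or leaving the site.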

Type of Change

How Has This Been Tested?

I modified the download repo link, downloaded the loader through it, and tested basic functionality:

image

Suggested Checklist: