vectara/vectara-ingest
An open source framework to crawl data sources and ingest into Vectara
https://vectara.com
Apache License 2.0
Tools for interfacing with crawlers #39
Open, opened by cjcenizal 11 months ago
cjcenizal commented 11 months ago
We've had a request to make it easy to do things like:
- Crawl a sitemap file
- Specify the HTML classes or selectors (e.g., "title,.main-content") that describe particular kinds of information
- Crawl a CSV of URLs
- Crawl a full website by following the robots.txt rules for the crawler's user-agent
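To make the requests above more concrete, here is a rough sketch of what such helpers might look like using only the Python standard library. None of these names (`urls_from_sitemap`, `urls_from_csv`, `allowed_by_robots`, `extract_by_selectors`) exist in vectara-ingest; they are hypothetical and for illustration only. The selector matcher is deliberately minimal and assumes a comma-separated list of plain tag names and single `.class` selectors, as in the "title,.main-content" example; full CSS selector support would need a dedicated HTML library. The CSV helper assumes a header row with a `url` column.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        # Match <loc> with or without the sitemap XML namespace.
        if (el.tag == "loc" or el.tag.endswith("}loc")) and el.text
    ]


def urls_from_csv(csv_text, column="url"):
    """Return the URLs in the given column (assumed: a header row exists)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row.get(column)]


def allowed_by_robots(robots_text, user_agent, url):
    """Check a URL against robots.txt rules for a specific user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


class SelectorTextExtractor(HTMLParser):
    """Collect text inside elements matching tag or .class selectors."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = [s.strip() for s in selectors.split(",") if s.strip()]
        self.results = {s: [] for s in self.selectors}
        self._stack = []  # entries: (tag, matched selector or None, text parts)

    def _match(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for sel in self.selectors:
            if sel == tag or (sel.startswith(".") and sel[1:] in classes):
                return sel
        return None

    def handle_starttag(self, tag, attrs):
        self._stack.append((tag, self._match(tag, attrs), []))

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for _, sel, parts in self._stack:
            if sel:
                parts.append(data)

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _, _ in self._stack):
            return  # stray closing tag, ignore it
        # Pop up to and including the matching open tag; this also clears
        # unclosed void elements such as <br> that sit above it.
        while self._stack:
            open_tag, sel, parts = self._stack.pop()
            if sel:
                self.results[sel].append(" ".join("".join(parts).split()))
            if open_tag == tag:
                break


def extract_by_selectors(html_text, selectors):
    """Return {selector: [text, ...]} for a string like 'title,.main-content'."""
    parser = SelectorTextExtractor(selectors)
    parser.feed(html_text)
    parser.close()
    return parser.results
```

All four helpers take text rather than URLs, so fetching is left to whatever HTTP client the crawler already uses. A full-site crawl would call `allowed_by_robots` with the crawler's own user-agent string before requesting each discovered link.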