hlp-ai / mt-data

MT Data
Apache License 2.0

Focused crawler with Python #5

Closed. hlp-ai closed this issue 1 year ago

hlp-ai commented 1 year ago

Given a set of domains or web sites, crawl the pages within them. Requirements:

  1. Multi-threaded crawling
  2. Crawl only pages in the given domains
  3. Be polite to the web sites
  4. Store the crawled pages
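
The four requirements above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code that was actually committed to this repo. All names here (`Crawler`, `in_scope`, `extract_links`, `http_fetch`, the `delay` and `workers` parameters, the User-Agent string) are hypothetical. Politeness is assumed to mean a per-host delay between requests; robots.txt handling is left out.

```python
import hashlib
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


def in_scope(url, domains):
    """True if url is http(s) and its host is a given domain or a subdomain of one."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    return parts.scheme in ("http", "https") and any(
        host == d or host.endswith("." + d) for d in domains)


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def extract_links(base_url, html):
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links against the page URL and drop #fragments.
    return [urldefrag(urljoin(base_url, h)).url for h in parser.links]


def http_fetch(url):
    """Default fetcher; the User-Agent value is a placeholder."""
    req = Request(url, headers={"User-Agent": "example-focused-crawler"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class Crawler:
    def __init__(self, domains, out_dir, fetch=http_fetch, delay=1.0, workers=4):
        self.domains = {d.lower() for d in domains}
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.fetch, self.delay, self.workers = fetch, delay, workers
        self.seen = set()       # only touched by the thread that calls crawl()
        self.next_slot = {}     # host -> earliest time of the next request
        self.lock = threading.Lock()

    def _wait_turn(self, host):
        # Requirement 3: reserve a time slot so that requests to the same
        # host are spaced at least `delay` seconds apart across all threads.
        with self.lock:
            now = time.monotonic()
            slot = max(now, self.next_slot.get(host, now))
            self.next_slot[host] = slot + self.delay
        time.sleep(max(0.0, slot - now))

    def _visit(self, url):
        self._wait_turn(urlparse(url).hostname)
        try:
            html = self.fetch(url)
        except Exception:
            return []           # skip pages that fail to download
        # Requirement 4: store each page under a hash of its URL.
        name = hashlib.sha256(url.encode()).hexdigest() + ".html"
        (self.out_dir / name).write_text(html, encoding="utf-8")
        return extract_links(url, html)

    def crawl(self, seeds, max_pages=100):
        frontier = [u for u in dict.fromkeys(seeds) if in_scope(u, self.domains)]
        self.seen.update(frontier)
        # Requirement 1: fetch each frontier level with a thread pool.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            while frontier:
                batch, frontier = frontier, []
                for links in pool.map(self._visit, batch):
                    for link in links:
                        # Requirement 2: follow only in-domain, unseen links.
                        if (link not in self.seen and len(self.seen) < max_pages
                                and in_scope(link, self.domains)):
                            self.seen.add(link)
                            frontier.append(link)
        return self.seen
```

Under these assumptions, a crawl would be started with something like `Crawler(["example.com"], "pages").crawl(["https://example.com/"])`. Passing the fetcher in as a parameter is a design choice of this sketch so that the crawl logic can be exercised without network access.
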
hlp-ai commented 1 year ago

Done