tanussingh / Big-Data-Management-Analytics-Project

Final Project for CS 6350.001 - Large Scale Data Collection and preprocessing in Spark

Crawl new sites using Scrapy #11

Open. ishansharma opened this issue 5 years ago

ishansharma commented 5 years ago

Scrapy should be able to handle manual crawling of websites where news-please throws an error.

ishansharma commented 5 years ago

Scrapy didn't work, but RSS + manual crawling has us at 1,500 articles a day, which should be good. I'll keep crawling until tomorrow evening.
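For reference, the RSS half of that approach can be sketched as below. This is a minimal illustration, not the project's actual crawler: it assumes standard RSS 2.0 feeds (`<item>` elements with a `<link>` child), and the function name `extract_article_links` is hypothetical. It only pulls de-duplicated article URLs out of a feed; fetching the feed and crawling each article page (the "manual crawling" step) are left out because the thread does not describe how they were done.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_article_links(rss_xml: str) -> List[str]:
    """Hypothetical helper: return each <item>'s <link> from an RSS 2.0 feed,
    de-duplicated while preserving feed order."""
    root = ET.fromstring(rss_xml)
    seen = set()
    links = []
    # root.iter("item") walks every <item>, regardless of nesting under <channel>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            link = link.strip()
            # Skip URLs already collected so an article is only crawled once
            if link not in seen:
                seen.add(link)
                links.append(link)
    return links


# Illustrative feed; example.com URLs are placeholders, not real sources
sample = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>A</title><link>https://example.com/a</link></item>
  <item><title>B</title><link>https://example.com/b</link></item>
  <item><title>A again</title><link>https://example.com/a</link></item>
</channel></rss>"""

print(extract_article_links(sample))  # ['https://example.com/a', 'https://example.com/b']
```

De-duplicating at this stage is a design choice for the sketch: feeds often repeat items between polls, so filtering URLs before crawling avoids downloading the same article twice.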