Open Ricksanchez000 opened 8 months ago
We have the same issue. I also tried scraping Google News with different code before and got at most 100 results per query. It seems that we need pagination, but I am not sure how to implement it here. One option would be to work with the start and end dates, stepping through very small time windows to collect more results for consecutive days.
This is a related issue suggesting some workarounds: https://github.com/ranahaani/GNews/issues/31
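In case it helps, here is a minimal sketch of the small-window idea from the comment above. It only builds the one-day windows and loops over them; `fetch` is a hypothetical callable you would write yourself (it is not part of GNews) that takes a window start and end and returns a list of article dicts.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Tuple

Article = Dict[str, str]
# Hypothetical fetch signature: (window_start, window_end) -> list of articles.
Fetch = Callable[[date, date], List[Article]]


def daily_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield consecutive one-day windows (day, next_day) covering [start, end)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def collect_by_day(fetch: Fetch, start: date, end: date) -> List[Article]:
    """Run one query per day and concatenate whatever each query returns."""
    articles: List[Article] = []
    for window_start, window_end in daily_windows(start, end):
        articles.extend(fetch(window_start, window_end))
    return articles
```

With the `gnews` package, `fetch` would presumably wrap a `GNews` object created with `start_date` and `end_date` set to the window and then call `get_news` with your query, but please check that against the version you have installed. Since each query seems to be capped at about 100 results, the total would be bounded by roughly 100 times the number of days.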
I think the max I can get is 100 a day. Can anyone do better?
No, we tried crawling by the hour, but we did not get any additional results. All posts published on the same day seem to share one timestamp, so you get the same 100 results wherever you start within that day. In the end, we adjusted the research question a bit to work with 100 results per day but crawl through several months of data.
I think I managed to get a few more by iterating every hour, and that is okay for now. Thanks for the response. However, I think the warning 'must be 1 day apart else no results will return' should be changed, because you can get results by querying from hour to hour. That said, in my experience iterating by hour does not work reliably.
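If anyone wants to try the hourly approach, a rough sketch follows. Because hourly queries within one day apparently return mostly the same articles, it drops duplicates as it goes. The assumptions here are mine: `fetch` is again a hypothetical callable, not a GNews function, and each article is assumed to be a dict with a `"url"` key that can serve as its identity.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Article = Dict[str, str]
Window = Tuple[datetime, datetime]
# Hypothetical fetch signature: (window_start, window_end) -> list of articles.
Fetch = Callable[[datetime, datetime], List[Article]]


def time_windows(start: datetime, end: datetime, step: timedelta) -> Iterator[Window]:
    """Yield consecutive windows of length `step`; the last one is clipped to `end`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def collect_unique(fetch: Fetch, windows: Iterable[Window]) -> List[Article]:
    """Query every window and keep only the first copy of each URL."""
    seen: Set[str] = set()
    unique: List[Article] = []
    for window_start, window_end in windows:
        for article in fetch(window_start, window_end):
            url = article.get("url")  # assumed identity key
            if url is not None:
                if url in seen:
                    continue  # already collected from an earlier window
                seen.add(url)
            # Articles without a URL are kept, since they cannot be compared.
            unique.append(article)
    return unique
```

Comparing `len(collect_unique(...))` for a one-hour step against a one-day step over the same range would show whether the smaller windows actually add anything for a given query.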
Hi, could someone please help me with this:
I used GNews to crawl news about the "Red Sea Crisis" from 2023.10.1 to 2024.3.10, but only got about 80 articles. When I search the same keyword in Factiva for the same period, it returns about 3000 articles. I am doing NLP analysis, so the volume of articles is quite essential.
Is the number of articles being limited by GNews, or does Google News simply not have that many articles?