rivernews closed 4 years ago
collision.<review id>.<timestamp>.json. This makes it easier to search by prefix, which is the only native way to search that S3 provides.
Eventually we want to find out why we lose so many reviews, or why the scraper did not see the next-page link and exited with code 0.
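To illustrate why this naming scheme helps, here is a minimal sketch of building such keys and looking them up by prefix. The helper names, the sample review ids, and the timestamp format are assumptions for illustration only; the prefix filter is a local stand-in for what an S3 listing call does with a prefix (for example the `Prefix` parameter of boto3's `list_objects_v2`).

```python
from datetime import datetime, timezone


def collision_key(review_id: str, when: datetime) -> str:
    """Build a key of the form collision.<review id>.<timestamp>.json."""
    # The timestamp format is an assumption; the notes above do not specify one.
    timestamp = when.strftime("%Y%m%dT%H%M%SZ")
    return f"collision.{review_id}.{timestamp}.json"


def keys_with_prefix(keys: list[str], prefix: str) -> list[str]:
    """Local stand-in for an S3 prefix listing: keep keys starting with prefix."""
    return sorted(k for k in keys if k.startswith(prefix))


# Hypothetical review ids and timestamps, for illustration only.
keys = [
    collision_key("r-1001", datetime(2020, 3, 1, 12, 0, 0, tzinfo=timezone.utc)),
    collision_key("r-1001", datetime(2020, 3, 2, 8, 30, 0, tzinfo=timezone.utc)),
    collision_key("r-2002", datetime(2020, 3, 1, 12, 5, 0, tzinfo=timezone.utc)),
]

# The trailing dot keeps "r-1001" from also matching ids such as "r-10011".
matches = keys_with_prefix(keys, "collision.r-1001.")
```

Because the review id comes right after the fixed `collision.` prefix, all collisions for one review can be listed with a single prefix query, and `collision.` alone lists every collision file.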
When we retried groupon, we got around 6xx results. The abort seems to be caused by no stdout output for 10 minutes, which happened because we tweaked the log level in Travis to 2 (WARNING). Changing it to 3 (INFO) should solve this issue. The cause of the original problem is still not identified.
After running it a second time, the result is now:
Processed reviews count: 2443/2696
Duration: 0h:32min:22s.905
Looking at the review count rate, this does not seem like a big deal. However, we do want to verify that there is no further page.
Based on the page number, we believe we retrieved everything available. This indicates that there is a gap between the local count shown and the reviews actually available.
Slack message of the scraper log.
Org: groupon
Hint: compare the URL page number with the processed review count. Each page contains roughly 10 reviews.
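The check described in the hint can be written out as a small calculation. This is a rough sketch using the counts from the second run; the function names are illustrative, and the 10-reviews-per-page figure is the approximation from the hint, not an exact guarantee.

```python
import math

REVIEWS_PER_PAGE = 10  # approximation taken from the hint above


def coverage(processed: int, reported: int) -> float:
    """Fraction of the reported reviews that were actually processed."""
    return processed / reported


def pages_needed(review_count: int, per_page: int = REVIEWS_PER_PAGE) -> int:
    """Number of pages required to hold review_count reviews."""
    return math.ceil(review_count / per_page)


processed, reported = 2443, 2696  # from "Processed reviews count: 2443/2696"

rate = coverage(processed, reported)           # about 0.906
pages_for_processed = pages_needed(processed)  # 245
pages_for_reported = pages_needed(reported)    # 270
missing = reported - processed                 # 253
```

If the last page number in the scraper log is close to 245 rather than 270, that supports the reading that the scraper reached the final page and the reported total overstates what is available.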