mattpodolak / pmaw

A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches.
MIT License
212 stars, 28 forks

How to skip a request if taking too long? #38

Closed by jonlee112 2 years ago

jonlee112 commented 2 years ago

Sometimes a request within a loop of requests ends up going through hundreds (thousands?) of batches trying to locate 1 single comment. I'd rather just skip that comment and move on to the next request. Any way to implement this in the code?

Example output:

INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 100.00% - Requests: 19 - Batches: 10 - Items Remaining: 1
INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 100.00% - Requests: 29 - Batches: 20 - Items Remaining: 1
INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 100.00% - Requests: 39 - Batches: 30 - Items Remaining: 1
INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 100.00% - Requests: 49 - Batches: 40 - Items Remaining: 1
INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 100.00% - Requests: 59 - Batches: 50 - Items Remaining: 1
INFO:pmaw.PushshiftAPIBase:Checkpoint:: Success Rate: 100.00% - Requests: 69 - Batches: 60 - Items Remaining: 1
etc.
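
For anyone hitting the same loop, one possible stopgap is to wrap each call in a generic timeout, as in the sketch below. It is not part of pmaw: `call_with_timeout` is a hypothetical helper built only on the standard library, and it assumes it is acceptable for an abandoned call to keep running in a background thread until the script exits, since Python threads cannot be killed from outside.

```python
import threading


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread.

    Returns (True, result) if fn finished within timeout_s seconds,
    or (False, None) if it did not. Hypothetical helper, not a pmaw API.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = fn(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    # daemon=True so a stuck call does not keep the interpreter alive at exit
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        # Timed out: the worker is abandoned but keeps running in the background.
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]
```

In a loop of requests, each search would be passed in as `fn` (for example a `lambda` around the `api.search_comments(...)` call), and the iteration skipped whenever the first returned value is `False`. Because the abandoned thread may still be sending requests, this only hides the symptom; a fix in the library is the real remedy.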

mattpodolak commented 2 years ago

Hi @jonlee112, I think this is related to an issue in the code where a bad request is repeatedly retried. I should be able to push a fix to pmaw addressing this scenario.

jonlee112 commented 2 years ago

@mattpodolak Thank you so much! Would be much appreciated as I could gather a lot more data overnight when I can't be around to monitor the progress.

HenryBlackie commented 2 years ago

I've been having the same issue. I left a script running overnight and later realised it had made ~100,000 requests for the same 3 comments.

It can be consistently recreated with:

results = api.search_comments(subreddit='incels', after=1469919600, before=1470006000)

mattpodolak commented 2 years ago

Fixed this issue in 2.1.2.
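
To confirm that an environment already has the fix, a quick version check along these lines may help. It is a minimal sketch: `parse_version`, `has_fix`, and `installed_pmaw_has_fix` are illustrative names, not pmaw functions, and the parser only reads the leading numeric part of the version string, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata

# Release that contains the retry fix, per the comment above.
FIXED_IN = (2, 1, 2)


def parse_version(text):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_fix(installed):
    """True if the given version string is 2.1.2 or newer."""
    return parse_version(installed) >= FIXED_IN


def installed_pmaw_has_fix():
    """Check the installed pmaw distribution; None if pmaw is not installed."""
    try:
        return has_fix(metadata.version("pmaw"))
    except metadata.PackageNotFoundError:
        return None
```

If `installed_pmaw_has_fix()` returns `False`, upgrading pmaw to 2.1.2 or later should pick up the fix.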

jonlee112 commented 2 years ago

Many thanks, @mattpodolak!