Open ranbix666 opened 1 year ago
Hi @ranbix666
The parameter names for before and after have changed to "until" and "since", so try this line instead:
comments = api.search_comments(subreddit=subreddit, limit=limit, until=before, since=after)
Additionally, the Pushshift API itself is undergoing a major migration. As a result, there is not (yet) any data from before November 2022, so along with the above change, try changing the date range as well.
The following code returns the expected count for me:
import pmaw
import datetime as dt
api = pmaw.PushshiftAPI()
before = int(dt.datetime(2023, 1, 25, 0, 0).timestamp())
after = int(dt.datetime(2023, 1, 1, 0, 0).timestamp())
subreddit = "wallstreetbets"
limit = 301
comments = api.search_comments(subreddit=subreddit, limit=limit, until=before, since=after)
print(f'Retrieved {len(comments)} comments from Pushshift')
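For anyone updating older scripts, the rename can be wrapped in a small helper. This is only a sketch: `to_epoch` and `build_time_window` are hypothetical names, not part of PMAW. Unlike the snippet above, it treats naive datetimes as UTC (an assumption on my part), so the timestamps do not depend on the machine's local timezone.

```python
import datetime as dt


def to_epoch(moment: dt.datetime) -> int:
    """Convert a datetime to integer epoch seconds.

    Assumption: naive datetimes are interpreted as UTC.
    """
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=dt.timezone.utc)
    return int(moment.timestamp())


def build_time_window(before: dt.datetime, after: dt.datetime) -> dict:
    """Map the old before/after names onto the new until/since names."""
    since, until = to_epoch(after), to_epoch(before)
    if since >= until:
        raise ValueError("'after' must be earlier than 'before'")
    return {"since": since, "until": until}


window = build_time_window(
    before=dt.datetime(2023, 1, 25, 0, 0),
    after=dt.datetime(2023, 1, 1, 0, 0),
)
print(window)

# Usage with the call from this thread (left commented out so the sketch
# runs without network access):
# comments = api.search_comments(subreddit="wallstreetbets", limit=301, **window)
```

Passing the result with `**window` keeps the old variable names in your own code while sending only the new parameter names to the API.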
Hello,
I have the same issue: the request works only if limit <= 1000. @eddvrs, your example works because your limit is under 1000. This:
import pmaw
import datetime as dt
api = pmaw.PushshiftAPI()
before = int(dt.datetime(2023, 1, 25, 0, 0).timestamp())
after = int(dt.datetime(2023, 1, 1, 0, 0).timestamp())
subreddit = "wallstreetbets"
limit = 1000
comments = api.search_comments(subreddit=subreddit, limit=limit, until=before, since=after)
print(f'Retrieved {len(comments)} comments from Pushshift')
returns
Retrieved 1000 comments from Pushshift
While this (which is the exact same code but with a limit at 1001 instead of 1000):
import pmaw
import datetime as dt
api = pmaw.PushshiftAPI()
before = int(dt.datetime(2023, 1, 25, 0, 0).timestamp())
after = int(dt.datetime(2023, 1, 1, 0, 0).timestamp())
subreddit = "wallstreetbets"
limit = 1001
comments = api.search_comments(subreddit=subreddit, limit=limit, until=before, since=after)
print(f'Retrieved {len(comments)} comments from Pushshift')
returns
Not all PushShift shards are active. Query results may be incomplete.
Retrieved 0 comments from Pushshift
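Since requests with limit <= 1000 still succeed, one possible workaround is to split the date range into smaller windows and query each one with a limit of at most 1000. I have not verified this against the live API. `split_window` is a hypothetical helper, not part of PMAW, and I do not know whether the window boundaries are inclusive, so deduplicating by comment id would be prudent.

```python
from typing import List, Tuple


def split_window(since: int, until: int, parts: int) -> List[Tuple[int, int]]:
    """Split [since, until] into `parts` contiguous sub-windows of epoch seconds."""
    span = until - since
    if span <= 0:
        raise ValueError("'since' must be earlier than 'until'")
    if not 1 <= parts <= span:
        raise ValueError("'parts' must be between 1 and the window length in seconds")
    # Integer arithmetic keeps the edges exact; the last edge equals `until`.
    edges = [since + span * i // parts for i in range(parts + 1)]
    return list(zip(edges[:-1], edges[1:]))


# 2023-01-01 to 2023-01-25 (UTC), split into four six-day windows.
windows = split_window(1672531200, 1674604800, 4)
for start, end in windows:
    print(start, end)
    # Hypothetical usage, commented out so the sketch runs offline:
    # chunk = api.search_comments(subreddit="wallstreetbets", limit=1000,
    #                             since=start, until=end)
```

Each window still returns at most 1000 comments, so busy subreddits would need many more parts than four.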
Using the parameter "size" instead of "limit" fixed the issue for me. It is probably due to the pushshift migration.
I set "size = 2000" and, after about 15 minutes of runtime, interrupted the process to find myself with over 86000 results. Could someone please help?
The size parameter doesn't work right now.
Was there ever any resolution to this problem? If both size and limit parameters aren't working as expected, how can we retrieve a desired number of records?
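Until the parameters behave, one client-side option is to stop reading once you have enough records. This is a minimal sketch, assuming the search result can be iterated lazily. I have not checked whether PMAW's response object does this; if it fetches everything up front, the helper only trims the result and saves no time. `take` and `fake_results` are hypothetical names used for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(results: Iterable[T], wanted: int) -> List[T]:
    """Return at most `wanted` items, reading no further than necessary."""
    if wanted < 0:
        raise ValueError("wanted must be non-negative")
    return list(islice(results, wanted))


# Stand-in for an API response: a lazy stream that counts how much was read.
pulled = {"count": 0}


def fake_results(total: int) -> Iterator[Dict[str, int]]:
    for i in range(total):
        pulled["count"] += 1
        yield {"id": i}


kept = take(fake_results(86000), 2000)
print(f"Kept {len(kept)} records after reading {pulled['count']}")
```

With a real search you would pass the returned object in place of `fake_results(...)`.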
I set "size = 2000" and, after about 15 minutes of runtime, interrupted the process to find myself with over 86000 results. Could someone please help?
You get more than you asked for. Isn't it great? LOL, just joking.
Hi Matthew, thank you so much for your great work on PMAW!
I tried your example with limit = 100000.
The log:
I have also tried limit = 100, 1000, and 1001. It seems that 0 comments are retrieved whenever the limit is greater than 1000.
Can you please let me know if I missed anything? Thanks!