Open junwenhu opened 1 year ago
Did you upgrade to the latest version?
I think this has to do with some kind of update that pushshift is doing. Sending parameters through the URL doesn't seem to work anymore - only sending parameters in the body of the GET request works, it seems.
Using PMAW, I got 0 submissions returned today. It worked before, and I don't know why. I followed the Medium post Matt wrote exactly. I checked the status of the Pushshift server and it says it is fine. Any idea what has happened?
hey @junwenhu, I have to update the Medium post. The issue you encountered is likely due to changes that have been made to the API parameters: before -> until and after -> since.
Can you try using these new parameters? If the problem persists, can you share the code that is returning 0 results and the version of PMAW that you are using?
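For anyone with older scripts, the rename described above can be sketched as a small helper that rewrites keyword arguments before they are passed on. The helper name `migrate_time_params` and the `RENAMED_PARAMS` table are hypothetical and not part of PMAW; only the before -> until and after -> since mapping comes from this thread.

```python
# Hypothetical helper (not part of PMAW): rename the old Pushshift time
# parameters to the new names mentioned in this thread.
RENAMED_PARAMS = {"before": "until", "after": "since"}


def migrate_time_params(params):
    """Return a copy of params with before/after renamed to until/since."""
    migrated = {}
    for key, value in params.items():
        new_key = RENAMED_PARAMS.get(key, key)
        if new_key in migrated:
            # Both the old and the new name were supplied; refuse to guess.
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated


old_kwargs = {"subreddit": "science", "limit": 100,
              "before": 1613234822, "after": 1606780800}
new_kwargs = migrate_time_params(old_kwargs)
print(new_kwargs)
```

The resulting dictionary could then be unpacked into a call such as `api.search_submissions(**new_kwargs)`.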
That is interesting. I have not tried that yet.
Oh! I kept using before and after, and I didn't know there had been updates! Thank you for telling me. I will try when I have time and report back here soon. Happy new year.
I am also seeing this issue. Search_comments returns lots of results and search_submissions returns 0 with the exact same inputs. Search_submissions returns results if I do not specify before/after/until/since, but when I pass epoch values with before/after or until/since, it returns 0. Using the same epochs works for search_comments.
I encountered the same issue, changing before/after to until/since but still got 0 results. Can anyone give updates?
I actually think this is an issue with PushShift rather than PMAW, since I also get 0 results directly querying PushShift. This should get a year of data matching the query "science" but it returns 0 results. Replacing "submission" with "comment" returns plenty. Seems like there must be a problem. https://api.pushshift.io/reddit/search/submission?q=science&since=1619236800&until=1650772800&limit=10
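To make the submission versus comment comparison above easy to repeat, here is a minimal sketch that builds both URLs from the same parameters. The function name `build_search_url` is hypothetical; the base URL and the parameter values are taken from the link above. The sketch only constructs the URLs and does not send a request, so it says nothing about what the API will return.

```python
from urllib.parse import urlencode

# Base URL taken from the link in this thread.
BASE_URL = "https://api.pushshift.io/reddit/search"


def build_search_url(kind, **params):
    """Build a search URL for 'submission' or 'comment' with the given parameters."""
    if kind not in ("submission", "comment"):
        raise ValueError(f"unknown search kind: {kind!r}")
    return f"{BASE_URL}/{kind}?{urlencode(params)}"


# Same query parameters for both endpoints, so only the endpoint differs.
query = dict(q="science", since=1619236800, until=1650772800, limit=10)
submission_url = build_search_url("submission", **query)
comment_url = build_search_url("comment", **query)
print(submission_url)
print(comment_url)
```

Opening both URLs in a browser, or fetching them with any HTTP client, shows whether the difference comes from Pushshift itself rather than from PMAW.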
Hi, I tried until and since, and it doesn't work either. It returns 0 submissions.
import pandas as pd
from pmaw import PushshiftAPI
api = PushshiftAPI()
import datetime as dt
until = int(dt.datetime(2021,2,1,0,0).timestamp())
since = int(dt.datetime(2020,12,1,0,0).timestamp())
subreddit='science'
limit=1000000
submissions = api.search_submissions(subreddit=subreddit, limit=limit, until=until, since=since)
print(f'Retrieved {len(submissions)} submissions from Pushshift')
I also tried using the code you shared on your GitHub. It doesn't work anymore. For example, this returns 0 submissions.
from pmaw import PushshiftAPI
api = PushshiftAPI()
posts = api.search_submissions(subreddit="science", limit=700000, until=1613234822, safe_exit=True)
print(f'{len(posts)} posts retrieved from Pushshift')
Like @MackBlackburn said, could it be a problem with PushShift? If so, this problem has lasted a long time (at least two weeks). Is there anything we can do about it if we'd still like to get some data from Reddit? Thank you!
Did you upgrade to the latest version?
Yes I install pmaw every time I use it
It's weird. I can get results from requests (provided that size is smaller than 1000).
Please refer to this: https://www.reddit.com/r/pushshift/comments/zuclhb/psa_pmaw_has_been_updated_to_handle_the_api/ "Submissions earlier than November 3rd still have not been loaded so any searches for submissions earlier than that will fail."
Anyone still having trouble pulling submissions and comments that took place after November 3rd when using PMAW 3.0.0? Here's my request that is still returning zero results:
from pmaw import PushshiftAPI
import datetime as dt
import pandas as pd
import numpy as np
start_epoch = int(dt.datetime(2023, 1, 1).timestamp())
end_epoch = int(dt.datetime(2023, 1, 6).timestamp())
api = PushshiftAPI()
gen1 = api.search_submissions(subreddit="science", since=start_epoch, until=end_epoch)
gen2 = api.search_comments(subreddit="science", since=start_epoch, until=end_epoch)
@AshankKumar, your code works fine for me here (with a few print-outs added):
import pmaw
import datetime as dt
#import pandas as pd
#import numpy as np
print(pmaw.__version__)
start_epoch = int(dt.datetime(2023, 1, 1).timestamp())
end_epoch = int(dt.datetime(2023, 1, 6).timestamp())
api = pmaw.PushshiftAPI()
gen1 = api.search_submissions(subreddit="science", since=start_epoch, until=end_epoch)
print("gen1:", len(gen1))
gen2 = api.search_comments(subreddit="science", since=start_epoch, until=end_epoch, limit=100)
print("gen2:", len(gen2))
Yields:
3.0.0
gen1: 330
gen2: 100
And occasionally this:
3.0.0
Not all PushShift shards are active. Query results may be incomplete.
gen1: 0
gen2: 100
The Pushshift API has been patchy recently. Look at all this red: [status charts: "Daily status: Submissions" and "Daily status: Comments"]
Do you definitely have the latest version of PMAW installed? Additionally, is your project definitely referencing the latest version?
Ah geez thank you for the sanity check. Of course I forgot to start up the virtual environment
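A quick way to run that sanity check is to print which interpreter is running and which version of the package it can see. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is published under the name `pmaw`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of package for this interpreter, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# If the path printed here is not inside the expected virtual environment,
# the script is running against a different set of installed packages.
print("interpreter:", sys.executable)
print("pmaw:", installed_version("pmaw"))
```

If the interpreter path or the version is not what you expect, activate the right environment before running the script again.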
Hello, I am still facing the same issue. I do get a response if my limit <= 1000, but once I change the limit to 1001, I get 0 results. @junwenhu, how were you able to get past this issue?
I didn't. sorry
Okay. Thanks
Update: I found the problem. pip install kept reusing what was already on the computer, so even though I kept installing the newest version, it wouldn't update.
I uninstalled and reinstalled anaconda and the problem was solved.