mattpodolak / pmaw

A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches.
MIT License
212 stars, 28 forks

Returned 0 result #54

Open junwenhu opened 1 year ago

junwenhu commented 1 year ago

Using pmaw, I got 0 submissions returned today. It worked before, and I don't know why. I followed the Medium post Matt wrote exactly. I checked the status of the Pushshift server and it says it is fine. Any idea what has happened?

chengren commented 1 year ago

Did you upgrade to the latest version?

architectdrone commented 1 year ago

I think this has to do with some kind of update that Pushshift is doing. Sending parameters through the URL doesn't seem to work anymore; it seems only sending parameters in the body of the GET request works.
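To make the two options concrete, here is a minimal standard-library sketch that builds the same search both ways without sending anything: once with the parameters in the URL query string and once as a JSON body on a GET request. It does not verify which form Pushshift accepts; that is the commenter's observation. The endpoint and parameters come from the URL posted later in this thread, and JSON as the body encoding is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://api.pushshift.io/reddit/search/submission"
params = {"q": "science", "since": 1619236800, "until": 1650772800, "limit": 10}

# Variant 1: parameters encoded into the URL query string (no body).
url_request = Request(f"{BASE}?{urlencode(params)}", method="GET")

# Variant 2: parameters sent as a JSON body on a GET request (assumed encoding).
body_request = Request(
    BASE,
    data=json.dumps(params).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="GET",
)

# Nothing is sent; these lines only show what each request would carry.
print(url_request.full_url)
print(body_request.get_method(), body_request.data)
```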

mattpodolak commented 1 year ago

Hey @junwenhu, I have to update the Medium post. The issue you encountered is likely due to changes that have been made to the API parameters: before -> until and after -> since.

Can you try using these new parameters? If the problem persists, can you share the code that is returning 0 results and the version of PMAW you are using?
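For older scripts that still pass before/after, a small mapping can apply the rename described above before calling PMAW. A minimal sketch; migrate_params is a hypothetical helper, and it only covers the two renames mentioned here.

```python
# Hypothetical helper: rename the legacy time filters mentioned in this thread
# (before -> until, after -> since); all other keys pass through unchanged.
LEGACY_TO_NEW = {"before": "until", "after": "since"}


def migrate_params(params: dict) -> dict:
    """Return a copy of params with legacy time-filter keys renamed."""
    migrated = {}
    for key, value in params.items():
        new_key = LEGACY_TO_NEW.get(key, key)
        if new_key in migrated:
            # e.g. both "before" and "until" were given; refuse to guess which wins
            raise ValueError(f"duplicate time filter {new_key!r}")
        migrated[new_key] = value
    return migrated


old_style = {"subreddit": "science", "after": 1606780800, "before": 1612137600}
print(migrate_params(old_style))
# {'subreddit': 'science', 'since': 1606780800, 'until': 1612137600}
```

The returned dict can then be unpacked into a call such as api.search_submissions(**migrate_params(old_style)).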

junwenhu commented 1 year ago

That is interesting. I have not tried that yet.

junwenhu commented 1 year ago

Oh! I kept using before and after and didn't know there had been updates! Thank you for telling me. I will try when I have time and get back to you here soon. Happy new year.

MackBlackburn commented 1 year ago

I am also seeing this issue. search_comments returns lots of results, and search_submissions returns 0 with the exact same inputs. search_submissions returns results if I do not specify before/after/until/since, but when I pass epoch values with before/after or until/since, it returns 0. Using the same epochs works for search_comments.

hienvantran commented 1 year ago

I encountered the same issue: I changed before/after to until/since but still got 0 results. Can anyone give an update?

MackBlackburn commented 1 year ago

I actually think this is an issue with PushShift rather than PMAW, since I also get 0 results directly querying PushShift. This should get a year of data matching the query "science" but it returns 0 results. Replacing "submission" with "comment" returns plenty. Seems like there must be a problem. https://api.pushshift.io/reddit/search/submission?q=science&since=1619236800&until=1650772800&limit=10
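The comparison above can be reproduced by building both URLs from one set of parameters, so the endpoint is the only difference. A minimal sketch with the standard library; pushshift_url is a hypothetical helper, and the fetch itself (for example with urllib.request.urlopen) is left out because the results depend on the live service.

```python
from urllib.parse import urlencode

BASE = "https://api.pushshift.io/reddit/search/{kind}"


def pushshift_url(kind: str, **params) -> str:
    """Build a Pushshift search URL for kind 'submission' or 'comment'."""
    if kind not in ("submission", "comment"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{BASE.format(kind=kind)}?{urlencode(params)}"


# Same query for both endpoints; only the path segment changes.
query = dict(q="science", since=1619236800, until=1650772800, limit=10)
submission_url = pushshift_url("submission", **query)
comment_url = pushshift_url("comment", **query)
print(submission_url)
print(comment_url)
```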

junwenhu commented 1 year ago

> hey @junwenhu, I have to update the medium post, the issue you encountered is likely due to changes that have been made to the API parameters, before -> until and after -> since.

Hi, I tried until and since, and it doesn't work either. It still returns 0 submissions.

```python
import datetime as dt

import pandas as pd
from pmaw import PushshiftAPI

api = PushshiftAPI()

# Epoch bounds for the search window: Dec 1, 2020 up to Feb 1, 2021
until = int(dt.datetime(2021, 2, 1, 0, 0).timestamp())
since = int(dt.datetime(2020, 12, 1, 0, 0).timestamp())

subreddit = 'science'
limit = 1000000
submissions = api.search_submissions(subreddit=subreddit, limit=limit, until=until, since=since)
print(f'Retrieved {len(submissions)} submissions from Pushshift')
```
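A side note on the epoch values above: .timestamp() on a naive datetime uses the local time zone of the machine running the script, so the same dates can map to different epochs on different computers. A minimal sketch that pins the boundaries to UTC midnight; whether UTC is the boundary you want is an assumption.

```python
import datetime as dt

# A naive datetime's .timestamp() is interpreted in the machine's local time
# zone, so this value depends on where the script runs.
local_until = int(dt.datetime(2021, 2, 1).timestamp())

# Attaching tzinfo=timezone.utc makes the boundaries machine-independent.
until = int(dt.datetime(2021, 2, 1, tzinfo=dt.timezone.utc).timestamp())
since = int(dt.datetime(2020, 12, 1, tzinfo=dt.timezone.utc).timestamp())

print(since, until)  # 1606780800 1612137600
```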

I also tried using the code you shared on your GitHub, and it doesn't work anymore either. For example, this returns 0 submissions:

```python
from pmaw import PushshiftAPI

api = PushshiftAPI()
posts = api.search_submissions(subreddit="science", limit=700000, until=1613234822, safe_exit=True)
print(f'{len(posts)} posts retrieved from Pushshift')
```

Like @MackBlackburn said, could it be a problem with Pushshift? If so, this problem has lasted a long time (at least two weeks). Is there anything we can do about it if we'd still like to get some data from Reddit? Thank you!

junwenhu commented 1 year ago

> Did you upgrade to the latest version?

Yes, I install pmaw every time I use it.

junwenhu commented 1 year ago

> I actually think this is an issue with PushShift rather than PMAW, since I also get 0 results directly querying PushShift.

It's weird. I can get results using requests (provided that size is smaller than 1000).
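The observation that direct requests only return results while size stays under 1000 suggests paging through a long window instead of asking for everything at once. A minimal sketch of that idea, assuming results come back newest-first with a created_utc field and that until is an exclusive bound; fetch_all and fake_fetch are hypothetical names, and fake_fetch stands in for a real HTTP request.

```python
from typing import Callable, Dict, List

# Per-request cap reported in this thread; an assumption, not a documented limit.
MAX_SIZE = 1000


def fetch_all(fetch: Callable[[Dict], List[Dict]], since: int, until: int,
              total: int, **params) -> List[Dict]:
    """Page backwards through [since, until) with at most MAX_SIZE items per call.

    `fetch` is a hypothetical callable that performs ONE request and returns
    items newest-first, each carrying a 'created_utc' epoch. `until` is assumed
    exclusive, so items sharing a timestamp at a page boundary may be missed.
    """
    results: List[Dict] = []
    while len(results) < total and until > since:
        size = min(MAX_SIZE, total - len(results))
        page = fetch({**params, "since": since, "until": until, "size": size})
        if not page:
            break  # window exhausted (or the service returned nothing)
        results.extend(page)
        oldest = min(item["created_utc"] for item in page)
        if oldest >= until:
            break  # no progress; avoid looping forever
        until = oldest  # next call continues just before the oldest item seen
    return results[:total]


# Usage with a fake in-memory fetch standing in for a real request.
ITEMS = [{"id": i, "created_utc": i} for i in range(2500)]
requested_sizes: List[int] = []


def fake_fetch(p: Dict) -> List[Dict]:
    requested_sizes.append(p["size"])
    hits = [x for x in ITEMS if p["since"] <= x["created_utc"] < p["until"]]
    return sorted(hits, key=lambda x: x["created_utc"], reverse=True)[: p["size"]]


collected = fetch_all(fake_fetch, since=0, until=2500, total=2500, subreddit="science")
print(len(collected), requested_sizes)  # 2500 [1000, 1000, 500]
```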

chengren commented 1 year ago

Please refer to this: https://www.reddit.com/r/pushshift/comments/zuclhb/psa_pmaw_has_been_updated_to_handle_the_api/ "Submissions earlier than November 3rd still have not been loaded so any searches for submissions earlier than that will fail."
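If that PSA explains the zero results, a cheap guard is to check the requested since value before running a submission search. A minimal sketch; the quote gives only "November 3rd", so the year 2022 and the UTC-midnight boundary are assumptions, and check_submission_window is a hypothetical helper.

```python
import datetime as dt
import warnings

# Assumption: the PSA says only "November 3rd"; year 2022 and UTC midnight are inferred.
SUBMISSION_BACKFILL_CUTOFF = int(dt.datetime(2022, 11, 3, tzinfo=dt.timezone.utc).timestamp())


def check_submission_window(since: int, cutoff: int = SUBMISSION_BACKFILL_CUTOFF) -> bool:
    """Warn and return False if a submission search starts before the cutoff."""
    if since < cutoff:
        warnings.warn(
            f"since={since} is before {cutoff}; submission searches starting "
            "this early may return 0 results"
        )
        return False
    return True


print(check_submission_window(1672531200))  # True (2023-01-01 UTC)
```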

shanktt commented 1 year ago

Is anyone still having trouble pulling submissions and comments posted after November 3rd when using PMAW 3.0.0? Here's my request that is still returning zero results:

```python
from pmaw import PushshiftAPI
import datetime as dt
import pandas as pd
import numpy as np

start_epoch = int(dt.datetime(2023, 1, 1).timestamp())
end_epoch = int(dt.datetime(2023, 1, 6).timestamp())

api = PushshiftAPI()
gen1 = api.search_submissions(subreddit="science", since=start_epoch, until=end_epoch)
gen2 = api.search_comments(subreddit="science", since=start_epoch, until=end_epoch)
```

eddvrs commented 1 year ago

@AshankKumar, your code works fine for me here (with a few print-outs added):

```python
import pmaw
import datetime as dt
# import pandas as pd
# import numpy as np

print(pmaw.__version__)

start_epoch = int(dt.datetime(2023, 1, 1).timestamp())
end_epoch = int(dt.datetime(2023, 1, 6).timestamp())

api = pmaw.PushshiftAPI()
gen1 = api.search_submissions(subreddit="science", since=start_epoch, until=end_epoch)

print("gen1:", len(gen1))

gen2 = api.search_comments(subreddit="science", since=start_epoch, until=end_epoch, limit=100)

print("gen2:", len(gen2))
```

Yields:

```
3.0.0
gen1: 330
gen2: 100
```

And occasionally this:

```
3.0.0
Not all PushShift shards are active. Query results may be incomplete.
gen1: 0
gen2: 100
```

The Pushshift API has been patchy recently; look at all the red in the daily status charts for submissions and comments.
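Because the failures are intermittent (the occasional gen1: 0 in the output above), one option is to retry an empty query a few times with growing delays. A minimal sketch, assuming an empty result is worth retrying; retry_until_nonempty is a hypothetical helper, and in practice query would wrap something like lambda: api.search_submissions(...). A genuinely empty window will still pay for every retry.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def retry_until_nonempty(query: Callable[[], Sequence[T]], attempts: int = 4,
                         base_delay: float = 2.0, sleep=time.sleep) -> Sequence[T]:
    """Re-run `query` with exponential backoff while it returns an empty result."""
    result: Sequence[T] = []
    for attempt in range(attempts):
        result = query()
        if len(result) > 0:
            break
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    return result


# Usage with a fake query that is empty twice, then succeeds; sleep is stubbed out.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    return [] if calls["n"] < 3 else ["post"]


delays = []
result = retry_until_nonempty(flaky, sleep=delays.append)
print(result, delays)  # ['post'] [2.0, 4.0]
```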

Do you definitely have the latest version of PMAW installed? Additionally, is your project definitely referencing the latest version?
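Both questions can be answered from inside the script that imports pmaw. A minimal sketch using only the standard library (importlib.metadata, Python 3.8+); installed_version is a hypothetical helper name.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# sys.executable shows which Python (and therefore which environment) is running.
print("interpreter:", sys.executable)
print("pmaw:", installed_version("pmaw"))
```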

shanktt commented 1 year ago

Ah geez, thank you for the sanity check. Of course I forgot to activate my virtual environment.
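For a quick check of whether a script is running inside an activated virtual environment, comparing sys.prefix with sys.base_prefix works for venv and modern virtualenv. A minimal sketch; in_virtualenv is a hypothetical helper, and conda environments are not detected this way, since each conda env is its own base installation.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv (sys.prefix differs from the base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```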

ibnahmadbello commented 1 year ago

Hello, I am still facing the same issue. I do get a response if my limit is <= 1000, but once I change the limit to 1001, I get 0 results. @junwenhu, how were you able to get past this issue?
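One possible workaround, not confirmed anywhere in this thread, is to split a long date range into short windows so each search stays below the 1000-result ceiling, then run one search per window (for example api.search_submissions(subreddit="science", since=s, until=e, limit=1000)). A minimal sketch of the splitting step; time_windows is a hypothetical helper, and the two-day step is an arbitrary value you would tune to the subreddit's volume.

```python
from typing import Iterator, Tuple


def time_windows(since: int, until: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (since, until) windows of at most `step` seconds covering [since, until)."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = since
    while start < until:
        end = min(start + step, until)
        yield start, end
        start = end


DAY = 86_400
# Five days starting 2023-01-01 UTC, split into two-day windows.
windows = list(time_windows(1672531200, 1672531200 + 5 * DAY, 2 * DAY))
print(windows)
# [(1672531200, 1672704000), (1672704000, 1672876800), (1672876800, 1672963200)]
```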

junwenhu commented 1 year ago

I didn't, sorry.

ibnahmadbello commented 1 year ago

Okay. Thanks

junwenhu commented 1 year ago

Update: I found the problem. pip install kept using the version that was already on my computer, so even though I kept installing the newest version, it wouldn't update.

I uninstalled and reinstalled Anaconda, and the problem was solved.
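The thread doesn't pin down why pip kept the old copy, and reinstalling Anaconda is a heavy fix. One common cause of this symptom is a pip that belongs to a different Python than the one running the code; under that assumption, a lighter option is to invoke pip through the current interpreter and force a reinstall. A minimal sketch that only builds the command; pip_upgrade_command is a hypothetical helper.

```python
import sys
from typing import List


def pip_upgrade_command(package: str, force: bool = False) -> List[str]:
    """Build a pip command bound to the interpreter running this script.

    `sys.executable -m pip` installs into the same environment that will
    import the package, instead of whichever `pip` is first on PATH.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--upgrade"]
    if force:
        # Reinstall even if pip believes the version is current, and skip the wheel cache.
        cmd += ["--force-reinstall", "--no-cache-dir"]
    return cmd + [package]


cmd = pip_upgrade_command("pmaw", force=True)
print(" ".join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True) after importing subprocess.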