mattpodolak / pmaw

A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches.
MIT License

Pushshift API updates #51

Closed eddvrs closed 1 year ago

eddvrs commented 1 year ago

PR to update PMAW to reflect changes to the Pushshift API. Quick fixes.

Changes to the API include renaming the "after" and "before" params to "since" and "until", and renaming "sort" to "order".
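A minimal sketch of that renaming, for illustration only. The helper name `translate_params` and the `RENAMED_PARAMS` table are hypothetical and are not taken from the PMAW source; only the three renames themselves come from the description above.

```python
# Hypothetical helper, not part of PMAW: maps the old Pushshift
# parameter names onto the renamed ones described in this PR.
RENAMED_PARAMS = {"after": "since", "before": "until", "sort": "order"}


def translate_params(params):
    """Return a copy of params with legacy names replaced by the new ones."""
    translated = {}
    for key, value in params.items():
        new_key = RENAMED_PARAMS.get(key, key)
        # Refuse ambiguous input such as passing both "after" and "since".
        if new_key in translated:
            raise ValueError(f"parameter {new_key!r} given more than once")
        translated[new_key] = value
    return translated


legacy = {"subreddit": "science", "after": 1667433600, "sort": "desc"}
print(translate_params(legacy))
```

Unrecognised keys pass through unchanged, so a caller that already uses the new names is unaffected.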

The metadata structure has changed, so I updated the check for "down" shards and the request-size handling that enables batching of request threads.

Will update documentation soon

eddvrs commented 1 year ago

Hi Matt, I hope it's OK. There have been some changes to the Pushshift API that mean PMAW will no longer work. I made some changes at my end, but thought it might help to share those with you. I've addressed the changes I know about, but there may be more fixes needed.

MatchaOnMuffins commented 1 year ago

Hi there,

I took a look at your pull request and noticed some issues with the tests and performance. The test results showed that PushShift API requests timed out, and there were also failures and hanging tests.

It seems that the issues may be related to the PushShift API itself. Do you think that could be the case? If so, it might be helpful to reach out to the PushShift API team to see if they can provide any insights or assistance.

Here are the details of the failing and hanging tests:

eddvrs commented 1 year ago

Hi there.

I've been following progress on the Pushshift subreddit, and they're undergoing a lot of change all at once: moving servers to a new data centre, reloading historical data, and at the same time making some breaking changes to the API itself. At this point, it's difficult to know which of those is causing downstream problems in pmaw.

The API has been a bit flaky since then, and no data older than 3 November 2022 is available. The unit tests themselves probably also need updating: if they're set up to test on data older than 3 November 2022, they simply won't work. I did some basic functional testing, but wasn't confident enough to review the unit tests themselves. I'll take another look this evening.
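As a rough illustration of the date problem, a test could check whether the timestamps it queries fall before the cutoff. This is only a sketch: the names `DATA_CUTOFF` and `predates_cutoff` are hypothetical, and treating the cutoff as midnight UTC is an assumption, since only the date is known.

```python
from datetime import datetime, timezone

# Assumption: the cutoff is taken as midnight UTC on 3 November 2022.
# Only the date itself is mentioned in this thread.
DATA_CUTOFF = datetime(2022, 11, 3, tzinfo=timezone.utc)


def predates_cutoff(epoch_seconds):
    """Return True if a Unix timestamp falls before the data cutoff."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment < DATA_CUTOFF


# A timestamp from August 2021 is older than the cutoff, so a test
# that queries it would find no data.
print(predates_cutoff(1629990795))
# A timestamp from 1 January 2023 is newer than the cutoff.
print(predates_cutoff(1672531200))
```

A check like this could be used to flag or skip test cases whose fixtures point at data that is no longer served.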

mattpodolak commented 1 year ago

Hey @eddvrs I'll review the PR today/tomorrow. thanks for the contribution!!

eddvrs commented 1 year ago

Thanks @mattpodolak - There are still some issues with the API itself which may affect your original tests (see @MatchaOnMuffins's comment).

mattpodolak commented 1 year ago

> Hi there.
>
> I've been following progress on the Pushshift subreddit, and they're undergoing a lot of change all at once: moving servers to a new data centre, reloading historical data, and at the same time making some breaking changes to the API itself. At this point, it's difficult to know which of those is causing downstream problems in pmaw.
>
> The API has been a bit flaky since then, and no data older than 3 November 2022 is available. The unit tests themselves probably also need updating: if they're set up to test on data older than 3 November 2022, they simply won't work. I did some basic functional testing, but wasn't confident enough to review the unit tests themselves. I'll take another look this evening.

good to know! I'll update the unit tests before releasing