eddvrs closed 1 year ago
Hi Matt, I hope this is OK. There have been some changes to the Pushshift API that mean PMAW will no longer work. I made some changes at my end and thought it might help to share them with you. I've addressed the changes I know about, but more fixes may be needed.
Hi there,
I took a look at your pull request and noticed some issues with the tests and performance. The test results showed that PushShift API requests timed out, and there were also failures and hanging tests.
It seems that the issues may be related to the PushShift API itself. Do you think that could be the case? If so, it might be helpful to reach out to the PushShift API team to see if they can provide any insights or assistance.
Here are the details of the failing and hanging tests:
- test_filter_fn failed.
- test_search_comments and test_search_submissions failed after hanging for a few minutes.
- test_search_submission_comment_ids and test_response hang indefinitely.
Hi there.
I've been following progress on the PushShift subreddit, and they're undergoing a lot of change all at once: moving servers to a new data centre, reloading historical data, and at the same time making some breaking changes to the API itself. At this point, it's difficult to know which of those is causing downstream problems in pmaw.
The API has been a bit flaky since then. Additionally, no data older than 3rd November 2022 is available. The unit tests themselves probably also need updating: if they're set up to test on data older than 03/11/2022, then they simply won't work. I did some basic functional testing, but wasn't confident enough to review the unit tests themselves. I'll take another look this evening.
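For what it's worth, here is a rough sketch of how a test could check whether its time window falls inside the data that is currently available. It assumes the cutoff is midnight UTC on 3 November 2022 (the exact boundary on the API side isn't confirmed), and `window_is_available`, `since_epoch` and `until_epoch` are hypothetical names, not anything in pmaw.

```python
from datetime import datetime, timezone

# Assumed cutoff: 3 November 2022, midnight UTC. The real boundary used by
# the API is not confirmed, so treat this value as a placeholder.
DATA_CUTOFF = datetime(2022, 11, 3, tzinfo=timezone.utc)


def window_is_available(since_epoch, until_epoch):
    """Return True if the whole [since, until] window is on or after the cutoff."""
    cutoff = int(DATA_CUTOFF.timestamp())
    return cutoff <= since_epoch <= until_epoch
```

A test that queries an older window could then be skipped or rewritten instead of hanging on an empty result.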
Hey @eddvrs, I'll review the PR today/tomorrow. Thanks for the contribution!!
Thanks @mattpodolak - There are still some issues with the API itself which may affect your original tests (see @MatchaOnMuffins's comment).
good to know! I'll update the unit tests before releasing
PR to incorporate changes into PMAW to reflect changes to the PushShift API. Quick fixes:
Changes to the API include renaming the after & before params to since & until. sort has been renamed to order.
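As a minimal sketch of the renames listed above (not the actual code in this PR), a caller with old-style parameters could translate them like this. `RENAMED_PARAMS` and `translate_params` are hypothetical names used only for illustration.

```python
# Hypothetical helper, not part of PMAW: maps the old parameter names onto
# the renamed ones described in this PR.
RENAMED_PARAMS = {"after": "since", "before": "until", "sort": "order"}


def translate_params(params):
    """Return a copy of params with legacy keys replaced by their new names."""
    translated = {}
    for key, value in params.items():
        new_key = RENAMED_PARAMS.get(key, key)
        if new_key in translated:
            # Both the old and the new name were supplied; refuse to guess.
            raise ValueError(f"conflicting values for parameter {new_key!r}")
        translated[new_key] = value
    return translated
```

For example, `translate_params({"after": 1667433600, "sort": "desc"})` gives a dict keyed by `since` and `order`, while any other keys pass through unchanged.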
The metadata structure has changed, so I made updates to:
- Checking for "down" shards
- Request size, to enable batching of request threads
Will update documentation soon.