pushshift / api

Pushshift API

Issues with search_submission_comment_ids and search_comments #104

Closed: Jhann024 closed this issue 2 years ago

Jhann024 commented 2 years ago

I've been working on a project that gathers large amounts of comment data, and I have run into one issue with each of these functions. The first issue is with search_submission_comment_ids: when searching for a post's comment IDs, it returns zero comments for any post after November 26th, 2021, and occasionally for posts before that date as well (although I haven't tested earlier dates extensively).

After discovering this, I tried to work around it by checking whether search_submission_comment_ids returned any comment data and, if it did not, switching to the search_comments function. This worked, and I was able to get comment data for posts after November 26th, 2021. However, every API request made with search_comments gave the warning "Not all Pushshift shards are active. Query Results may be incomplete." When I investigated with the api.metadata.get('Shards') command, I got results such as:

{'failed': 0, 'skipped': 0, 'successful': 67, 'total': 74}
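The fallback described above (try search_submission_comment_ids first and switch to search_comments only when it comes back empty) can be sketched roughly as follows. This is a minimal illustration, not PMAW's API: the two fetchers are passed in as plain callables, and the names `fetch_comments_with_fallback`, `primary`, and `fallback` are hypothetical. Wiring them to the real PMAW calls is left as an assumption.

```python
from typing import Callable, List

# Hypothetical fetcher type: takes a submission ID and returns a list of
# comment records (dicts). Connecting these to actual PMAW/Pushshift calls
# is an assumption of this sketch, not something the library provides.
CommentFetcher = Callable[[str], List[dict]]


def fetch_comments_with_fallback(
    submission_id: str,
    primary: CommentFetcher,
    fallback: CommentFetcher,
) -> List[dict]:
    """Return comments from `primary`; if it yields nothing, use `fallback`."""
    comments = primary(submission_id)
    if comments:
        return comments
    # An empty result may mean "no comments" or "missing data"; this sketch
    # cannot tell the two apart, so it simply asks the fallback source.
    return fallback(submission_id)


if __name__ == "__main__":
    # Stub fetchers standing in for the real API calls.
    def empty_primary(sid: str) -> List[dict]:
        return []

    def stub_fallback(sid: str) -> List[dict]:
        return [{"id": "c1", "link_id": f"t3_{sid}"}]

    print(fetch_comments_with_fallback("abc123", empty_primary, stub_fallback))
    # [{'id': 'c1', 'link_id': 't3_abc123'}]
```

Note that results obtained through the fallback path may still be incomplete if the shard warning is present.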

If anybody has an idea why either of these issues is occurring, or why the shard metadata reports the missing shards as neither failed nor skipped, I'd greatly appreciate it if you could share.
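For the shard counts quoted above, a small helper makes the gap explicit: 74 total shards minus 67 successful, 0 failed, and 0 skipped leaves 7 shards that the metadata does not account for, i.e. roughly 90.5% shard coverage. The helper names below are hypothetical, and the sketch assumes the four counters are disjoint; the response does not say whether skipped shards are also counted as successful, although with `skipped` equal to 0 the result is the same either way.

```python
def unaccounted_shards(shards: dict) -> int:
    """Shards reported neither as successful, failed, nor skipped.

    Assumes the counters are disjoint (not confirmed by the API).
    """
    return shards["total"] - (shards["successful"] + shards["failed"] + shards["skipped"])


def shard_coverage(shards: dict) -> float:
    """Fraction of shards that answered successfully (0.0 if total is 0)."""
    return shards["successful"] / shards["total"] if shards["total"] else 0.0


shards = {'failed': 0, 'skipped': 0, 'successful': 67, 'total': 74}
print(unaccounted_shards(shards), f"{shard_coverage(shards):.1%}")  # 7 90.5%
```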

TimOgden commented 2 years ago

Why did you close this? Did you come to a solution? Please share if you did.

Jhann024 commented 2 years ago

I posted this here instead of on the PMAW wrapper by accident. I have not found a solution.