pushshift / api

Pushshift API

always 100 unique ids despite the size of returned comments #136

Open chaee opened 1 year ago

chaee commented 1 year ago

Hi! I am fetching comments from a subreddit using before and after dates, but I found that the number of unique items per day is always 100. The total number of results varies and seems right, but it contains many duplicates. The unique count of 100 matches the limit of the Reddit API, so I wonder if there is a connection. Do I need to specify something additional in the query? I tried adding size or limit, but that did not solve the problem (other than returning zero results when the limit is too big, as others have pointed out). Below is how I am sending the query now:

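import calendar

from pmaw import PushshiftAPI

# since_date and until_date are date/datetime objects defined elsewhere
api = PushshiftAPI()
# list() consumes the generator, so this variable holds the comments themselves
api_request_generator = list(api.search_comments(subreddit='The_Donald',
                                                 before=calendar.timegm(until_date.timetuple()),
                                                 after=calendar.timegm(since_date.timetuple()),
                                                 safe_exit=True,
                                                 size=500,
                                                 mem_safe=True,
                                                 # same value as before
                                                 until=calendar.timegm(until_date.timetuple())
                                                 ))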
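
The duplicate count described above can be checked with a small helper like the one below. This is only a sketch: it assumes each returned comment is a dict with an "id" key, and `summarize_ids` and the sample data are hypothetical names made up for illustration, not part of pmaw or Pushshift.

```python
from collections import Counter


def summarize_ids(comments):
    """Return (total, unique, duplicated_ids) for a list of comment dicts.

    Assumes every comment dict carries an "id" key.
    """
    counts = Counter(c["id"] for c in comments)
    total = sum(counts.values())
    # ids that appear more than once, sorted for stable output
    duplicated = sorted(cid for cid, n in counts.items() if n > 1)
    return total, len(counts), duplicated


# Hypothetical sample standing in for the list returned by the query
sample = [{"id": "a1"}, {"id": "b2"}, {"id": "a1"}, {"id": "c3"}, {"id": "b2"}]
total, unique, duplicated = summarize_ids(sample)
print(total, unique, duplicated)  # 5 3 ['a1', 'b2']
```

Running this on the result of one day's query gives the total versus unique counts in one step, which makes it easy to confirm whether the unique count is stuck at 100.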