mattpodolak / pmaw

A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches.
MIT License

Confusion about multithreading #47

Closed kadnan closed 1 year ago

kadnan commented 1 year ago

Hi

When I call:

from psaw import PushshiftAPI as psawAPI
from pmaw import PushshiftAPI as pmawAPI
api = psawAPI()
api_request_generator = list(api.search_submissions(...))

And

api = pmawAPI()
api_request_generator = list(api.search_submissions(...))

How does it make search_submissions multithreaded? Aren't both psaw and pmaw making a single REST call to Pushshift?

mattpodolak commented 1 year ago

Hey @kadnan, search_submissions is multithreaded: it splits a request/query into many smaller requests, which are then divided across multiple threads. You can see where that happens here.

Aren't both psaw and pmaw making a single REST call to Pushshift?

Neither psaw nor pmaw makes a single API call to Pushshift. Querying results takes many API calls because the size of each response is limited (Pushshift won't return 10k submissions for a single API call). The difference is that psaw makes these calls sequentially, while pmaw divides them across multiple threads.
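
To make the difference concrete, here is a rough sketch of the idea. It is not pmaw's actual code: `split_time_range`, `fake_fetch`, `fetch_sequential` and `fetch_threaded` are hypothetical names made up for illustration, and `fake_fetch` stands in for a real Pushshift call so the example runs without network access. It assumes the query can be cut into time windows that are fetched independently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_time_range(after, before, num_slices):
    """Split [after, before) into num_slices contiguous (start, end) windows."""
    span = before - after
    bounds = [after + span * i // num_slices for i in range(num_slices + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def fake_fetch(window):
    """Hypothetical stand-in for one API call: one fake item per timestamp."""
    start, end = window
    return list(range(start, end))


def fetch_sequential(windows, fetch=fake_fetch):
    """Issue one request after another (the sequential approach)."""
    results = []
    for window in windows:
        results.extend(fetch(window))
    return results


def fetch_threaded(windows, fetch=fake_fetch, max_workers=4):
    """Issue the same requests from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input windows.
        chunks = pool.map(fetch, windows)
        return [item for chunk in chunks for item in chunk]


windows = split_time_range(0, 100, 8)
# Both strategies return the same items; only the scheduling differs.
assert fetch_sequential(windows) == fetch_threaded(windows)
```

Both functions make the same number of requests. The threaded version just has several of them in flight at once, which helps because each request spends most of its time waiting on the network.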