iSarabjitDhiman / TweeterPy

TweeterPy is a Python library to extract data from Twitter. The TweeterPy API lets you scrape data from a user's profile, such as username, userid, bio, followers/followings list, profile media, tweets, etc.
MIT License
123 stars 17 forks

handle pagination using end cursor issue - Pagination Issue when total number of results specified #41

Closed TheDudeLebowsky closed 8 months ago

TheDudeLebowsky commented 8 months ago

Unable to continue fetching friends/followers using get_friends. The API call returns empty data after a few calls.

Example (this user has 137 followers), using it with the total = 10 parameter:

Fetching followers
Total users fetched : 10 : End cursor value : 1761772550753500306|1718657196322979790
Total users fetched : 20 : End cursor value : 1753300493287752709|1718657196322979738
Total users fetched : 30 : End cursor value : 0|1718657196322979700
Warning: No data returned from API call
Total users fetched : 30 End cursor value : 0|1718657196322979698

iSarabjitDhiman commented 8 months ago

Hey @TheDudeLebowsky

I see what you mean. Let me explain what is happening here. When the script/tool/bot sends a request to Twitter to fetch data, it specifies a "count" parameter that tells the backend how many results it wants. Most of the time it's set to the maximum number the backend accepts (50 for some endpoints, 100 for others) to reduce the number of API calls.

So when you specify the total number of results (which you did by setting total=10), the script still requests a whole page (which may contain 50~100 results). Since you asked for 10 results, you were given only 10, but the end_cursor you were given was taken after fetching the whole page. That means the end_cursor points to the next page and excludes everything on the previous page, including the results you never received. Therefore, when you request the next page with that end_cursor, it skips the rest of the previous page, and you end up with fewer results than actually exist, because the script fetched more results than you specified/received. The script continues from where it left off, not from the number of results you were given (total=10).

As you said, there are 137 followers in total, and the script fetched around 50 results with each API request, so it had to make 3 API requests (50x3 = 150 > 137). But you asked for 10 results per request, which is why you got 30 results. Hope this helps.
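To make the arithmetic above concrete, here is a small simulation of the mismatch. It does not call Twitter or TweeterPy; `fetch_page` and `fetch_truncated` are hypothetical helpers, an integer offset stands in for the real end_cursor, and the page size of 50 is an assumption taken from the "around 50 results" estimate.

```python
def fetch_page(items, offset, page_size):
    """Simulated backend: return one full page plus the offset of the next page."""
    page = items[offset:offset + page_size]
    return page, offset + len(page)


def fetch_truncated(items, total, page_size=50):
    """Mimic the reported behaviour: keep only `total` results per call,
    but advance using the cursor that points past the whole page."""
    collected = []
    offset = 0
    calls = 0
    while offset < len(items):
        page, offset = fetch_page(items, offset, page_size)
        calls += 1
        # Only the first `total` results are handed back; the rest are skipped.
        collected.extend(page[:total])
    return collected, calls


followers = list(range(137))  # stand-in for the 137 followers in the report
collected, calls = fetch_truncated(followers, total=10)
print(len(collected), calls)  # 30 3
```

Each call consumes a full page on the backend side while returning only 10 items, so three calls exhaust all 137 followers and leave 30 results in hand.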

I am going to fix it soon. Btw thanks for drawing my attention to this.

iSarabjitDhiman commented 8 months ago

Fixed in 589b08c784f3ad18d8a1bf3451bcf54c6bac1ff7 Make sure to update the package with

pip install tweeterpy -U

Just set pagination=False and don't set total to any number; leave it as total=None. While pagination is set to False, it returns the result of each request, so you can use a for or while loop to handle each API request manually. This way you will get all the results. Make sure to check the "has_next_page" value in the returned dataset after each request. Once there are no more pages, you can break out of the loop.
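A minimal sketch of such a loop, under stated assumptions: `fetch_page` is a hypothetical callable that takes the previous cursor and returns one page as a dict. Only the "has_next_page" key comes from this thread; the "data" and "cursor_endpoint" key names are assumptions and should be checked against the actual response. With TweeterPy, `fetch_page` would presumably wrap get_friends with pagination=False and pass the previous cursor along. A fake backend is used here so the example runs on its own.

```python
def fetch_all(fetch_page):
    """Follow cursors page by page until the response reports no next page."""
    results = []
    cursor = None
    while True:
        response = fetch_page(cursor)
        results.extend(response.get("data", []))  # "data" key is assumed
        if not response.get("has_next_page"):
            break
        cursor = response.get("cursor_endpoint")  # key name is assumed
        if cursor is None:
            # Guard against looping forever if no cursor was returned.
            break
    return results


def fake_fetch_page(cursor, _items=list(range(137)), _page_size=50):
    """Fake backend for illustration: integer offsets stand in for real cursors."""
    start = cursor or 0
    page = _items[start:start + _page_size]
    end = start + len(page)
    return {
        "data": page,
        "cursor_endpoint": end,
        "has_next_page": end < len(_items),
    }


all_followers = fetch_all(fake_fetch_page)
print(len(all_followers))  # 137
```

Because every page is kept in full and the cursor is only advanced after the page has been stored, no results are skipped between requests.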