subzeroid / instagrapi

🔥 The fastest and powerful Python library for Instagram Private API 2024
MIT License

user_followers_v1_chunk mobile api max amount not working #400

Closed Osanchez closed 3 years ago

Osanchez commented 3 years ago

I am using the following API function, and it returns more users than the specified max_amount: I set the chunk size to 1000 and the API returned 1081. I am also trying to get the IDs of all the followers of a user, but I do not get as many followers as user_info reports.

Example: the profile shows 2002 followers, but I only get back 1974.

func: user_followers_gql_chunk(user_id: int, max_amount: int = 0, end_cursor: str = None)
return: Tuple[List[UserShort], str]
description: Get user’s followers information by Public Graphql API and end_cursor

from time import time

def get_followers(self, user_id):
    user_info = self.api.user_info(user_id)
    follower_count = user_info.follower_count

    followees = set()
    chunk_size = 1000
    current_index = 0
    last_cursor = None

    start_time = time()
    while current_index < follower_count:
        start_time_2 = time()
        # we get the followers in chunks to avoid rate limiting
        users, last_cursor = self.api.user_followers_v1_chunk(user_id=user_id, max_amount=chunk_size, max_id=last_cursor)
        print(f"Time to get {len(users)} followers:  {time() - start_time_2} seconds")

        if len(users) == 0:
            break

        for follower in users:
            followees.add(follower.pk)

        current_index += len(users)

        # an empty cursor means there are no more pages; stop here
        # instead of starting over from the first page
        if not last_cursor:
            break

    print(f"Time to get {len(followees)} followers:  {time() - start_time} seconds")

    return followees

Without the optional params, not all followers are returned either.

Osanchez commented 3 years ago

I believe the issue is where you check len(users) >= max_amount:

def user_followers_v1_chunk(self, user_id: int, max_amount: int = 0, max_id: str = "") -> Tuple[List[UserShort], str]:
        """
        Get user's followers information by Private Mobile API and max_id (cursor)
        Parameters
        ----------
        user_id: int
            User id of an instagram account
        max_amount: int, optional
            Maximum number of media to return, default is 0 - Inf
        max_id: str, optional
            Max ID, default value is empty String
        Returns
        -------
        Tuple[List[UserShort], str]
            Tuple of List of users and max_id
        """
        unique_set = set()
        users = []
        while True:
            result = self.private_request(f"friendships/{user_id}/followers/", params={
                "max_id": max_id,
                "rank_token": self.rank_token,
                "search_surface": "follow_list_page",
                "query": "",
                "enable_groups": "true"
            })
            for user in result["users"]:
                user = extract_user_short(user)
                if user.pk in unique_set:
                    continue
                unique_set.add(user.pk)
                users.append(user)
            max_id = result.get("next_max_id")
            if not max_id or (max_amount and len(users) >= max_amount):
                break
        return users, max_id

You will need to add a check for max_amount in the block that appends users, so that users are no longer appended once the limit is reached. Make this a return, so you do not send another unnecessary API request. I am also not sure about result.get("next_max_id"): is this just an increment based on max_amount? It looks like you get users in chunks of 100.

The logic of the following seems off:
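A minimal sketch of the change suggested above, not the library's code. It assumes a hypothetical `fetch_page` callable standing in for the private followers request, and a hypothetical `make_pager` helper that serves fake user pks in pages of 100. Note that when the cap is hit mid-page, the cursor returned still points at the next page, so the rest of the current page would be skipped on a follow-up call.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[int], Optional[str]]  # (user pks on one page, cursor for the next page)


def make_pager(total: int, page_size: int = 100) -> Callable[[Optional[str]], Page]:
    """Hypothetical stand-in for the followers endpoint: serves pks 0..total-1 in pages."""
    def fetch_page(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor else 0
        end = min(start + page_size, total)
        return list(range(start, end)), (str(end) if end < total else None)
    return fetch_page


def collect_capped(
    fetch_page: Callable[[Optional[str]], Page],
    max_amount: int = 0,
    max_id: Optional[str] = None,
) -> Tuple[List[int], Optional[str]]:
    """Collect unique pks and return as soon as max_amount is reached (0 means no cap)."""
    seen = set()
    users: List[int] = []
    while True:
        page, max_id = fetch_page(max_id)
        for pk in page:
            if pk in seen:
                continue
            seen.add(pk)
            users.append(pk)
            # The cap is checked inside the append loop, so nothing past it is appended
            # and no further request is issued.
            if max_amount and len(users) >= max_amount:
                return users, max_id
        if not max_id:
            return users, max_id


users, cursor = collect_capped(make_pager(250), max_amount=89)
print(len(users), cursor)
```

With the fake pager, the capped call returns exactly 89 users, but the cursor already points at the second page.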

max_id = result.get("next_max_id")

If I pass max_amount as a parameter, I expect the function to return the number of users specified by max_amount, and the returned max_id to correspond to that amount. If it is a cursor, I can call the function again and get the next chunk of items, starting where max_amount finished. Right now, if I use this function as a "generator", every time I call it to get the next block of users, some users from the previous block are not included.

example:

With max_amount = 89, the max_id returned would be 100, so the users at indices 90-100 inclusive would not be included in the next call.
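One way to get the "exactly N per call, resume where I stopped" behaviour described above is to buffer the part of a page that overflows the requested amount. This is only a sketch under assumptions: `ExactChunkReader` and `make_pager` are hypothetical names, the cursor is assumed to advance one whole page at a time (the thread does not confirm this), and the fake pager stands in for the real request.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[int], Optional[str]]  # (user pks on one page, cursor for the next page)


def make_pager(total: int, page_size: int = 100) -> Callable[[Optional[str]], Page]:
    """Hypothetical stand-in for the followers endpoint: serves pks 0..total-1 in pages."""
    def fetch_page(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor else 0
        end = min(start + page_size, total)
        return list(range(start, end)), (str(end) if end < total else None)
    return fetch_page


class ExactChunkReader:
    """Hands out up to `n` users per call, keeping page overflow for the next call."""

    def __init__(self, fetch_page: Callable[[Optional[str]], Page]) -> None:
        self._fetch_page = fetch_page
        self._cursor: Optional[str] = None
        self._buffer: List[int] = []
        self._exhausted = False

    def take(self, n: int) -> List[int]:
        # Fetch whole pages until the buffer can satisfy the request or the data runs out.
        while len(self._buffer) < n and not self._exhausted:
            page, self._cursor = self._fetch_page(self._cursor)
            self._buffer.extend(page)
            if not self._cursor:
                self._exhausted = True
        # Leftover users stay in the buffer instead of being dropped.
        chunk, self._buffer = self._buffer[:n], self._buffer[n:]
        return chunk


reader = ExactChunkReader(make_pager(250))
first = reader.take(89)
second = reader.take(89)
print(first[-1], second[0])
```

The second call starts at the user right after the last one returned by the first call, because the remainder of the first page was kept in the buffer.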

Osanchez commented 3 years ago

https://github.com/adw0rd/instagrapi/issues/68

It looks like the tests only validate a small chunk of the followers. The mobile API seems to be much faster, but for some reason it does not return all of the followers. So instead I am using client.user_followers(user_id: int, amount: int = 0), but it takes much longer:

Time to get 2008 followers: 209.34544134140015 seconds

adw0rd commented 3 years ago

@Osanchez The logic in user_followers_gql_chunk works as intended, but I agree that the name max_amount is confusing.

This is an internal mechanism (user_followers_gql_chunk) for other functions (such as user_followers...); it returns all the data without slicing.
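To illustrate the division of labour described here, the following sketch uses a fake chunk helper that only stops after a whole page (which is why a cap of 1000 can yield slightly more), and a caller that slices the result to the requested amount. `fake_followers_chunk` and `followers` are hypothetical names for illustration, not instagrapi functions, and the page size of 100 is an assumption taken from the thread.

```python
from typing import List, Tuple


def fake_followers_chunk(
    user_id: int, max_amount: int = 0, page_size: int = 100, total: int = 250
) -> Tuple[List[int], str]:
    """Hypothetical chunk helper: appends whole pages and checks the cap only between pages."""
    users: List[int] = []
    start = 0
    while start < total:
        users.extend(range(start, min(start + page_size, total)))
        start += page_size
        # The cap is checked after a full page was appended, so it can overshoot.
        if max_amount and len(users) >= max_amount:
            break
    next_cursor = str(start) if start < total else ""
    return users, next_cursor


def followers(user_id: int, amount: int = 0) -> List[int]:
    """Caller-facing wrapper: slices the unsliced chunk result down to `amount`."""
    users, _ = fake_followers_chunk(user_id, max_amount=amount)
    if amount:
        users = users[:amount]
    return users


print(len(fake_followers_chunk(1, max_amount=89)[0]), len(followers(1, amount=89)))
```

In this sketch the chunk helper returns a full page of 100 for a cap of 89, while the wrapper returns exactly 89.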

Osanchez commented 3 years ago

@adw0rd And what about the different API methods returning different numbers of followers?