instaloader / instaloader

Download pictures (or videos) along with their captions and other metadata from Instagram.
https://instaloader.github.io/
MIT License

[question] export all likes on a profile #120

Closed. nikartz closed this issue 6 years ago.

nikartz commented 6 years ago

Is there a way to export all profiles that liked pictures on a given profile? I'm thinking of something similar to this approach for followers. I've played around with get_likes from structures.py, but I've had no luck. Maybe someone could help!

If anyone is interested: I am trying to get a list of the people who follow someone (done, thanks to get_followers or get_followees) and compare that list to all profiles that liked a picture (or maybe a picture from the last six months or so). This way I want to filter out all followers who haven't liked anything (ghost followers) and block them manually (I don't want to block anyone automatically, so that I can choose exactly which profiles to block).

nikartz commented 6 years ago

So I have now found a working solution and want to share it with anyone interested:

I now use InstaPy to export all the likes on my profile. I set up quickstart.py with:

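# `session` is the InstaPy session created earlier in quickstart.py
try:
    session.login()
    # Treat everyone who liked one of the latest 1000 posts as an active user
    session.set_dont_unfollow_active_users(enabled=True, posts=1000, boundary=50000)
    # amount=0: the point is collecting active users, not unfollowing anyone
    session.unfollow_users(amount=0, onlyInstapyFollowed=True, onlyInstapyMethod='FIFO', sleep_delay=600)
finally:
    session.end()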

This outputs a list of the profiles that liked my images. Of course, the output needs to be logged somewhere, and in instapy.py you need to add print(active_users) in def set_dont_unfollow_active_users (around line 1977, at least in my copy) so that it looks something like this:

def set_dont_unfollow_active_users(self, enabled=False, posts=4, boundary=500):
    """Prevents unfollow followers who have liked one of
    your latest X posts"""

    # do nothing
    if not enabled:
        return

    # list of users who liked our media
    active_users = get_active_users(self.browser,
                                    self.username,
                                    posts,
                                    boundary,
                                    self.logger)

    # added line: print the likers so they end up in the log
    print(active_users)

    for user in active_users:
        # include active user to not unfollow list
        self.dont_include.append(user)

After that the output file should contain a list of all profiles that liked your posts.

Now on to outputting all followers. Again, the output has to be logged somewhere. My Instaloader script (I made a file called export_followers and put it in the instaloader folder) looks like this:

import instaloader

L = instaloader.Instaloader()

USER = 'your_account'
PASSWORD = 'your_password'
PROFILE = USER

L.login(USER, PASSWORD)

profile = instaloader.Profile.from_username(L.context, PROFILE)

# Print one follower username per line; redirect stdout to a file to keep them
for follower in profile.get_followers():
    print(follower.username)

With stdout redirected to a file, that gives you a file containing all your followers, one per line.

I then open that file in Word and replace every newline character with ', ' (in Word: search for ^p and replace it with ', '). After that, some cleanup is needed, like adding [' at the beginning and checking the beginning and the end to make sure everything looks like a proper Python list.
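The manual Word step can be skipped: Python can read a one-username-per-line file straight into a list. A minimal sketch, assuming the follower output was saved to a plain-text file; the helper name `read_usernames` and the file name `followers.txt` are only examples.

```python
from pathlib import Path


def read_usernames(path):
    """Return the usernames from a text file with one username per line.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical file name: use whatever file the follower output was redirected to.
# follower = read_usernames("followers.txt")
```

The resulting list can be used directly as `follower` in the comparison script, without any hand-editing into Python list syntax.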

I paste those two lists into another simple Python script, which compares them and writes every follower who is not among the likers into a file. The script looks like this:

# List of followers (paste your own list here):
follower = []

# List of likers (paste your own list here):
liker = []

# Compare: sets make the membership tests fast
follower_set = set(follower)
liker_set = set(liker)

matches = sorted(follower_set & liker_set)

print('All matches:')
print(matches)
print()
print()
print('Inactive followers:')

# Followers who never liked anything, in sorted order
ban = sorted(follower_set - liker_set)

print(ban)

# Write the inactive followers to a file (the with-block closes it again)
with open("/YOUR PATH/inactive-users.txt", "w") as f:
    print(ban, file=f)

print()
print()
print('Done')

Where it says follower = [] and liker = [], you of course need to insert your own lists that were output in the previous steps.

After all of that, there is a file containing all inactive users who follow you. I then go through it and manually decide whether to block them in order to get rid of ghost followers.

Maybe this approach is helpful to someone. I know it requires a bit of work, but I am a beginner at Python and couldn't automate the process any further.

Thammus commented 6 years ago

Hello nikartz, your goal can easily be achieved with Instaloader alone; there is no need for other Python modules or text-editing software. To store inactive followers in a file, you can use the following approach:

import instaloader

L = instaloader.Instaloader()

USER = 'your_account'
PROFILE = USER

# Your preferred way of logging in:
L.load_session_from_file(USER)

profile = instaloader.Profile.from_username(L.context, PROFILE)

likes = set()
print('Fetching likes of all posts of profile {}.'.format(profile.username))
for post in profile.get_posts():
    print(post)
    likes.update(post.get_likes())

print('Fetching followers of profile {}.'.format(profile.username))
followers = set(profile.get_followers())

ghosts = followers - likes

print('Storing ghosts into file.')
with open('/YOUR PATH/inactive-users.txt', 'w') as f:
    for ghost in ghosts:
        print(ghost.username, file=f)

kenstowe commented 6 years ago

@Thammus, is there a way to have Instaloader pull likes only from the latest five posts, like @nikartz is doing with InstaPy?

I'm trying to compare likes against followers to find ghosts on an account with 500 posts. Scraping obviously takes a very long time, and likes older than a couple of weeks are stale and don't really identify currently active users.

aandergr commented 6 years ago

@Thammus, is there a way to have Instaloader pull likes only from the latest five posts, like @nikartz is doing with InstaPy?

Sure. profile.get_posts() returns an iterator, which can be sliced with islice() from itertools. So instead of

for post in profile.get_posts():
    ...

you can use

from itertools import islice
for post in islice(profile.get_posts(), 5):
    ...
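Combined with the ghost-follower script above, restricting the likes to the latest posts might look like this minimal sketch. The helper names `likers_of_latest` and `find_ghosts` are hypothetical; they only assume that each post has a `get_likes()` method and that `profile.get_posts()` yields the newest posts first.

```python
from itertools import islice


def likers_of_latest(posts, n):
    """Collect everyone who liked any of the first n posts.

    `posts` is any iterable of objects with a get_likes() method,
    e.g. profile.get_posts(); only the first n items are consumed.
    """
    likers = set()
    for post in islice(posts, n):
        likers.update(post.get_likes())
    return likers


def find_ghosts(followers, likers):
    """Return the followers that do not appear among the likers."""
    return set(followers) - set(likers)


# With Instaloader (requires a logged-in session):
# ghosts = find_ghosts(profile.get_followers(),
#                      likers_of_latest(profile.get_posts(), 5))
```

Because islice() stops consuming the iterator after n posts, get_likes() is only called for those n posts instead of all 500.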

You can also use the post's age as a stop condition, which is where takewhile() comes in handy. For example,

from datetime import datetime, timedelta
from itertools import takewhile
# Post.date is in UTC, so compare it against the current UTC time
NOW = datetime.utcnow()
for post in takewhile(lambda p: NOW - p.date < timedelta(days=7), profile.get_posts()):
    ...
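The same stop condition can be wrapped in a small helper that takes the reference time as a parameter, which keeps the cutoff fixed for the whole loop and makes it testable without logging in. `posts_within` is a hypothetical name; it reads each post's `date` attribute as in the example above, and `now` must use the same time zone as those dates.

```python
from datetime import datetime, timedelta
from itertools import takewhile


def posts_within(posts, max_age, now):
    """Yield posts while they are younger than max_age relative to now.

    Iteration stops at the first post that is too old, so this
    assumes the posts arrive newest first.
    """
    return takewhile(lambda p: now - p.date < max_age, posts)


# Usage with Instaloader (hypothetical helper, logged-in session assumed):
# for post in posts_within(profile.get_posts(), timedelta(days=7), datetime.utcnow()):
#     ...
```

Note that takewhile() ends the loop at the first post that is too old, so a post appearing out of chronological order would stop the iteration early.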

kenstowe commented 6 years ago

Thanks @aandergr for so many options. I'll try them out. :)

kenstowe commented 6 years ago

The islice() worked perfectly for finding recent activity.

I would also like to have a list of all likes, as in Thammus' original example. But my profile has 500 posts, and when I run .get_posts() it always errors out with 429 Too Many Requests about halfway through. How do I insert longer wait times between requests? Or is there a better way to prevent the 429 errors? I have no other instances of Instaloader or anything else Instagram-related running on this machine, or even from the same IP.
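Instaloader already paces its own requests (see the reply below). If you still want extra pauses, one generic option is to wrap the post iterator so that it sleeps between posts. This is not an Instaloader option: the helper name `throttled` and the 30-second delay are arbitrary assumptions, and it is not guaranteed to prevent a 429.

```python
import time


def throttled(items, delay, sleep=time.sleep):
    """Yield items from an iterable, sleeping `delay` seconds between them.

    Hypothetical helper, not part of Instaloader. The pause happens
    before every item except the first, so the work done per item
    (e.g. post.get_likes()) is spread out over time.
    """
    for index, item in enumerate(items):
        if index:
            sleep(delay)
        yield item


# Usage sketch with Instaloader:
# for post in throttled(profile.get_posts(), 30):
#     likes.update(post.get_likes())
```

The `sleep` parameter exists so the pauses can be replaced in tests; in normal use it defaults to time.sleep.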

aandergr commented 6 years ago

A general note about the notorious 429 Too Many Requests: Instaloader has logic to keep track of its requests to Instagram and to obey their rate limits. Since those limits are not documented anywhere, we determine them experimentally. A daily cron job confirms that Instaloader still stays within the rate limits. Nevertheless, the rate control logic assumes that

The latter implies that restarting or reinstantiating Instaloader often within a short time is likely to cause a 429. When a request is denied with a 429, Instaloader retries it as soon as the temporary ban is assumed to have expired.

(copy of my recent comment in https://github.com/instaloader/instaloader/issues/128#issuecomment-396171282)
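Why frequent restarts are risky can be seen in a toy model of client-side request tracking. The sketch below is a generic sliding-window limiter, not Instaloader's actual rate control logic; the class name `SlidingWindowLimiter` and any limits passed to it are hypothetical, since Instagram's real limits are undocumented. The request history lives inside the instance, so a freshly created instance starts with an empty history and knows nothing about the requests its predecessor just sent.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per `window` seconds (illustrative only).

    Not Instaloader's rate controller; the numbers are placeholders.
    """

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.timestamps = deque()  # send times of recent requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            return 0.0
        # Wait until the oldest request in the window expires.
        return self.window - (now - self.timestamps[0])

    def record(self):
        """Remember that a request was just sent."""
        self.timestamps.append(self.clock())
```

For example, with a fake clock and `SlidingWindowLimiter(2, 10.0)`, after requests at t=0 and t=1, `wait_time()` at t=4 returns 6.0, while a newly created limiter would return 0.0 immediately, even though the server has seen both requests.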