marcopompili / django-instagram

Instagram application for Django.
BSD 3-Clause "New" or "Revised" License

ERROR - path to profile media not found, stopped working in Production #27

Open Zalkota opened 4 years ago

Zalkota commented 4 years ago

Django Instagram was working, but now I receive the following error only in Production. In my development environment it works fine. It was working in production yesterday.

Traceback (most recent call last):
django_1    |   File "/usr/local/lib/python3.6/site-packages/django_instagram/templatetags/instagram_client.py", line 34, in get_profile_media
django_1    |     edges = profile['entry_data']['ProfilePage'][page]['graphql']['user']['edge_owner_to_timeline_media']['edges']
django_1    | KeyError: 'ProfilePage'

Daenith commented 4 years ago

Same here. Did you find a fix?

Zalkota commented 4 years ago

I have not found a fix.

anoleose commented 4 years ago

Hey there! I get KeyError: 'ProfilePage' on line 34, logged as "ERROR - path to profile media not found". How can I fix it?

Zalkota commented 4 years ago

Anyone figure it out?

Daenith commented 4 years ago

The only thing I found was that Instagram redirects to the login page in production but somehow not in development. If you curl the Instagram feed you want to embed, you'll see that you get a 200 on localhost and a 300 on the server. At least that's what I got, and that is why django-instagram can't find the context it needs. I haven't found a fix though.

FabienP89 commented 4 years ago

Hi guys, I have exactly the same problem. Did you find a solution?

chocoflaps commented 4 years ago

Same issue here using Heroku

BryOliver commented 4 years ago

Hi, has anyone managed to find a solution?

maxwhosevillage commented 4 years ago

I'm having the same issue. The Instagram user in my example is 'leyendeckers_bn'. Looking deeper into it, I see that I'm forwarded to the login page, so there is no ProfilePage.

https://www.instagram.com:443 "GET /leyendeckers_bn/ HTTP/1.1" 302 0
https://www.instagram.com:443 "GET /accounts/login/?next=/leyendeckers_bn/ HTTP/1.1" 200 11288

profile['entry_data'] looks like this: {'LoginAndSignupPage': [{'captcha': {'enabled': False, 'key': ''}, 'gdpr_required': False, 'tos_version': 'row', 'username_hint': ''}]}

Don't know how to fix that at the moment. Looking forward to more comments here.
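The failure described above can be made easier to diagnose by checking for the login page before walking the nested keys. This is only a sketch under assumptions: `extract_edges` and `LoginRequiredError` are hypothetical names, not part of django-instagram, and the key path is the one shown in the traceback.

```python
from typing import Any, Dict, List, Optional


class LoginRequiredError(Exception):
    """Raised when Instagram served the login page instead of the profile."""


def extract_edges(profile: Optional[Dict[str, Any]], page: int = 0) -> List[Dict[str, Any]]:
    """Return the timeline media edges, or raise a descriptive error."""
    if not profile:
        # Covers both None and an empty dict.
        raise ValueError("no profile data was parsed from the response")

    entry_data = profile.get("entry_data", {})
    if "LoginAndSignupPage" in entry_data:
        # The request was redirected to /accounts/login/, so there is no ProfilePage.
        raise LoginRequiredError("Instagram returned the login page")

    try:
        user = entry_data["ProfilePage"][page]["graphql"]["user"]
        return user["edge_owner_to_timeline_media"]["edges"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("path to profile media not found") from exc
```

This does not fix the redirect itself; it only turns the bare KeyError into an error that says what actually happened.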

marcopompili commented 4 years ago

I'll see if I can replicate the problem. It sounds like a change on the Instagram side is causing it. It would be more helpful if more people did what @maxwhosevillage did and also sent the type of query/user.

Also add what type of configuration you are running for production.

maxwhosevillage commented 4 years ago

Just to find out what is happening I simply ran: wget https://www.instagram.com/leyendeckers_bn

On my local development machine (macOS) the output is:

➜ ~ wget https://www.instagram.com/leyendeckers_bn
--2020-06-21 22:41:00-- https://www.instagram.com/leyendeckers_bn
Resolving www.instagram.com (www.instagram.com)... 2a03:2880:f23f:e5:face:b00c:0:4420, 157.240.27.174
Connecting to www.instagram.com (www.instagram.com)|2a03:2880:f23f:e5:face:b00c:0:4420|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.instagram.com/leyendeckers_bn/ [following]
--2020-06-21 22:41:00-- https://www.instagram.com/leyendeckers_bn/
Reusing existing connection to [www.instagram.com]:443.
HTTP request sent, awaiting response... 200 OK
Length: 35934 (35K) [text/html]
Saving to: ‘leyendeckers_bn’

leyendeckers_bn 100%[===================>] 35.09K --.-KB/s in 0.02s

2020-06-21 22:41:01 (1.52 MB/s) - ‘leyendeckers_bn’ saved [130129]

On the production machine (Ubuntu 18.04) I also get the redirect to the login page:

wget https://www.instagram.com/leyendeckers_bn
--2020-06-21 22:42:21-- https://www.instagram.com/leyendeckers_bn
Resolving www.instagram.com (www.instagram.com)... 31.13.84.174, 2a03:2880:f207:e5:face:b00c:0:4420
Connecting to www.instagram.com (www.instagram.com)|31.13.84.174|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.instagram.com/leyendeckers_bn/ [following]
--2020-06-21 22:42:21-- https://www.instagram.com/leyendeckers_bn/
Reusing existing connection to www.instagram.com:443.
HTTP request sent, awaiting response... 302 Found
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Location: https://www.instagram.com/accounts/login/?next=/leyendeckers_bn/ [following]
--2020-06-21 22:42:21-- https://www.instagram.com/accounts/login/?next=/leyendeckers_bn/
Reusing existing connection to www.instagram.com:443.
HTTP request sent, awaiting response... 200 OK
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Length: 45887 (45K) [text/html]
Saving to: ‘leyendeckers_bn’

leyendeckers_bn 100%[===================>] 44.81K --.-KB/s in 0.02s

2020-06-21 22:42:21 (2.86 MB/s) - ‘leyendeckers_bn’ saved [45887/45887]

The idea that Instagram changed something seems to be true! Hope you/we can find a fix for that.
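The difference between the two wget runs can also be checked programmatically. The following is a rough sketch, not code from django-instagram: `ends_at_login` is a hypothetical helper, and the two chains are transcribed from the wget output above as (status, URL) pairs.

```python
from urllib.parse import urlparse

LOGIN_PATH = "/accounts/login/"


def ends_at_login(chain):
    """Return True if the last URL in a redirect chain is the login page.

    chain is a list of (status_code, url) pairs in the order they were requested.
    """
    if not chain:
        return False
    _status, final_url = chain[-1]
    return urlparse(final_url).path.startswith(LOGIN_PATH)


# Development machine: 301 to add the trailing slash, then 200 on the profile.
dev_chain = [
    (301, "https://www.instagram.com/leyendeckers_bn"),
    (200, "https://www.instagram.com/leyendeckers_bn/"),
]

# Production machine: the profile URL answers 302 and the chain ends at the login page.
prod_chain = [
    (301, "https://www.instagram.com/leyendeckers_bn"),
    (302, "https://www.instagram.com/leyendeckers_bn/"),
    (200, "https://www.instagram.com/accounts/login/?next=/leyendeckers_bn/"),
]

print(ends_at_login(dev_chain), ends_at_login(prod_chain))
```

When using the requests library, the same information should be available from `response.history` and `response.url` after a normal `get` call.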

BuiltWithLogic commented 4 years ago

Hmm, seems odd. Question: when redisplaying the images, are you linking them back to Instagram? I wonder if they are in fact checking for something like that. I only ask because mine is working fine on the server, so I'm looking at what might be different between my setup and yours. I do link the image back to Instagram. If you do too, then I'm out of ideas :(

preinhart commented 4 years ago

Same here.

local development-machine:

Location: https://www.instagram.com/windfluechter_surfboards/ [following]
--17:31:28--  https://www.instagram.com/windfluechter_surfboards/
           => `index.html'
Resolving www.instagram.com... 69.171.250.174
Connecting to www.instagram.com[69.171.250.174]:443... connected.
HTTP request sent, awaiting response... 200 OK

production-machine:

Location: https://www.instagram.com/accounts/login/?next=/windfluechter_surfboards/ [following]
--2020-06-23 17:32:48--  https://www.instagram.com/accounts/login/?next=/windfluechter_surfboards/
Reusing existing connection to www.instagram.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 46579 (45K) [text/html]
Saving to: `index.html'

I also noticed that the production machine is using IPv6 while the local machine is using IPv4. I hope somebody's got an idea.

josemarevalo commented 4 years ago

I've got the same issue :/

timwilson commented 4 years ago

Not sure if it's helpful to have another example, but here's what I'm getting.

local development machine (Mac running Django in Docker)

% wget https://www.instagram.com/tdwilson/
--2020-06-25 22:33:22--  https://www.instagram.com/tdwilson/
Resolving www.instagram.com (www.instagram.com)... 157.240.2.174
Connecting to www.instagram.com (www.instagram.com)|157.240.2.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 90473 (88K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  88.35K  --.-KB/s    in 0.06s

2020-06-25 22:33:23 (1.49 MB/s) - ‘index.html’ saved [90473/90473]

heroku instance

~ $ wget https://www.instagram.com/tdwilson/
--2020-06-26 03:35:18--  https://www.instagram.com/tdwilson/
Resolving www.instagram.com (www.instagram.com)... 31.13.66.174, 2a03:2880:f211:e5:face:b00c:0:4420
Connecting to www.instagram.com (www.instagram.com)|31.13.66.174|:443... connected.
GnuTLS: Resource temporarily unavailable, try again.
GnuTLS: Resource temporarily unavailable, try again.
HTTP request sent, awaiting response... 302 Found
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Location: https://www.instagram.com/accounts/login/?next=/tdwilson/ [following]
--2020-06-26 03:35:18--  https://www.instagram.com/accounts/login/?next=/tdwilson/
Reusing existing connection to www.instagram.com:443.
HTTP request sent, awaiting response... 200 OK
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Cookie coming from www.instagram.com attempted to set domain to i.instagram.com
Length: 45463 (44K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  44.40K  --.-KB/s    in 0.002s

2020-06-26 03:35:19 (20.0 MB/s) - ‘index.html’ saved [45463/45463]

timwilson commented 4 years ago

So from what I'm reading online, it looks like those of us who want to display Instagram photos from public accounts are dead in the water unless we do it using Instagram's Basic Display API.

Is there any interest among the maintainers of this package to make those changes?

dfirst commented 4 years ago

Some short research and possible temporary hotfix:

  1. It seems that Instagram now requires an authorized session (I failed to get the set of images with an anonymous session).
  2. I solved the problem by copying the set of request headers from the browser's network panel and passing them here: https://github.com/marcopompili/django-instagram/blob/master/django_instagram/scraper.py#L26
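The hotfix in point 2 might look roughly like the sketch below. Everything here is an assumption for illustration: `BROWSER_HEADERS`, `profile_url` and `fetch_profile` are hypothetical names rather than the package's scraper code, and the header values are placeholders to be replaced with the ones copied from your own browser. The `session` argument is expected to behave like a `requests.Session`.

```python
# Placeholder values: replace them with the headers copied from the
# browser's network panel while viewing the profile.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def profile_url(username):
    """Build the public profile URL, including the trailing slash."""
    return "https://www.instagram.com/{}/".format(username.strip("/"))


def fetch_profile(username, session, headers=BROWSER_HEADERS):
    """Fetch the profile page with browser-like headers.

    session must provide get(url, headers=...) like requests.Session does.
    """
    response = session.get(profile_url(username), headers=headers)
    # Raise for 4xx/5xx answers, e.g. a 404 caused by a bad profile name.
    response.raise_for_status()
    if "/accounts/login/" in response.url:
        # Redirects were followed and ended at the login page.
        raise RuntimeError("redirected to the Instagram login page")
    return response.content
```

With requests installed, a call would be `fetch_profile("some_user", requests.Session())`. Whether the copied headers are enough to avoid the login redirect is not guaranteed.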

marcopompili commented 4 years ago

Ok so if I forward the headers of the client to the request it should stop redirecting to the login page?

dfirst commented 4 years ago

Ok so if I forward the headers of the client to the request it should stop redirecting to the login page?

Yes, this will solve the problem.

timwilson commented 4 years ago

So just to be clear, is this something that can be incorporated into a new release?

marcopompili commented 4 years ago

Addressed in commit 2e30732afaad695bbbf2c40fdaf92515fd68346d. Changing the User-Agent and Accept headers should be enough. If anyone could test the master branch in their production environment, I would know whether the fix works.

Claudio9701 commented 4 years ago

I have tried the master branch. It works fine locally but still gives the same error in the production environment.

2020-07-24T04:54:39.536728+00:00 app[web.1]: django_instagram.templatetags.instagram_client - ERROR - path to profile media not found
2020-07-24T04:54:39.536738+00:00 app[web.1]: Traceback (most recent call last):
2020-07-24T04:54:39.536740+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/django_instagram/templatetags/instagram_client.py", line 34, in get_profile_media
2020-07-24T04:54:39.536741+00:00 app[web.1]: edges = profile['entry_data']['ProfilePage'][page]['graphql']['user']['edge_owner_to_timeline_media']['edges']
2020-07-24T04:54:39.536742+00:00 app[web.1]: KeyError: 'ProfilePage'

I'm using a Heroku app to deploy the site.

I have tried to implement this solution found on StackOverflow, but it didn't work.

Implementation:

...
    try:
        url_login = 'https://www.instagram.com/accounts/login/'
        url_main = url_login + 'ajax/'
        auth = {'username': os.environ.get('IG_USER'), 'password': os.environ.get('IG_PASSWORD')}

        with requests.Session() as s:
            req = s.get(url_login)
            s.post(url_main, data=auth, headers={
                'x-csrftoken': req.cookies['csrftoken'],
                'referer': "https://www.instagram.com/accounts/login/",
                'User-Agent': headers['User-Agent'],
                'Accept': headers['Accept']
            })
            url = "https://www.instagram.com/{}/".format(username)
            page = s.get(url, headers={
                'User-Agent': headers['User-Agent'],
                'Accept': headers['Accept']
            })
            # Raise an error for a 404 caused by a bad profile name
            page.raise_for_status()
            return html.fromstring(page.content)
...

"Extended" logs:

2020-07-24T05:50:48.163614+00:00 app[web.1]: Profile: {'LoginAndSignupPage': [{'captcha': {'enabled': False, 'key': ''}, 'gdpr_required': False, 'tos_version': 'row', 'username_hint': ''}]}
2020-07-24T05:50:48.164208+00:00 app[web.1]: django_instagram.templatetags.instagram_client - ERROR - Profile: {'LoginAndSignupPage': [{'captcha': {'enabled': False, 'key': ''}, 'gdpr_required': False, 'tos_version': 'row', 'username_hint': ''}]}
2020-07-24T05:50:48.164209+00:00 app[web.1]: Traceback (most recent call last):
2020-07-24T05:50:48.164210+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/django_instagram/templatetags/instagram_client.py", line 34, in get_profile_media
2020-07-24T05:50:48.164210+00:00 app[web.1]: edges = profile['entry_data']['ProfilePage'][page]['graphql']['user']['edge_owner_to_timeline_media']['edges']
2020-07-24T05:50:48.164214+00:00 app[web.1]: KeyError: 'ProfilePage'
2020-07-24T05:50:48.164389+00:00 app[web.1]: django_instagram.templatetags.instagram_client - ERROR - path to profile media not found
2020-07-24T05:50:48.164390+00:00 app[web.1]: Traceback (most recent call last):
2020-07-24T05:50:48.164390+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/django_instagram/templatetags/instagram_client.py", line 34, in get_profile_media
2020-07-24T05:50:48.164391+00:00 app[web.1]: edges = profile['entry_data']['ProfilePage'][page]['graphql']['user']['edge_owner_to_timeline_media']['edges']
2020-07-24T05:50:48.164394+00:00 app[web.1]: KeyError: 'ProfilePage' 

I think the POST request sent with the auth param returns a 400 response.

Thanks for your work @marcopompili . Let me know if more information is needed.


UPDATE

I found this explanation in a PHP Instagram scraper GitHub gist. TL;DR: Instagram "bans" IPs that make constant requests to the same URL. I'm not sure if it is right, but it makes sense. I'm currently in a hurry, so I decided to make the request on the front-end using JS (example). Now it does retrieve the images in both the dev and prod environments.

Maybe there is a way to make the request in the front-end (so the Instagram server gets our user's IP and not the server IP for each request) and "receive" the response in the back-end, so we can still use the handy template tags and functionality of this Django app.
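If the rate-limiting explanation is right, another way to send fewer requests from the server IP would be to cache the fetched profile for a while. This is a minimal sketch of that idea, which has not been tested against Instagram; `TTLCache` and `get_or_fetch` are hypothetical names and are not part of django-instagram.

```python
import time


class TTLCache:
    """Very small in-process cache that refetches a value after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store = {}    # key -> (time stored, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only when it is stale."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()
        self._store[key] = (now, value)
        return value
```

In a Django project the built-in cache framework would probably be a better fit than a hand-rolled class, since an in-process cache is not shared between worker processes.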

timwilson commented 4 years ago

@marcopompili Any update on this issue?

preinhart commented 4 years ago

I still have this problem. I get the following error message (only in the production environment):

File "/env/lib/python2.7/site-packages/django_instagram/templatetags/instagram_client.py", line 28, in get_profile_media
    edges = profile['entry_data']['ProfilePage'][page]['graphql']['user']['edge_owner_to_timeline_media']['edges']
TypeError: 'NoneType' object has no attribute '__getitem__'

thiagolara commented 4 years ago

Hi folks, I'm facing the same issue in production. Any update or an idea how to solve it? What did you guys do?

rzw-gh commented 4 years ago

Hi folks, any fix yet?

Claudio9701 commented 4 years ago

Hi folks, I'm facing the same issue in production. Any update or an idea how to solve it? What did you guys do?

My quick-fix solution was doing the scraping on the front-end, as I posted in https://github.com/marcopompili/django-instagram/issues/27#issuecomment-663360226.

I'm trying to find time to fork the repository and trying to implement a better solution.

rzw-gh commented 4 years ago

Sir, I don't know JavaScript. Can you help me put the script from your UPDATE section in the right place?

Claudio9701 commented 4 years ago

Sir, I don't know JavaScript. Can you help me put the script from your UPDATE section in the right place?

Yes I can, you have to add the code within a Githubissues.
