gthedev closed this issue 6 years ago.
I am getting the same thing, just started happening an hour or so ago. Other methods are working.
Same here; it only works if I log in with credentials.
Same here.
+1 Same here
Hey, according to the latest news about the private data leak, Insta is closing some anonymous APIs.
Oh, sorry. Does this news (https://www.instagram.com/developer/changelog/) clarify the situation?
+1. But if I log in with credentials, I sometimes get this exception: 'InstagramScraper\Exception\InstagramAuthException' with message 'Something went wrong. Please report issue.'
Fatal error: Uncaught exception 'InstagramScraper\Exception\InstagramException' with message 'Response code is 403. Body: message => forbidden; status => fail; Something went wrong. Please report issue.' in /InstagramScraper/Instagram.php:315 Stack trace: #0 /InstagramScraper/Instagram.php(272): InstagramScraper\Instagram->getMediasByUserId(4919194635, 3, '') #1
@zaivst I tried to get the next page of medias but without success 😕
I think query_id or query_hash is one of the things needed to make this work again, but I'm not a developer, just a layman 😔
I made a solution for this, but in Python, using an automated browser to retrieve the cookies and the new URL. I really don't know what a PHP implementation would look like, but these are the steps:
'https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"<user_id>","first":<items_to_retrieve>,"after":"<end_cursor>"}'
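where <user_id> is the profile ID, <items_to_retrieve> is the page size, and <end_cursor> is the cursor to continue from (empty for the first page).
Disclaimer 1: no authorization needed! Disclaimer 2: I actually reused the same cookies several times and it worked; the expiry seems to be set to one year. But I don't know whether Instagram will catch the same cookies being used from many different clients if they are hardcoded into this scraper!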
Python implementation:
# ! Error handling is omitted for clarity
import requests
from selenium import webdriver
media_url = 'https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"%s","first":20,"after":"%s"}'
browser = webdriver.Chrome()
# first get https://instagram.com to obtain cookies
browser.get('https://instagram.com')
browser_cookies = browser.get_cookies()
browser.quit()  # the browser is only needed to obtain the cookies
# set a session with cookies
session = requests.Session()
for cookie in browser_cookies:
    session.cookies.set(cookie['name'], cookie['value'])
# get response as JSON
# > using id '25025320' - profile of Instagram for this example
response = session.get(media_url % ('25025320', ''), verify=False).json()
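The snippet above interpolates the variables JSON into the URL with `%s` and leaves the encoding to `requests`. A minimal sketch that builds the same query URL with the variables JSON explicitly URL-encoded; `build_media_url` is a hypothetical helper, and the query_hash is simply the one quoted in this thread, which Instagram may change at any time:

```python
import json
from urllib.parse import quote

# query_hash taken from this thread; Instagram may rotate it at any time
QUERY_HASH = "42323d64886122307be10013ad2dcc44"
GRAPHQL_URL = "https://www.instagram.com/graphql/query/"


def build_media_url(user_id: str, first: int = 20, after: str = "") -> str:
    """Hypothetical helper: build the paginated media query URL.

    `after` is the end cursor to continue from ("" for the first page).
    """
    # Compact JSON (no spaces), matching the format used in the thread
    variables = json.dumps({"id": user_id, "first": first, "after": after},
                           separators=(",", ":"))
    # quote() percent-encodes the braces, quotes, colons and commas
    return f"{GRAPHQL_URL}?query_hash={QUERY_HASH}&variables={quote(variables)}"


print(build_media_url("25025320"))
```

Encoding the JSON explicitly gives the same `%7B%22...` form that a browser sends, instead of relying on the HTTP client to fix up the raw braces and quotes.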
https://www.instagram.com/vasiliizaikovskii/?__a=1 this works!
@zaivst this one works and always has, but unfortunately only for the first 12 records of the profile. Previously you could retrieve the next chunk of media by adding the max_id parameter, but now it is just ignored.
"this one works and always worked"
All other features worked too, until 4 Apr :)
Same problem here.
@raiym @rhcarlosweb @gthedev hi! I don't know PHP well enough to help with this one, but maybe a quick hotfix would be to change:
ACCOUNT_MEDIAS = 'https://instagram.com/graphql/query/?query_id=17888483320059182&id={user_id}&first={count}&after={max_id}';
to:
ACCOUNT_MEDIAS = 'https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"{user_id}","first":{count},"after":"{max_id}"}';
and to send cookies like these with the request:
[
{
"domain": "www.instagram.com",
"httpOnly": false,
"name": "rur",
"path": "/",
"secure": false,
"value": "PRN"
},
{
"domain": "www.instagram.com",
"httpOnly": false,
"name": "ig_vw",
"path": "/",
"secure": false,
"value": "1038"
},
{
"domain": "www.instagram.com",
"expiry": 1554672942.248612,
"httpOnly": false,
"name": "csrftoken",
"path": "/",
"secure": true,
"value": "ObRXje2ByOUmAnxqPaoFsD0CHvBEK8dQ"
},
{
"domain": "www.instagram.com",
"expiry": 2153943342.248646,
"httpOnly": false,
"name": "mid",
"path": "/",
"secure": false,
"value": "WsqLMgALAAFkkaMz9rbL568BCU5N"
},
{
"domain": "www.instagram.com",
"httpOnly": false,
"name": "ig_vh",
"path": "/",
"secure": false,
"value": "532"
},
{
"domain": "www.instagram.com",
"httpOnly": false,
"name": "ig_pr",
"path": "/",
"secure": false,
"value": "2.5"
}
]
Maybe this is not the final solution, but at least media queries will work (for a while 😅)
@myrs thanks bro! I tried the same here and it works! :D
@myrs when I try this in the browser, it works, but when I make these changes in the scraper, it returns a 403 status.
@myrs how do we edit the cookies we send? I didn't understand step 2. Thanks for your time.
@dionii1 As I said, unfortunately I'm not really familiar with PHP =S As far as I understand, you should send the cookies in the header of this request. Maybe this piece of code from https://github.com/postaddictme/instagram-php-scraper/blob/master/src/InstagramScraper/Instagram.php is relevant for making the necessary changes:
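The cookie list above is in the format returned by Selenium's `get_cookies()`. Outside a browser, such cookies are sent as a single `Cookie` request header made of `name=value` pairs joined by `; `. A minimal sketch of that conversion; `cookie_header` is a hypothetical helper, and only `name` and `value` are used (domain, path and expiry matter only to a browser's cookie jar):

```python
from typing import Iterable, Mapping


def cookie_header(cookies: Iterable[Mapping[str, object]]) -> str:
    """Hypothetical helper: join Selenium-style cookie dicts into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# A subset of the cookie list above
cookies = [
    {"domain": "www.instagram.com", "name": "rur", "value": "PRN"},
    {"domain": "www.instagram.com", "name": "ig_pr", "value": "2.5"},
]
headers = {"cookie": cookie_header(cookies)}
print(headers)  # {'cookie': 'rur=PRN; ig_pr=2.5'}
```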
$mid = $cookies['mid'];
$csrfToken = $cookies['csrftoken'];
$headers = ['cookie' => "csrftoken=$csrfToken; mid=$mid;",
'referer' => Endpoints::BASE_URL . '/',
'x-csrftoken' => $csrfToken,
];
$response = Request::post(Endpoints::LOGIN_URL, $headers,
['username' => $this->sessionUsername, 'password' => $this->sessionPassword]);
@zaivst what changes have you made?
@myrs I changed the ACCOUNT_MEDIAS constant in Endpoints.php, and the Request::get() call in getMediasByUserId() returns a 403 status. But if I use the string returned by Endpoints::getAccountMediasJsonLink($id, $maxId) in the browser, it returns the correct response.
@myrs this
{
"domain": "www.instagram.com",
"httpOnly": false,
"name": "ig_pr",
"path": "/",
"secure": false,
"value": "2.5"
}
Do you know what it is? With this, I made recent medias work without credentials.
@zaivst as I understand it, this is because the cookies are not set. @zaivst @dionii1 could you check this one? Once more, I'm not familiar with PHP or the structure of this project, but I imagine that https://github.com/postaddictme/instagram-php-scraper/blob/master/src/InstagramScraper/Instagram.php#L209 (line 209) is where the cookies are set, and this part should be changed to the following logic: if there is no session, use default cookies (this is actually what I'm doing in Python):
private function generateHeaders($session)
{
    $headers = [];
    if ($session) {
        $cookies = '';
        foreach ($session as $key => $value) {
            $cookies .= "$key=$value; ";
        }
        $headers = [
            'cookie' => $cookies,
            'referer' => Endpoints::BASE_URL . '/',
            'x-csrftoken' => $session['csrftoken'],
        ];
    } else {
        // No session: fall back to default (anonymous) cookies
        $rur = "PRN";
        $ig_vw = "1038";
        $csrftoken = "ObRXje2ByOUmAnxqPaoFsD0CHvBEK8dQ";
        $mid = "WsqLMgALAAFkkaMz9rbL568BCU5N";
        $ig_vh = "532";
        $ig_pr = "2.5";
        $headers = [
            'cookie' => "rur=$rur; ig_vw=$ig_vw; csrftoken=$csrftoken; mid=$mid; ig_vh=$ig_vh; ig_pr=$ig_pr;",
            'referer' => Endpoints::BASE_URL . '/',
            'x-csrftoken' => $csrftoken,
        ];
    }
    if ($this->getUserAgent()) {
        $headers['user-agent'] = $this->getUserAgent();
    }
    return $headers;
}
Also, this line (https://github.com/postaddictme/instagram-php-scraper/blob/master/src/InstagramScraper/Instagram.php#L313) should be changed to the following, to include the count parameter:
$response = Request::get(Endpoints::getAccountMediasJsonLink($id, $count, $maxId), $this->generateHeaders($this->userSession));
Disclaimer: none of this code has been run, though it's supposed to work 😅
@rhcarlosweb wow! Is this the only cookie that's really needed? Honestly, I don't know what these are; I was just sending all the cookies I got. So you have a working version?
@rhcarlosweb wow!! This actually worked for me!! It seems like magic, but just using this cookie solved the whole thing!! So there is no need to switch to the URL I provided before, although using only this cookie with the new URL works fine too.
Nice @myrs, and thanks for the cookie value, because I didn't know how to get it haha
@myrs Strange, because I tested with a blank value, $this->userSession['ig_pr'] = "";, and it works too...
😕 confused haha
@ryantbrown 👌 So, just waiting for this pull request to be approved
"Strange because I tested with a blank value of $this->userSession['ig_pr'] = ""; and it works too.."
Hmm... maybe Instagram just expects this cookie name, whatever the value, because setting it to some random value, e.g. 42, works fine too!
But yes, when ig_pr is not present, it returns a 403.
Nice user private data protection system, anyway 😅
@myrs yeah, hopefully @rhcarlosweb's PR gets approved quickly. The only thing that is unclear is whether csrftoken should be set if it is missing. It seems to work with an empty string, but the code you provided suggests it could be set to ObRXje2ByOUmAnxqPaoFsD0CHvBEK8dQ as well.
Either way, I think it's good to go.
@myrs thanks, it worked for me.
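The workaround discussed above, as reported at the time (later in this thread it stops working): with no logged-in session, send an `ig_pr` cookie with any value, plus a `csrftoken` that may be empty. A minimal Python sketch mirroring the `generateHeaders()` logic proposed earlier; the function name `generate_headers`, the `"1"` placeholder value and the `https://www.instagram.com/` referer are assumptions:

```python
from typing import Dict, Optional


def generate_headers(session: Optional[Dict[str, str]] = None,
                     user_agent: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical Python counterpart of the generateHeaders() fallback."""
    if session:
        # Logged in: send every session cookie
        cookies = dict(session)
    else:
        # Anonymous: per this thread, only the ig_pr cookie *name* seemed to matter.
        # "1" is an arbitrary placeholder value; csrftoken is left empty (assumption).
        cookies = {"ig_pr": "1", "csrftoken": ""}
    headers = {
        "cookie": "; ".join(f"{k}={v}" for k, v in cookies.items()),
        "referer": "https://www.instagram.com/",
        "x-csrftoken": cookies.get("csrftoken", ""),
    }
    if user_agent:
        headers["user-agent"] = user_agent
    return headers
```

For example, `generate_headers()` produces the cookie string `ig_pr=1; csrftoken=`, while `generate_headers({"csrftoken": "abc", "sessionid": "s"})` forwards the session cookies and uses `abc` as the CSRF header.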
You can get the "csrf_token" from the login page HTML with a regex:
https://instagram.com/accounts/login/
I'm getting the csrf_token from this page and using it in the cookies. It appears here: {"activity_counts":null,"config":{"csrf_token":"SDYkHgQQsFkO1bCPKDWh35HEaoSOV7rM","viewer":null},
I can't share the code because I'm using C#.
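The regex approach described above, sketched in Python since the C# code isn't available. It assumes the token appears in the page as `"csrf_token":"<value>"`, as in the quoted snippet; `extract_config_value` is a hypothetical helper, and it would work for other keys (for example `rhx_gis`, mentioned later in this thread) only if they are serialized the same way:

```python
import re
from typing import Optional


def extract_config_value(html: str, key: str) -> Optional[str]:
    """Hypothetical helper: find "key":"value" in JSON embedded in a page."""
    match = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(key), html)
    return match.group(1) if match else None


snippet = ('{"activity_counts":null,"config":{"csrf_token":'
           '"SDYkHgQQsFkO1bCPKDWh35HEaoSOV7rM","viewer":null},')
print(extract_config_value(snippet, "csrf_token"))  # SDYkHgQQsFkO1bCPKDWh35HEaoSOV7rM
```

A regex is brittle if the page markup changes; parsing the embedded JSON object would be sturdier, but the regex matches what was described.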
@luanrox I think it will work if we add the following, as the other requests (e.g. getMediasByTag) do:
$cookies = static::parseCookies($response->headers['Set-Cookie']);
$this->userSession['csrftoken'] = $cookies['csrftoken'];
Today the error is back again =\ Anyone with the same issue?
I tested again with a new userSession cookie, and $this->userSession['ig_pr'] doesn't work anymore; now it needs $this->userSession['sessionid'], which I think is only generated with auth =p
Shit :( It stopped working here too...
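The same idea in Python: read csrftoken from the Set-Cookie headers of an earlier response instead of hardcoding it. A minimal sketch using the standard library's `http.cookies.SimpleCookie` (not the project's `parseCookies()`); it assumes each Set-Cookie header line is passed separately, and `cookies_from_set_cookie` is a hypothetical name. With `requests`, `response.cookies` already holds the parsed cookies, so this is only needed when working with raw headers:

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def cookies_from_set_cookie(headers: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map cookie names to values from Set-Cookie header lines."""
    jar = SimpleCookie()
    for header in headers:
        # Load one header at a time; attributes such as Path or Secure are dropped below
        jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


session = cookies_from_set_cookie([
    "csrftoken=ObRXje2ByOUmAnxqPaoFsD0CHvBEK8dQ; Path=/; Secure",
    "mid=WsqLMgALAAFkkaMz9rbL568BCU5N; Path=/",
])
print(session["csrftoken"])  # ObRXje2ByOUmAnxqPaoFsD0CHvBEK8dQ
```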
Haven't cracked it yet, but here's what I know so far.
When logged in, cookie 'sessionid' is required.
When not logged in, a new header is required: 'x-instagram-gis: 4b698621d4a2ef5913f90aec25475d04'
I don't know how x-instagram-gis is calculated, but it appears to be an encryption of the parameters. The x-instagram-gis is recalculated for each pagination request, but is the same for identical requests. It looks like some cryptographic hashing function, definitely involving the variables parameter and who knows what else.
I've tried to look at the obfuscated js to see what kind of encryption they are doing, but I haven't found it. Maybe someone can help take a look as well.
Perhaps they are using the same encryption technique that changed query_id to query_hash? Does anyone know how that's encrypted? It is a 32-character output, so I tried playing with md5, but no go.
@andrewyoo x-instagram-gis is calculated from csrf_token, rhx_gis, window.navigator.userAgent and the variables from the API call. Here is my refactored hashing function:
function gishash(n,r,t){function e(n,r){var t=(65535&n)+(65535&r);return(n>>16)+(r>>16)+(t>>16)<<16|65535&t}function o(n,r,t,o,u,c){return e((f=e(e(r,n),e(o,c)))<<(a=u)|f>>>32-a,t);var f,a}function u(n,r,t,e,u,c,f){return o(r&t|~r&e,n,r,u,c,f)}function c(n,r,t,e,u,c,f){return o(r&e|t&~e,n,r,u,c,f)}function f(n,r,t,e,u,c,f){return o(r^t^e,n,r,u,c,f)}function a(n,r,t,e,u,c,f){return o(t^(r|~e),n,r,u,c,f)}function i(n,r){var t,o,i,h,g;n[r>>5]|=128<<r%32,n[14+(r+64>>>9<<4)]=r;var v=1732584193,d=-271733879,l=-1732584194,A=271733878;for(t=0;t<n.length;t+=16)o=v,i=d,h=l,g=A,d=a(d=a(d=a(d=a(d=f(d=f(d=f(d=f(d=c(d=c(d=c(d=c(d=u(d=u(d=u(d=u(d,l=u(l,A=u(A,v=u(v,d,l,A,n[t],7,-680876936),d,l,n[t+1],12,-389564586),v,d,n[t+2],17,606105819),A,v,n[t+3],22,-1044525330),l=u(l,A=u(A,v=u(v,d,l,A,n[t+4],7,-176418897),d,l,n[t+5],12,1200080426),v,d,n[t+6],17,-1473231341),A,v,n[t+7],22,-45705983),l=u(l,A=u(A,v=u(v,d,l,A,n[t+8],7,1770035416),d,l,n[t+9],12,-1958414417),v,d,n[t+10],17,-42063),A,v,n[t+11],22,-1990404162),l=u(l,A=u(A,v=u(v,d,l,A,n[t+12],7,1804603682),d,l,n[t+13],12,-40341101),v,d,n[t+14],17,-1502002290),A,v,n[t+15],22,1236535329),l=c(l,A=c(A,v=c(v,d,l,A,n[t+1],5,-165796510),d,l,n[t+6],9,-1069501632),v,d,n[t+11],14,643717713),A,v,n[t],20,-373897302),l=c(l,A=c(A,v=c(v,d,l,A,n[t+5],5,-701558691),d,l,n[t+10],9,38016083),v,d,n[t+15],14,-660478335),A,v,n[t+4],20,-405537848),l=c(l,A=c(A,v=c(v,d,l,A,n[t+9],5,568446438),d,l,n[t+14],9,-1019803690),v,d,n[t+3],14,-187363961),A,v,n[t+8],20,1163531501),l=c(l,A=c(A,v=c(v,d,l,A,n[t+13],5,-1444681467),d,l,n[t+2],9,-51403784),v,d,n[t+7],14,1735328473),A,v,n[t+12],20,-1926607734),l=f(l,A=f(A,v=f(v,d,l,A,n[t+5],4,-378558),d,l,n[t+8],11,-2022574463),v,d,n[t+11],16,1839030562),A,v,n[t+14],23,-35309556),l=f(l,A=f(A,v=f(v,d,l,A,n[t+1],4,-1530992060),d,l,n[t+4],11,1272893353),v,d,n[t+7],16,-155497632),A,v,n[t+10],23,-1094730640),l=f(l,A=f(A,v=f(v,d,l,A,n[t+13],4,681279174),d,l,n[t],11,-358537222),v,d,n[t+3],16,-722521979),A,v,n[t+6],23,76029189),l=f(l,A=
f(A,v=f(v,d,l,A,n[t+9],4,-640364487),d,l,n[t+12],11,-421815835),v,d,n[t+15],16,530742520),A,v,n[t+2],23,-995338651),l=a(l,A=a(A,v=a(v,d,l,A,n[t],6,-198630844),d,l,n[t+7],10,1126891415),v,d,n[t+14],15,-1416354905),A,v,n[t+5],21,-57434055),l=a(l,A=a(A,v=a(v,d,l,A,n[t+12],6,1700485571),d,l,n[t+3],10,-1894986606),v,d,n[t+10],15,-1051523),A,v,n[t+1],21,-2054922799),l=a(l,A=a(A,v=a(v,d,l,A,n[t+8],6,1873313359),d,l,n[t+15],10,-30611744),v,d,n[t+6],15,-1560198380),A,v,n[t+13],21,1309151649),l=a(l,A=a(A,v=a(v,d,l,A,n[t+4],6,-145523070),d,l,n[t+11],10,-1120210379),v,d,n[t+2],15,718787259),A,v,n[t+9],21,-343485551),v=e(v,o),d=e(d,i),l=e(l,h),A=e(A,g);return[v,d,l,A]}function h(n){var r,t="",e=32*n.length;for(r=0;r<e;r+=8)t+=String.fromCharCode(n[r>>5]>>>r%32&255);return t}function g(n){var r,t=[];for(t[(n.length>>2)-1]=void 0,r=0;r<t.length;r+=1)t[r]=0;var e=8*n.length;for(r=0;r<e;r+=8)t[r>>5]|=(255&n.charCodeAt(r/8))<<r%32;return t}function v(n){var r,t,e="";for(t=0;t<n.length;t+=1)r=n.charCodeAt(t),e+="0123456789abcdef".charAt(r>>>4&15)+"0123456789abcdef".charAt(15&r);return e}function d(n){return unescape(encodeURIComponent(n))}function l(n){return h(i(g(r=d(n)),8*r.length));var r}return v(l(r+":"+t+":"+window.navigator.userAgent+":"+n))}
Call this function like this: gishash("{\"id\":\"5821462185\",\"first\":40,\"after\":\"\"}", rhx_gis, csrf_token).
rhx_gis and csrf_token can be parsed from the source of any embed page (CORS is available on these links).
I've tried to achieve this via JavaScript, but here is the problem: I can't set these custom headers because of Instagram's allow-origin restriction on custom headers. This is not a problem in PHP, I guess.
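Stripped of the inlined MD5 routine, the function above hashes the string `rhx_gis:csrf_token:userAgent:variables` (UTF-8 encoded) and returns the lowercase hex digest, which matches the observation later in this thread that x-instagram-gis is just an md5 hash. A Python sketch of that computation; the argument order follows the `gishash(variables, rhx_gis, csrf_token)` call shown above, `instagram_gis` is a hypothetical name, and passing the user agent explicitly (the browser version reads `window.navigator.userAgent`) assumes it must match the one sent with the request:

```python
import hashlib


def instagram_gis(variables: str, rhx_gis: str, csrf_token: str, user_agent: str) -> str:
    """Hypothetical Python port of gishash(): MD5 of "rhx_gis:csrf_token:userAgent:variables"."""
    # The JS version UTF-8 encodes the string (unescape(encodeURIComponent(...))) before hashing
    payload = ":".join([rhx_gis, csrf_token, user_agent, variables])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


variables = '{"id":"5821462185","first":40,"after":""}'
gis = instagram_gis(variables, "<rhx_gis>", "<csrf_token>", "Mozilla/5.0")
print(gis)  # 32 lowercase hex characters; changes whenever `variables` changes
```

Because `variables` includes the `after` cursor, the header has to be recomputed for every pagination request, as noted earlier in the thread.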
@350d, wow, that actually worked! How did you go about extracting (or building) that hashing function? Just curious for future endeavors. Thanks!
@andrewyoo just simple debugging in the browser
It seems the solutions mentioned here no longer work :(
We need to find out how the cookie parameter is generated for non-authorized users.
I've just realized that x-instagram-gis is just an md5 hash 😀
@350d have you tried making some requests? x-instagram-gis alone is no longer enough for non-authorized users. As I understand it, valid cookie data is needed, and it is generated on every request.
@footniko you should generate this header for every request
Yes, but this header alone is not enough...
var data = null;
var xhr = new XMLHttpRequest();
xhr.withCredentials = true;
xhr.addEventListener("readystatechange", function () {
  if (this.readyState === 4) {
    console.log(this.responseText);
  }
});
xhr.open("GET", "https://www.instagram.com/graphql/query/?query_hash=bfe6fc64e0775b47b311fc0398df88a9&variables=%7B%22user_id%22%3A%224502807%22%2C%22include_chaining%22%3Afalse%2C%22include_reel%22%3Afalse%2C%22include_suggested_users%22%3Afalse%2C%22include_logged_out_extras%22%3Atrue%7D");
xhr.setRequestHeader("x-instagram-gis", "2b92887dc6325064cf6294b95aa04586");
xhr.setRequestHeader("cache-control", "no-cache");
xhr.send(data);
Returns:
{
"message": "forbidden",
"status": "fail"
}
Add the X-CSRFToken and X-Instagram-AJAX: 1 headers.
@350d so you mean the following headers are enough? x-instagram-gis: a generated, randomized md5 hash; X-CSRFToken: csrftoken; X-Instagram-AJAX: 1
@knissophiliac these headers are not randomized; they are calculated from the current csrf_token, rhx_gis and other vars.
@350d OK. I read your comment above, but I couldn't find rhx_gis in my responses.
Using getMediasByUserId returns an error; the returned body is: {"message": "forbidden", "status": "fail"}
Is there a way to get around this?
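Putting this exchange together, an anonymous GraphQL request was expected to carry x-instagram-gis, X-CSRFToken and X-Instagram-AJAX: 1. A sketch that only assembles those headers and sends no request; `graphql_headers` is a hypothetical helper, the csrf_token and rhx_gis values are assumed to come from the page as described earlier, and the thread never confirmed that this set of headers is sufficient:

```python
import hashlib
from typing import Dict


def graphql_headers(variables: str, rhx_gis: str, csrf_token: str,
                    user_agent: str) -> Dict[str, str]:
    """Hypothetical helper: headers suggested in this thread for an anonymous query."""
    # Same computation as gishash(): MD5 hex of "rhx_gis:csrf_token:userAgent:variables"
    gis_source = ":".join([rhx_gis, csrf_token, user_agent, variables])
    return {
        # Assumed to need to match the user agent used in the hash
        "user-agent": user_agent,
        "x-instagram-gis": hashlib.md5(gis_source.encode("utf-8")).hexdigest(),
        "x-csrftoken": csrf_token,
        "x-instagram-ajax": "1",
    }
```

HTTP header names are case-insensitive, so the lowercase keys here are equivalent to the `X-CSRFToken` and `X-Instagram-AJAX` spellings used in the comments.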