sherlock-project / sherlock

Hunt down social media accounts by username across social networks
https://sherlockproject.xyz
MIT License
60.37k stars 6.95k forks

Yandex Music has a captcha #2071

Closed 00-kat closed 6 months ago

00-kat commented 7 months ago

Description

Here's a random username that can't possibly exist: ecfhlmiuewfimcuhem.

Here's the username from data.json: ya.playlist

When I visit either, I get a captcha (note: JS is disabled in my browser): [image]

Unless Sherlock uses Selenium/Pyppeteer, which I highly doubt (it's not in requirements.txt), this captcha isn't really avoidable (I think). Maybe it even shows up with JS enabled, which I didn't check.

I'm not opening a PR removing YandexMusic because it could be an issue that only happens for me, or maybe it's possible to bypass this captcha.

ppfeister commented 7 months ago

@cd-CreepArghhh Can you share the raw html used for that page? I'll likely be able to add it to #2068

It won't bypass the captcha until circumvention is added, but it would avoid false-positive hits caused by the captcha page when it's presented

00-kat commented 7 months ago

Huh, interestingly there's no captcha now (so it's not a JS issue) but there's a 404 page and a profile. Maybe I'll run Sherlock a couple times then try again.

ppfeister commented 7 months ago

If you do end up hitting it again, drop a ping

Testing Yandex is a PITA on my end, having to use VPNs and such, and even when I do, it apparently trusts me implicitly and refuses to rate limit or captcha me

ppfeister commented 7 months ago

(if the captcha page returns a status code other than 200, we can also use that as a simpler resolution)

00-kat commented 7 months ago

Okay, found out that spamming them with requests gets you a captcha fast. Running Sherlock 4 times resulted in one captcha, and my browser got 2 in 6 requests.

You're going to have to run the HTML through some prettifier though (I don't know any) since it's all on one line.

Note: GitHub won't let me upload .html files, so rename the .txt to .html, thanks.

Oops, Captcha!.txt Oops, Captcha!_files.zip

I'll spam a few requests with python now to check the status code.

Edit: the captcha page (some long URL with a hash or Base64 string in it) returns 200, I'll see what I get when redirected from the profile page (probably 200, so don't wait for me to finish).

00-kat commented 7 months ago

Finished. Out of 100 requests, the first request was a 404 (i.e. no captcha) then the rest were all 200s (thus captcha). No 302s either I think, although requests follows redirects by default, so they would only show up in response.history. Status code isn't going to be of any use.
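
For anyone repeating this check, here is a minimal sketch of a status-code tally. It is not the script used above; `tally_status_codes` and `requests_chain` are hypothetical names. Since `requests` follows redirects by default, the sketch also counts the hops stored in `response.history`, so any 302 would be visible.

```python
from collections import Counter
from typing import Callable, List


def tally_status_codes(
    fetch_chain: Callable[[str], List[int]], url: str, attempts: int
) -> Counter:
    """Count every status code seen across `attempts` requests to `url`.

    `fetch_chain` returns the status codes of one request in order:
    any redirect hops first, then the final response.
    """
    counts: Counter = Counter()
    for _ in range(attempts):
        counts.update(fetch_chain(url))
    return counts


def requests_chain(url: str) -> List[int]:
    """Fetch `url` with the third-party `requests` library, keeping redirect hops."""
    import requests  # imported lazily so the tally helper itself has no dependencies

    response = requests.get(url, timeout=10)
    # Redirects are followed automatically; earlier hops are kept in `history`.
    return [hop.status_code for hop in response.history] + [response.status_code]
```

Calling `tally_status_codes(requests_chain, "https://music.yandex/users/ecfhlmiuewfimcuhem/playlists", 100)` would reproduce the test described here, assuming `requests` is installed. Expect a captcha quickly at that rate.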

ppfeister commented 7 months ago

Gonna push a hopeful fix. If you want to be added as a co-author, you can drop your GitHub no-reply email (or other GitHub email) and a name here. Or link to somewhere that has it.

Otherwise I'll push as a single committer.

00-kat commented 7 months ago

Just push as single committer

ppfeister commented 7 months ago

Done. Seems to have not broken anything on my end -- can you pull and validate all 3 cases as well

(captcha, valid, not valid)

ppfeister commented 7 months ago

Just realized I forgot a case: 'not valid in country'. Will add that now. Shouldn't make a difference for the captcha tests.

Edit: that's actually accounted for by the 404 msg I added, so we're good
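
A rough sketch of the idea behind a message-based check, since the status code can't separate a captcha from a real profile. This is not Sherlock's actual code, and the marker strings are placeholders rather than the ones added in the patch. Any page containing a known error marker (404 text, captcha form, geoblock notice) is treated as "not found".

```python
from typing import Sequence

# Placeholder markers, assumed for illustration only; the real strings
# would be copied from the 404, captcha, and geoblock pages.
MARKERS = ("Error 404", "captcha-form", "not available in your region")


def classify_profile(body: str, error_markers: Sequence[str]) -> str:
    """Return "not found" if any error marker appears in the page body.

    A captcha or geoblock page is answered with status 200, so it is
    matched by content and reported the same way as a missing profile.
    """
    if any(marker in body for marker in error_markers):
        return "not found"
    return "found"
```

With this approach a captcha response yields a miss rather than a false positive, at the cost of possibly missing a real account while the captcha is active.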

00-kat commented 7 months ago

I don't think it worked, since there's still a false-positive. By the way, I'm pretty sure I'm still in the blacklist or whatever Yandex Music has going on, so it will be a while before I can test the other two cases.

$ git clone https://github.com/ppfeister/sherlock.git  # hope I cloned the right repo...
$ cd sherlock
$ python sherlock ecfhlmiuewfimcuhem --site YandexMusic
[*] Checking username ecfhlmiuewfimcuhem on:
[+] YandexMusic: https://music.yandex/users/ecfhlmiuewfimcuhem/playlists

[*] Search completed with 1 results

ppfeister commented 7 months ago

hm......... lemme re eval and get back

ppfeister commented 7 months ago

@cd-CreepArghhh Just got back

Noticed that you didn't run with the --local flag. Without this flag, Sherlock pulls data.json from the upstream repo by default instead of using our local patched copy. Can you test one more time while using that flag? (This won't be necessary once the patch gets merged upstream.)

When using that flag on my end, it seems to give the expected result for each of the four cases (not valid, valid, captcha, geoblock).

(that flag messes with me quite a bit.....)

Edit: you do not need to re-pull unless it's been deleted

00-kat commented 7 months ago

Yay, it works! ecfhlmiuewfimcuhem doesn't show up, ya.playlist does, and I didn't get any false positives even after spamming the command 30+ times. I didn't realise that it grabbed a data.json from GitHub instead of the local one by default (probably so you don't need to git pull as often).

Also, I'm not sure what the geoblock case is so I can't really test that. (I assume I could try running it through a bunch of tor nodes until I hit it, but I don't have time for that right now).

ppfeister commented 7 months ago

I get geoblocked here in the USA, so it was an easy test for me to run, lol

I'll go ahead and link your Issue to that PR so it gets closed when and if it (hopefully) gets merged