sherlock-project / sherlock

Hunt down social media accounts by username across social networks
https://sherlockproject.xyz
MIT License

False positives #541

Closed dd-pardal closed 4 years ago

dd-pardal commented 4 years ago

False positives for any username

Here's an example output for a "random" username:

https://www.capfriendly.com/users/amffdsfjvidsvck
https://www.codechef.com/users/amffdsfjvidsvck
https://www.ebay.com/usr/amffdsfjvidsvck
https://www.gpsies.com/mapUser.do?username=amffdsfjvidsvck
https://www.twitter.com/amffdsfjvidsvck
Total Websites Username Detected On : 5

GPSies moved its website. CapFriendly is especially weird. It seems to generate random details for nonexistent users. The other ones say that the user doesn't exist.

False positives for usernames with .

https://ask.fm/fghfgn.tiojydf
https://profil.chatujme.cz/fghfgn.tiojydf
https://coderwall.com/fghfgn.tiojydf
https://my.flightradar24.com/fghfgn.tiojydf
https://www.house-mixes.com/profile/fghfgn.tiojydf
https://www.ifttt.com/p/fghfgn.tiojydf
http://fghfgn.tiojydf.insanejournal.com/profile
https://tamtam.chat/fghfgn.tiojydf
https://www.taringa.net/fghfgn.tiojydf
https://t.me/fghfgn.tiojydf
https://trashbox.ru/users/fghfgn.tiojydf
https://easyen.ru/index/8-0-fghfgn.tiojydf
https://elwo.ru/index/8-0-fghfgn.tiojydf
http://ingvarr.net.ru/index/8-0-fghfgn.tiojydf
https://www.metacritic.com/user/fghfgn.tiojydf
http://pedsovet.su/index/8-0-fghfgn.tiojydf
https://radioskot.ru/index/8-0-fghfgn.tiojydf

False positives for usernames with _

https://fghfgn_tiojydf.en.aptoide.com/

Hostnames can't contain underscores, by the way. Aptoide redirects to the homepage.
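The underscore point can be checked before any request is made. Below is a minimal sketch, assuming the usual hostname label rules (letters, digits and hyphens, 1 to 63 characters, no hyphen at either end). The function name `is_valid_host_label` is made up for illustration and is not part of Sherlock.

```python
import re

# One hostname label: letters, digits and hyphens only, 1-63 characters,
# and no hyphen at the start or at the end.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_host_label(username: str) -> bool:
    """Return True if `username` could be used as a single hostname label.

    Hypothetical helper: a site whose profile URL looks like
    https://<username>.example.com/ could be skipped when this is False.
    """
    return LABEL_RE.fullmatch(username) is not None
```

With a guard like this, `fghfgn_tiojydf` and `fghfgn.tiojydf` would be skipped for subdomain-style sites, while `fghfgn-tiojydf` would still be queried.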

False positives for usernames with -

https://my.flightradar24.com/fghfgn-tiojydf
https://t.me/fghfgn-tiojydf
https://www.opennet.ru/~fghfgn-tiojydf

Others

Yandex sometimes redirects to a captcha, which produces a false positive.

DASnoeken commented 4 years ago

I've just written a web crawler using BeautifulSoup that can (right now) only check Twitter. That seems to fix the problem (only for Twitter, of course: 'realdonaldtrump' comes back positive and 'amffdsfjvidsvck' comes back negative). I think this is more of an (inefficient) quick and dirty fix though, especially since I'm only a beginner Python programmer. It could be extended to other websites, but I don't know if it's really worth it.

sdushantha commented 4 years ago

@dd-pardal I have dealt with these sites, which are some of the ones you mentioned above:
https://trashbox.ru
https://ask.fm
https://www.house-mixes.com
http://insanejournal.com
https://www.ifttt.com
https://flightradar24.com

jeroenev commented 4 years ago

I'm getting different false positives.

[+] GPSies: https://www.gpsies.com/mapUser.do?username=GArbadgershkj3484153sdf35s4f1s5a3df4a6s81f5d3sa486se4f53
[+] MeetMe: https://www.meetme.com/GArbadgershkj3484153sdf35s4f1s5a3df4a6s81f5d3sa486se4f53
[+] OpenCollective: https://opencollective.com/GArbadgershkj3484153sdf35s4f1s5a3df4a6s81f5d3sa486se4f53
[+] SportsTracker: https://www.sports-tracker.com/view_profile/GArbadgershkj3484153sdf35s4f1s5a3df4a6s81f5d3sa486se4f53
[+] YandexCollection: https://yandex.ru/collections/user/GArbadgershkj3484153sdf35s4f1s5a3df4a6s81f5d3sa486se4f53/
[+] boingboing.net: https://bbs.boingboing.net/u/GArbadgershkj3484153sdf35s4f1s5a3df4a6s81f5d3sa486se4f53

jeroenev commented 4 years ago

TamTam and RubyGems seem to return false positives for strings starting with a number. For RubyGems: using 58any8random7string4here returns the user with profile "58", so the site silently drops everything from the first non-numerical character onwards. If the first character is NOT a numerical character, it does a search on profile name. So basically it gets the profile by ID if the first character is numerical, and searches on profile name otherwise.
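To make the behaviour described above concrete, here is a small sketch that models it. This is only a model of what was observed from the outside, not the site's actual code, and `leading_numeric_id` is a made-up name.

```python
import re
from typing import Optional


def leading_numeric_id(profile_slug: str) -> Optional[str]:
    """Return the run of ASCII digits at the start of `profile_slug`.

    Models the observed lookup: if the slug starts with digits, those digits
    are treated as a profile ID and the rest is ignored. Returns None when
    the slug does not start with a digit (a name search would happen instead).
    """
    match = re.match(r"[0-9]+", profile_slug)
    return match.group(0) if match else None


def may_trigger_id_lookup(username: str) -> bool:
    """Hypothetical guard: True if the username would be read as an ID."""
    return leading_numeric_id(username) is not None
```

A username for which `may_trigger_id_lookup` is True could be skipped on such a site, since any hit would belong to the numeric profile rather than to the username.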

sdushantha commented 4 years ago

I think it is best to remove these sites, as I can't find any username rules for them:
https://easyen.ru/index/8-0-fghfgn.tiojydf
https://elwo.ru/index/8-0-fghfgn.tiojydf
http://ingvarr.net.ru/index/8-0-fghfgn.tiojydf
http://pedsovet.su/index/8-0-fghfgn.tiojydf
https://radioskot.ru/index/8-0-fghfgn.tiojydf

richardgetz commented 4 years ago

Just to clarify, some of these "false positives" may occur because your IP is being flagged as suspicious. If this is happening, capturing the error/captcha page would be helpful to note a fail/error.

GandelXIV commented 4 years ago

There is also a problem with https://forum.redsun.tf/.

sdushantha commented 4 years ago

I'm just gonna put this list here so that we can keep track of the sites that have been listed in this thread:


Let me know if I have made any mistakes

richardgetz commented 4 years ago

Some of these are likely occurring due to username format. If you added regex checks to disallow periods in usernames, the large majority of these would disappear.
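A rough sketch of what such a check could look like is below. The site names and patterns in `USERNAME_RULES` are placeholders invented for this example, not real site rules, and `should_query` is a hypothetical helper.

```python
import re

# Hypothetical per-site username rules. Both the site names and the
# patterns are illustrative assumptions only.
USERNAME_RULES = {
    "ExampleSiteA": r"[A-Za-z0-9_-]+",        # periods are not allowed
    "ExampleSiteB": r"[A-Za-z][A-Za-z0-9]*",  # must start with a letter
}


def should_query(site: str, username: str) -> bool:
    """Return False when `username` cannot exist on `site` under its rule."""
    pattern = USERNAME_RULES.get(site)
    if pattern is None:
        return True  # no rule known for this site, so query it as usual
    return re.fullmatch(pattern, username) is not None
```

Skipping the request entirely when the username is impossible for the site avoids the false positive and also saves a network round trip.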

rodrigograca31 commented 4 years ago

I'm having lots of false positives.

I don't know how you guys check if the user exists or not but when I manually check the found URLs a good portion says "user not found 404".

Here are some examples: https://www.investing.com/ https://opencollective.com/ https://www.tiktok.com/ https://www.wikipedia.org/

TikTok would be a very important one to get fixed... 🤔

sdushantha commented 4 years ago

@rodrigograca31 TikTok was removed a while ago. Are you using an older version of Sherlock? https://github.com/sherlock-project/sherlock/blob/master/removed_sites.md#tiktok

rodrigograca31 commented 4 years ago

Oh... True... I git cloned the repo 3 months ago... I should update... My bad.

I was about to ask why TikTok was removed, but I gave it a try and it does not seem easy to figure out whether a user exists or not.

EDIT: Actually, I'm not sure if this will be useful, but doing a wget on an existing user returns a page with JSON that includes a metaParams object/string in the code... (regex could detect that.)

sdushantha commented 4 years ago

@rodrigograca31 Regex would be nice. That means that I'd have to change the code a little in sherlock.py, because at the moment we check whether the errorMsg is in r.text. Instead, we could do a re.findall(REGEX, r.text).

I'll try to add that into sherlock.py and see if everything works properly. But it might be a while before I get started because I'm pretty busy.
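For comparison, the two kinds of check could look roughly like this. It is only a sketch: the function names are made up, and the `metaParams` pattern in the usage note is a guess based on the wget observation above, not a verified marker.

```python
import re


def not_found_by_substring(error_msg: str, body: str) -> bool:
    """Substring check: the user is reported missing if the site's
    error message appears verbatim in the response body."""
    return error_msg in body


def found_by_regex(pattern: str, body: str) -> bool:
    """Regex check: the user is reported present if the pattern
    matches anywhere in the response body."""
    return bool(re.findall(pattern, body))
```

For example, `found_by_regex(r'"metaParams"\s*:', r.text)` would report a hit only when the page embeds such a key, whatever else the page contains.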

roopeshvs commented 4 years ago

Polarsteps seem to report false positives most of the time too.

P.S: It always redirects to /user-not-found when the user is a false positive. Maybe it can help in patching this specifically.

sdushantha commented 4 years ago

@roopeshvs If I remember correctly, the checking of the redirect URL does not actually work. https://github.com/sherlock-project/sherlock/blob/master/sherlock/sherlock.py#L356

It's been a very long time since I've properly looked at the source code, so I'm not entirely sure what is going on. But I'm sure that if I take a look at it when I get some time, I'll get a better understanding of what's going on.
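One way a redirect check could work is to look at the response without following the redirect. The sketch below is an assumption about how that might be done, not a description of what sherlock.py currently does; `profile_exists` and `not_found_paths` are made-up names.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def profile_exists(status_code: int,
                   location: Optional[str],
                   not_found_paths: Sequence[str]) -> Optional[bool]:
    """Classify a response that was fetched WITHOUT following redirects.

    Returns False for a redirect to a known "not found" path, None for any
    other redirect (undecided), and otherwise True only for a 2xx status.
    """
    if status_code in REDIRECT_CODES:
        target_path = urlsplit(location or "").path
        if target_path in not_found_paths:
            return False
        return None  # some other redirect, e.g. a canonical URL change
    return 200 <= status_code < 300
```

With the requests library, the inputs could come from `r = requests.get(url, allow_redirects=False)`, passing `r.status_code` and `r.headers.get("Location")`.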

sdushantha commented 4 years ago

@roopeshvs

Polarsteps seem to report false positives most of the time too.

I did some research and found out that we can use this endpoint to check usernames:

https://www.polarsteps.com/validation/unique

With this data

field=users.username
value={{USERNAME}}

The only problem is that it is a POST request, and Sherlock currently does not do POST requests. So I'll have to implement that, along with a way to tell Sherlock from data.json that a site needs a POST request.
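As a rough idea of how data.json could drive this, here is a sketch. The keys `request_method` and `request_payload` and the helper `build_request` are hypothetical names chosen for this example, and the `{{USERNAME}}` placeholder is simply borrowed from the data shown above.

```python
PLACEHOLDER = "{{USERNAME}}"


def build_request(site: dict, username: str) -> dict:
    """Turn a site entry into keyword arguments for an HTTP call.

    Hypothetical entry keys: "url" (required), "request_method"
    (defaults to GET) and "request_payload" (form fields for POST).
    """
    kwargs = {
        "method": site.get("request_method", "GET").upper(),
        "url": site["url"].replace(PLACEHOLDER, username),
    }
    payload = site.get("request_payload")
    if payload is not None:
        # Substitute the username into every form field value.
        kwargs["data"] = {
            key: value.replace(PLACEHOLDER, username)
            for key, value in payload.items()
        }
    return kwargs
```

The resulting dictionary could then be passed on as `requests.request(**kwargs)`, so sites without the extra keys keep behaving as plain GET requests.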

roopeshvs commented 4 years ago

@sdushantha Found a GET API from Polarsteps that would suit us better.

https://api.polarsteps.com/users/byusername/USERNAME

Also, the previously claimed username is a mistake, it didn't exist. :(

sdushantha commented 4 years ago

Metacritic can be fixed by using their API endpoints, but again, I will need to add the ability to do POST requests:

import requests

data = {
  'check_username': '1',
  'userName': 'username'
}

response = requests.post('https://www.metacritic.com/signup', data=data)

output

{
  "viewer": {},
  "mixpanelToken": "6e219fd5dbf2cb77082a6cebb50b01a5",
  "mixpanelDistinctId": "123.12.314.14",
  "omnitureDebug": 0,
  "errors": {
    "username": "The username you have entered is not available."
  }
}
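A response like the one above could be interpreted roughly as follows. This sketch assumes that a `username` entry under `errors` means the name is already registered, which is only a reading of the message text; `username_is_taken` is a made-up helper.

```python
import json


def username_is_taken(response_text: str) -> bool:
    """Return True if the signup check reports an error for the username.

    Assumption: an "errors" object containing a "username" key means the
    name is already in use. An empty or missing "errors" means it is free.
    """
    body = json.loads(response_text)
    errors = body.get("errors") or {}
    return "username" in errors
```

It would be called with the text of the POST response, for example `username_is_taken(response.text)`.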

roopeshvs commented 4 years ago

@sdushantha Metacritic is working fine. The only case that goes wrong is a username containing a dot (.), which is an illegal character for usernames on the site. Just change the regex and we are good, I guess.

nohupt commented 4 years ago

also: freelance.habr

https://freelance.habr.com/freelancers/testing123boyeeeee

nohupt commented 4 years ago

and: tracr.co

https://tracr.co/users/1/testing123boyeeeee

GrbavaCigla commented 4 years ago

I get these false positives: https://500px.com/ https://cash.me/ https://www.clozemaster.com/ https://www.colourlovers.com/ https://www.wikipedia.org/

sdushantha commented 4 years ago

@GrbavaCigla What version of Sherlock are you using? Some of the sites you mentioned had been removed in the past due to false positives. Also, Sherlock now automatically fetches the site list from GitHub instead of using the local one.

Please try using the latest version of Sherlock and let me know if you still get the false positives you mentioned above.

enodr commented 4 years ago

I can confirm false positives with the latest version for:
https://www.clozemaster.com/ (uses the status_code method, but nonexistent accounts get a 302 redirect to /dashboard)
https://4pda.ru (displays an error for a nonexistent account, but Sherlock gives me a false positive)

sdushantha commented 4 years ago

@enodr I have now fixed the false positive for Clozemaster in 87483b5. Regarding 4pda, I'm not getting any false positives:


Screenshot 2020-08-31 at 20 25 34


enodr commented 4 years ago

Regarding 4pda, I'm not getting any false positives:

I figured out what the issue is with 4pda: I am landing on an anti-robot page because my IP is flagged for whatever reason on their site. The current rule for 4pda is to match an error message if the account is not found. It would be more reliable to invert the logic and check for a regex only if the account is found.
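A sketch of the inverted logic is below. The `Status` enum, the function name, and the marker strings in the usage note are invented for illustration and are not taken from 4pda's real pages.

```python
import re
from enum import Enum


class Status(Enum):
    CLAIMED = "claimed"
    AVAILABLE = "available"
    UNKNOWN = "unknown"


def classify(body: str, found_pattern: str, not_found_msg: str) -> Status:
    """Report CLAIMED only on positive evidence that the profile exists.

    A page that shows neither the profile marker nor the site's own
    "not found" message (for example an anti-robot page) is UNKNOWN
    instead of being counted as a hit.
    """
    if re.search(found_pattern, body):
        return Status.CLAIMED
    if not_found_msg in body:
        return Status.AVAILABLE
    return Status.UNKNOWN
```

For example, `classify(page, r'class="profile-header"', "No such user")` would return `Status.UNKNOWN` for a captcha page, so a flagged IP would no longer produce a false positive.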

sdushantha commented 4 years ago

@enodr

check for a regex only if the account is found.

That would be a great idea, but it is something we would need to add to Sherlock. I currently don't have much time to work on it. But when I do, I'll work on it.

enodr commented 4 years ago

I found a more reliable way for 4pda: https://4pda.ru/forum/index.php?act=auth&action=chkname&login=greenxxx This URL returns a JSON array with 3 elements. If the first element is 0, the username exists; if it is 1, it does not exist. Can Sherlock handle this kind of test condition (check that the response is JSON and that item[0] == 0)?

sdushantha commented 4 years ago

@enodr We can do a simple check for an error message, where the error message is [1,false,0]. I removed 4pda yesterday, but since we found a solution, we can add it back. I'll do it later today.
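The JSON condition described for 4pda could also be tested by parsing the response rather than comparing text. This is a minimal sketch under that description (first element 0 means the name exists, 1 means it does not); `username_exists` is a made-up name and this is not what Sherlock does.

```python
import json
from typing import Optional


def username_exists(response_text: str) -> Optional[bool]:
    """Interpret the check-name response described in the thread.

    Returns True if the first array element is 0, False if it is 1,
    and None when the response is not the expected JSON array
    (for example an HTML anti-robot page).
    """
    try:
        data = json.loads(response_text)
    except ValueError:
        return None
    if not isinstance(data, list) or not data:
        return None
    first = data[0]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(first, bool) or not isinstance(first, int):
        return None
    if first == 0:
        return True
    if first == 1:
        return False
    return None
```

Compared with matching the literal text `[1,false,0]`, parsing tolerates whitespace changes and can report an undecided result when the site returns something unexpected.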

sdushantha commented 4 years ago

@enodr I have fixed 4pda now in ddecc14

sdushantha commented 4 years ago

@nohupt Not sure why you are getting false positives. Sherlock seems to give me the correct response:

$ python3 sherlock -l --site "tracr.co" testing123boyeeeee

[*] Checking username testing123boyeeeee on:
[-] tracr.co: Not Found!
$ python3 sherlock -l --site "freelance.habr" testing123boyeeeee

[*] Checking username testing123boyeeeee on:
[-] Freelance.habr: Not Found!

I will be closing this issue because it looks like all the sites that have been mentioned in this issue have been dealt with.