sherlock-project / sherlock

Hunt down social media accounts by username across social networks
https://sherlockproject.xyz
MIT License

Contributing Bulk Dataset #855

Open theabbie opened 3 years ago

theabbie commented 3 years ago

Description

For a long time I have been creating accounts on websites that allow a public profile page, and I keep track of all those pages. Here is the list of all of them:

https://theabbie.github.io/list.json

It has 725 links, which is well above what we currently have here (301).

Most of these use the username theabbie, so it will be easy to convert them to the required format.
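As a rough illustration of that conversion, here is a minimal sketch. It assumes the target format marks the username position with a `{}` placeholder; the function name `to_template` is hypothetical and not part of Sherlock.

```python
from typing import Optional


def to_template(url: str, username: str = "theabbie") -> Optional[str]:
    """Replace the known username in a profile URL with a '{}' placeholder.

    Returns None when the username does not appear in the URL, which is the
    case for ID- or hash-based profile links that need manual handling.
    """
    if username not in url:
        return None
    return url.replace(username, "{}")


if __name__ == "__main__":
    print(to_template("https://github.com/theabbie"))
    print(to_template("https://example.com/profile/12345"))
```

URLs for which this returns None would have to be reviewed by hand.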

Thus, I would love to contribute the list to this project.

I could create a PR for this once I get approval that it is useful.

Thank you ...

theabbie commented 3 years ago

@sdushantha

darvell commented 3 years ago

In some cases here the URL needs more than the username, and some sites need logging in. Still, it should be possible to write a quick-and-dirty script that generates rules whenever the page title or page text differs between an existing profile and a missing one.
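A minimal sketch of what such a rule generator might look like, assuming the HTML of one known-good profile page and one page for a made-up username has already been downloaded. The function names and the keys of the returned dictionary are hypothetical and do not match any existing Sherlock format.

```python
import re
from typing import Optional

# Matches the contents of the first <title> element in an HTML document.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> str:
    """Return the stripped page title, or an empty string if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else ""


def guess_rule(existing_html: str, missing_html: str) -> Optional[dict]:
    """Propose a detection rule if the two pages have different titles."""
    missing_title = extract_title(missing_html)
    if missing_title and missing_title != extract_title(existing_html):
        # The title seen only on the missing-profile page marks "not found".
        return {"type": "title", "missing_marker": missing_title}
    return None  # Titles are identical or absent; another signal is needed.


if __name__ == "__main__":
    found = "<html><head><title>theabbie | Example</title></head></html>"
    not_found = "<html><head><title>Page not found</title></head></html>"
    print(guess_rule(found, not_found))
```

Sites where both pages share a title would need a comparison of the page text instead.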

theabbie commented 3 years ago

@darvell I have checked all my URLs. They are public profiles and don't need logging in. We just have to visit the page and check that it's not a 404, and that's it.
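The check described here could be sketched roughly as follows, using only the Python standard library. The helper names are hypothetical, the `{}` placeholder in the URL template is an assumption, and the sketch treats any status other than 404 as "exists", exactly as stated above.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as exceptions; keep the code.
        return error.code


def profile_exists(url_template: str, username: str,
                   get_status: Callable[[str], int] = fetch_status) -> bool:
    """Treat any response other than 404 as an existing profile."""
    return get_status(url_template.format(username)) != 404


if __name__ == "__main__":
    # Performs a real network request.
    print(profile_exists("https://github.com/{}", "theabbie"))
```

The `get_status` parameter exists so the status lookup can be swapped out, for example when testing without network access.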

darvell commented 3 years ago

Well, the Discord one, for example, needs logging in, and it's also not an indicator for a username. LinkedIn is another: you won't get a 404. There are a lot of great things in this data set, and I'm willing to try to parse it all into a data.json. I am just saying that some of these do need authentication and it's not as simple as a 404 check. The fact that you have this collected in a nice JSON file makes a lot of this easy to test, so I hope to come back to you with how many of these can work with Sherlock. Some of the URLs also require a UUID/hash/ID on top of the name, which also wouldn't work.

In another issue I mentioned the idea of adding support to sherlock to allow adding 'authentication modules' to some websites like LinkedIn so a user can be logged in and check the username instead of just getting redirected to the sign in instantly.

theabbie commented 3 years ago

@darvell Discord doesn't have a public profile, so we can ignore that. We can bypass LinkedIn authentication by pretending to be Googlebot (Googlebot can crawl it). The hash problem is legit, I agree.

Can't we have an alternate method to check a username? I.e., go to the sign-up page, try to create an account with the given username, and check whether it's available. That would be foolproof, but slow.

darvell commented 3 years ago

Discord does technically have a way to look up usernames. I believe there are some odd ways to do it with queries through a private API, but that's not the point.

The methods currently available in sherlock are pretty simplistic, hence the desire to write and implement a proposal to allow the addition of different ways of checking things, being able to override user agents for certain sites, performing certain operations if required, etc.
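To make the per-site user agent idea concrete, here is a minimal sketch. The override table, the host name `crawler-friendly.example`, and the function `build_request` are all hypothetical; nothing here reflects how Sherlock is actually configured.

```python
import urllib.request
from urllib.parse import urlsplit

DEFAULT_USER_AGENT = "Mozilla/5.0"

# Hypothetical per-site overrides, keyed by host name.
USER_AGENT_OVERRIDES = {
    "crawler-friendly.example":
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a request whose User-Agent depends on the target host."""
    host = urlsplit(url).hostname
    user_agent = USER_AGENT_OVERRIDES.get(host, DEFAULT_USER_AGENT)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("https://crawler-friendly.example/in/theabbie")
    # urllib stores header names in capitalized form ("User-agent").
    print(request.get_header("User-agent"))
```

Keeping the overrides in a table means a new site only needs a new entry, not new code.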

I'm writing my little script now and hopefully I'll be able to tell you how much of this could be ported into Sherlock as it is right now. I hope this isn't coming across as negative. I just want to make sure it's noted that Sherlock is quite simple in how it works, and doing any kind of more advanced check is hard.

One idea I had about the hash problem is using Google/Bing/etc. to check for the username on the site and see if any URLs show up for it. That way we'd get the hash.

theabbie commented 3 years ago

@darvell Great then. You could also scrape their sitemap if they include profile pages in it. Anyway, I think this project has a lot of potential and would love to help in any way possible.

darvell commented 3 years ago

Yeah, this project is great. Between the issue with 'adult sites' not being wanted and features that some may want but others may not (e.g. finding related usernames via searches), I find myself curious whether it's possible to turn Sherlock into something greater. I get very excited about the possibilities of what could be done: taking screenshots when accounts are found, integrating with tools like snscrape and others to archive profiles automatically when found, using public leaked databases to find other usernames the person uses, etc. But that's all off-topic.

Hope to get you an answer to how many of the sites on this list we can use by tomorrow (probably within the hour honestly.)

theabbie commented 3 years ago

@darvell Once this project gets mature enough, you can make it scrape profile information from the profiles found. Many people will have left some public information somewhere, and we can use it to find more of their profiles even if they use different usernames. Maybe use reverse image search on their profile picture. That's a really exciting project.

darvell commented 3 years ago

As an extra note, for one of the sites (SoloLearn) where the URL is just an ID: once the name was known, a search for https://www.sololearn.com/Profile/ Abhishek Chaudhary was enough to get the profile ID. An experimental fork of Sherlock is sounding more and more interesting.

theabbie commented 3 years ago

@darvell Most sites have internal search engines; otherwise, a Google search with site:example.com might help.
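For illustration, a site-restricted search URL could be built like this. It is only a sketch: the function name `site_search_url` is hypothetical, and fetching or parsing the results page is deliberately left out.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, name: str) -> str:
    """Build a Google search URL restricted to one domain.

    The name is quoted so it is searched as an exact phrase.
    """
    query = f'site:{domain} "{name}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("www.sololearn.com", "Abhishek Chaudhary"))
```

Note that the `site:` operator is written without a space before the domain.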