Fixed false positives #2273

sherlock-project / sherlock

Hunt down social media accounts by username across social networks

https://sherlockproject.xyz

MIT License

60.53k stars 6.96k forks

Fixed false positives #2273 #2285

Closed rsb-23 closed 1 week ago

rsb-23 commented 2 months ago

Updated user-agent in header
Removed redundant user-agents from data.json
Fixed false positives for username zqxzxzcvj (except a few)

ppfeister commented 2 months ago

Removed redundant user-agents from data.json

Did you test those targets for both positives and negatives after removal? Some of those user agents were specifically tailored, and the targets were non-functional without them. YouTube is one example: you need to masquerade as Googlebot, or all queries fail.
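To illustrate the concern, here is a minimal sketch of how a per-site user agent might override a shared default. It assumes each site entry can carry an optional `headers` mapping; `build_headers`, `DEFAULT_HEADERS`, and the default user-agent string are hypothetical names and values for illustration, not Sherlock's actual code.

```python
# Hypothetical default sent to every site unless the site entry overrides it.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0",
}

# User-agent string that Googlebot identifies itself with.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_headers(site_entry, default_headers=DEFAULT_HEADERS):
    """Return request headers for one site: defaults first, site overrides last."""
    headers = dict(default_headers)  # copy so the shared default is never mutated
    headers.update(site_entry.get("headers", {}))  # assumed optional per-site key
    return headers


# A site that only works with a tailored user agent keeps its override.
youtube_like = {"headers": {"User-Agent": GOOGLEBOT_UA}}
plain_site = {}
```

With this shape, deleting a site's `headers` entry silently falls back to the default, which is why each such site needs a positive and a negative check after the removal.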

rsb-23 commented 2 months ago

Yes, all test cases passed. I also tested --site Youtube using mohak_mangal for the positive case and mohak_mangal2z for the negative case. I'll test and confirm for Linkedin and Spotify too.

Btw, we should include test cases for both positives and negatives for each site when the site is added.

PS: Can we add a test/function to check for 'probable false positives' using a random string made of fjvxqz?

ppfeister commented 2 months ago

Yeah, the unit tests themselves don't check individual targets yet. They just test sections of the code itself, so targets require a manual check before and after a change. I believe there used to be some form of this, but it broke at some point and was later removed.

I don't think it'd be wise to test every single target as part of the PR/push CI, since that would generate a lot of unnecessary traffic. The original behavior was to run the checks either when prompted by the developer or via a weekly CI run. It's on my to-do list to re-implement this.

Note that the negative test cases won't be hardcoded, as that is too easily broken (and can very easily be broken intentionally by someone simply registering the hardcoded names). Rather, they will be generated at runtime based on either the given regexCheck or a simple string, with a few attempts allowed in the unlikely event that one hits a real username.
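A rough sketch of what runtime generation with retries could look like, under stated assumptions: candidates are random lowercase alphanumeric strings filtered against the site's regexCheck (rather than generated from the regex itself), and `is_claimed` stands in for whatever function queries the site. `generate_candidate` and `negative_check` are hypothetical names, not existing Sherlock functions.

```python
import random
import re
import string


def generate_candidate(regex_check=None, length=12, max_tries=100, rng=random):
    """Return a random username that satisfies regex_check, if one is given."""
    alphabet = string.ascii_lowercase + string.digits
    for _ in range(max_tries):
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if regex_check is None or re.fullmatch(regex_check, candidate):
            return candidate
    # Happens when the pattern cannot be met by this alphabet or length.
    raise ValueError(f"no candidate matched {regex_check!r} in {max_tries} tries")


def negative_check(is_claimed, regex_check=None, attempts=3):
    """Pass if at least one generated username is reported as unclaimed.

    Several attempts are allowed so that one unlucky hit on a real
    username does not fail the check.
    """
    for _ in range(attempts):
        if not is_claimed(generate_candidate(regex_check)):
            return True
    return False
```

A site that reports every generated name as claimed fails the check, which is the false-positive signal. Patterns that need a different length or character set would need a smarter generator than this filter-and-retry loop.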

I'll jump on in a bit to review these changes here

rsb-23 commented 2 months ago

I have successfully tested all 3 sites that had a unique user-agent (Linkedin, Spotify and YouTube) for both positives and negatives, using a claimed username and zqxzxzcvj.

Yeah, I absolutely agree with your points. I have a few approaches in mind. (feel free to ignore 😜 )

We can have a check.py script covering all the cases and checks a developer needs before contributing. It can then be run locally, either as a pre-commit hook or manually. Hence no additional traffic is generated, and it also simplifies the developer's task.
For negative test cases, generate 2 usernames from less frequent chars like q, v, etc., then show the sites common to both as probable false positives.
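The second idea could be sketched roughly as follows. This is only an illustration: it assumes the scan results are available as a mapping from each generated username to the set of site names that reported it as claimed, and `rare_username` and `probable_false_positives` are hypothetical helper names.

```python
import random

RARE_CHARS = "fjvxqz"  # the less frequent letters suggested earlier in the thread


def rare_username(length=10, rng=random):
    """Build a random username from rarely used letters only."""
    return "".join(rng.choice(RARE_CHARS) for _ in range(length))


def probable_false_positives(claimed_by_username):
    """Return the sites that reported every generated username as claimed.

    claimed_by_username maps username -> iterable of site names that
    reported that username as existing.
    """
    site_sets = [set(sites) for sites in claimed_by_username.values()]
    if not site_sets:
        return set()
    return set.intersection(*site_sets)


# Example with made-up results for two generated usernames.
example_results = {
    "qzxvjfqzxv": {"SiteA", "SiteB"},
    "vjfzqxvjfz": {"SiteB", "SiteC"},
}
flagged = probable_false_positives(example_results)
```

Intersecting across two names reduces the chance of flagging a site just because one random string happened to be registered there.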