internetarchive / brozzler

brozzler - distributed browser-based web crawler
Apache License 2.0

Add more stealth evasions #248

Closed: vbanos closed this 2 years ago

vbanos commented 2 years ago

Set navigator.platform = 'Win32' instead of the default Linux value, since we usually run Brozzler on Linux.
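A minimal TypeScript sketch of a platform override, assuming it is injected before any page script runs. The function name `overridePlatform` is hypothetical, and this is not brozzler's actual evasion script. It only illustrates the common `Object.defineProperty` technique. In a browser you would pass the real `navigator`, but here a stand-in object is used so the sketch runs anywhere.

```typescript
// Hypothetical helper: shadow navigator.platform with a fixed value.
function overridePlatform(nav: object, platform: string = "Win32"): void {
  // navigator.platform is a read-only getter inherited from Navigator.prototype.
  // Plain assignment has no effect (or throws in strict mode), so define an
  // own accessor property that shadows the inherited one.
  Object.defineProperty(nav, "platform", {
    get: () => platform,
    configurable: true, // allow a later override to replace this one
  });
}

// Usage with a stand-in object, since Node has no browser navigator to patch.
const fakeNav = { platform: "Linux x86_64" };
overridePlatform(fakeNav);
console.log(fakeNav.platform); // "Win32"
```

A Win32 platform next to a Linux User-Agent string is itself an inconsistency that detectors can flag, so the two values should agree.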

Randomize navigator.deviceMemory and navigator.hardwareConcurrency to make browser fingerprinting harder.
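The sketch below shows one way to randomize these two values. The candidate lists, the helper names `pick` and `randomizeHardware`, and the choice to pick once per call (so repeated reads on a page stay consistent) are all assumptions. The PR does not say which values brozzler uses. The random source is injectable so the sketch can be tested deterministically.

```typescript
// Hypothetical candidate values; the PR does not specify brozzler's choices.
const DEVICE_MEMORY_GB: readonly number[] = [2, 4, 8]; // approximate RAM in gigabytes
const CPU_CORES: readonly number[] = [2, 4, 8, 12, 16]; // logical processor counts

// Pick one element uniformly at random; `rand` must return a value in [0, 1).
function pick<T>(values: readonly T[], rand: () => number = Math.random): T {
  return values[Math.floor(rand() * values.length)];
}

// Hypothetical helper: choose both values once, then expose them as getters so
// every read within the page returns the same answer.
function randomizeHardware(
  nav: object,
  rand: () => number = Math.random,
): { deviceMemory: number; hardwareConcurrency: number } {
  const deviceMemory = pick(DEVICE_MEMORY_GB, rand);
  const hardwareConcurrency = pick(CPU_CORES, rand);
  Object.defineProperty(nav, "deviceMemory", { get: () => deviceMemory, configurable: true });
  Object.defineProperty(nav, "hardwareConcurrency", { get: () => hardwareConcurrency, configurable: true });
  return { deviceMemory, hardwareConcurrency };
}

// Usage with a stand-in object instead of the real navigator.
const navStandIn: Record<string, unknown> = {};
console.log(randomizeHardware(navStandIn));
```

Picking from a short list of common values, rather than any random number, avoids producing a rare and therefore distinctive fingerprint.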

Define window.Notification, which is otherwise undefined because we run Chrome with the --disable-notifications command-line flag.
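A minimal sketch of a Notification stub, installed only when the property is missing. The names `installNotificationStub` and `NotificationStub` are hypothetical, and reporting a permission of "default" is an assumption. It matches a "prompt" answer from navigator.permissions.query. In a browser you would pass `window`, but a stand-in object is used here.

```typescript
// Values of the real Notification.permission property.
type NotificationPermissionValue = "default" | "denied" | "granted";

// Hypothetical stub for when Chrome runs with --disable-notifications.
function installNotificationStub(win: { Notification?: unknown }): void {
  if (win.Notification !== undefined) {
    return; // keep a real implementation if the browser provides one
  }
  class NotificationStub {
    // Assumption: "default" means the user was never asked.
    static readonly permission: NotificationPermissionValue = "default";

    static requestPermission(): Promise<NotificationPermissionValue> {
      // Resolve without showing anything, as if the prompt were dismissed.
      return Promise.resolve(NotificationStub.permission);
    }

    readonly title: string;
    readonly body: string;

    constructor(title: string, options: { body?: string } = {}) {
      this.title = title;
      this.body = options.body ?? "";
    }

    close(): void {
      // No notification is ever displayed, so there is nothing to close.
    }
  }
  Object.defineProperty(win, "Notification", {
    value: NotificationStub,
    writable: true,
    configurable: true,
  });
}

// Usage with a stand-in for window.
const winStandIn: { Notification?: unknown } = {};
installNotificationStub(winStandIn);
console.log(typeof winStandIn.Notification); // "function"
```

Scripts like these only help if they run before the page's own scripts, for example when registered through the Chrome DevTools Protocol method Page.addScriptToEvaluateOnNewDocument.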

vbanos commented 2 years ago

The window.Notification change is very important. Compare the two screenshots of https://bot.sannysoft.com/ taken before and after the change: without window.Notification, something in the page's bot-detection library crashes.

BEFORE (screenshot: initial)

AFTER (screenshot: improvement)

galgeek commented 2 years ago

Thank you, @vbanos!

We'll get set up for a test crawl early next week.