Open PePinodemrs opened 2 years ago
In short, staying undetected isn't guaranteed, and it's probably a never-ending endeavor. It's marked in bold in the documentation: headless is still WIP. Raising issues about it is pointless.
There are many ways to be detected, because headless Chrome doesn't behave like regular Chrome and anti-bot companies regularly discover new client-side detection techniques.
I've added some basic evasions here. Feel free to create pull requests to add new ones and/or discuss them here.
@QIN2DIM, WIP means Work in progress
WIP = Work in Progress, so it's not a finished feature.
I don't understand what we are supposed to do about your issue with headless mode 😁. And I have another question: how can I reduce network usage with ChromeDriver? I use a proxy, but it consumes a lot of data.
I have the same problem trying to automate ETrade API key renewal. ETrade, unlike TDA, rather annoyingly expires API keys daily and requires manual steps in the middle of the OAuth protocol to renew them. This prevents fully automated trading software, for no obvious reason, since that should be a legitimate use of their API. In any case, I get this kind of exception when my Python program runs:
Exception: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//input[@value='Accept']"} (Session info: headless chrome=100.0.4896.88)
Obviously I've been redirected to a bot-detection page that doesn't have the element I'm looking for (an Accept button for a popup). When headless = False the script works just fine. I have no idea what's being detected or even what the bot detection page looks like.
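Since the failure only shows up in headless mode, one way to find out which page was actually served is to dump the browser state at the moment the lookup fails. The following is a minimal sketch, not part of undetected-chromedriver: `dump_page_state` and the `debug_dump` directory are hypothetical names, and it assumes only the standard Selenium WebDriver members `current_url`, `page_source`, and `save_screenshot`.

```python
from pathlib import Path


def dump_page_state(driver, out_dir="debug_dump"):
    """Save the current URL, HTML, and a screenshot for later inspection.

    `driver` is assumed to be a Selenium WebDriver (for example the object
    returned by uc.Chrome); the function name and file layout are hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # The final URL often reveals a redirect to a challenge or block page.
    (out / "url.txt").write_text(driver.current_url, encoding="utf-8")
    # The raw HTML shows which elements are really present.
    (out / "page.html").write_text(driver.page_source, encoding="utf-8")
    # Screenshots can be taken in headless mode as well.
    driver.save_screenshot(str(out / "page.png"))
    return out
```

Calling it from the `except` branch that catches the "no such element" error, before re-raising, should at least show what the bot-detection page looks like.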
Please, I don't understand your issue.
I don't have any issue. I just stated (as the documentation does) that undetected-chromedriver is not meant to be used headless if you want to stay undetected.
If you want headless to work, you'll have to inject JavaScript code to evade detection. It's a cat-and-mouse game: new detection tricks are discovered regularly, and they are always obfuscated.
I've pointed to some evasions I've added for my own use. If they don't work for you, you'll have to add new ones, depending on what the site you are trying to access actually uses to detect that you're headless.
Man, you are the best. Do you have Discord so we can chat? I have tried your evasions, but I think the website I'm trying to access checks the plugins length, and I can't find anything about that. Here is what I do:
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {
        "source": """
            Object.defineProperty(window, 'navigator', {
                value: new Proxy(navigator, {
                    has: (target, key) => (key === 'webdriver' ? false : key in target),
                    get: (target, key) =>
                        key === 'webdriver' ?
                        false :
                        typeof target[key] === 'function' ?
                        target[key].bind(target) :
                        target[key]
                })
            });
        """
    },
)
driver.execute_cdp_cmd(
    "Network.setUserAgentOverride",
    {
        "userAgent": driver.execute_script(
            "return navigator.userAgent"
        ).replace("Headless", "")
    },
)
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {
        "source": """
            Object.defineProperty(navigator, 'maxTouchPoints', { get: () => 1 })
        """
    },
)
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {
        "source": "const newProto = navigator.__proto__;"
                  "delete newProto.webdriver;"
                  "navigator.__proto__ = newProto;"
    },
)
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {
        "source": """
            Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
        """
    },
)
Sorry, I don't know how to format code correctly, lol.
Use the syntax highlighting explained here, with the python3 keyword, instead.
Here's some JavaScript to thoroughly spoof navigator.plugins and navigator.mimeTypes.
I see that, but how do I use it in my Python code?
You can check how I did it (for some other evasions) here.
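For readers without access to the linked code, the general pattern can be sketched as follows. This is an illustration under stated assumptions, not the maintainer's implementation: `inject_js_file`, `wrap_in_iife`, and `js_path` are hypothetical names, and the driver is assumed to be a Chromium-based Selenium driver that exposes `execute_cdp_cmd`.

```python
from pathlib import Path


def wrap_in_iife(js_source):
    """Wrap a script in an immediately invoked function expression.

    This keeps helper variables out of the page's global scope and makes an
    early `return` inside the script legal.
    """
    return "(() => {\n" + js_source + "\n})();"


def inject_js_file(driver, js_path):
    """Register a JavaScript file to run in every new document.

    The CDP command below evaluates the script when a frame is created,
    before the page's own scripts are loaded.
    """
    source = wrap_in_iife(Path(js_path).read_text(encoding="utf-8"))
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": source},
    )
```

It has to be called before `driver.get(...)`, since the script only applies to documents created afterwards. A script that relies on helper objects from another project must bring those helpers along in the same file, otherwise it will throw in the page and the evasion is silently skipped.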
okay ty
It still doesn't work. Here is my code:
driver.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
Object.defineProperty(window, 'navigator', {
value: new Proxy(navigator, {
has: (target, key) => (key === 'webdriver' ? false : key in target),
get: (target, key) =>
key === 'webdriver' ?
false :
typeof target[key] === 'function' ?
target[key].bind(target) :
target[key]
})
});
"""
},
)
driver.execute_cdp_cmd(
"Network.setUserAgentOverride",
{
"userAgent": driver.execute_script(
"return navigator.userAgent"
).replace("Headless", "")
},
)
driver.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
Object.defineProperty(navigator, 'maxTouchPoints', {get: () => 1});
Object.defineProperty(navigator.connection, 'rtt', {get: () => 100});
// https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/evasions/chrome-runtime.js
window.chrome = {
app: {
isInstalled: false,
InstallState: {
DISABLED: 'disabled',
INSTALLED: 'installed',
NOT_INSTALLED: 'not_installed'
},
RunningState: {
CANNOT_RUN: 'cannot_run',
READY_TO_RUN: 'ready_to_run',
RUNNING: 'running'
}
},
runtime: {
OnInstalledReason: {
CHROME_UPDATE: 'chrome_update',
INSTALL: 'install',
SHARED_MODULE_UPDATE: 'shared_module_update',
UPDATE: 'update'
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic'
},
PlatformArch: {
ARM: 'arm',
ARM64: 'arm64',
MIPS: 'mips',
MIPS64: 'mips64',
X86_32: 'x86-32',
X86_64: 'x86-64'
},
PlatformNaclArch: {
ARM: 'arm',
MIPS: 'mips',
MIPS64: 'mips64',
X86_32: 'x86-32',
X86_64: 'x86-64'
},
PlatformOs: {
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
MAC: 'mac',
OPENBSD: 'openbsd',
WIN: 'win'
},
RequestUpdateCheckStatus: {
NO_UPDATE: 'no_update',
THROTTLED: 'throttled',
UPDATE_AVAILABLE: 'update_available'
}
}
}
// https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/evasions/navigator-permissions.js
if (!window.Notification) {
window.Notification = {
permission: 'denied'
}
}
const originalQuery = window.navigator.permissions.query
window.navigator.permissions.__proto__.query = parameters =>
parameters.name === 'notifications'
? Promise.resolve({ state: window.Notification.permission })
: originalQuery(parameters)
const oldCall = Function.prototype.call
function call() {
return oldCall.apply(this, arguments)
}
Function.prototype.call = call
const nativeToStringFunctionString = Error.toString().replace(/Error/g, 'toString')
const oldToString = Function.prototype.toString
function functionToString() {
if (this === window.navigator.permissions.query) {
return 'function query() { [native code] }'
}
if (this === functionToString) {
return nativeToStringFunctionString
}
return oldCall.call(oldToString, this)
}
// eslint-disable-next-line
Function.prototype.toString = functionToString
"""
},
)
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
{
"source": """
const fns = {};
fns.generatePluginArray = (utils, fns) => pluginsData => {
return fns.generateMagicArray(utils, fns)(
pluginsData,
PluginArray.prototype,
Plugin.prototype,
'name'
)
}
fns.generateFunctionMocks = utils => (
proto,
itemMainProp,
dataArray
) => ({
/** Returns the MimeType object with the specified index. */
item: utils.createProxy(proto.item, {
apply(target, ctx, args) {
if (!args.length) {
throw new TypeError(
`Failed to execute 'item' on '${proto[Symbol.toStringTag]
}': 1 argument required, but only 0 present.`
)
}
// Special behavior alert:
// - Vanilla tries to cast strings to Numbers (only integers!) and use them as property index lookup
// - If anything else than an integer (including as string) is provided it will return the first entry
const isInteger = args[0] && Number.isInteger(Number(args[0])) // Cast potential string to number first, then check for integer
// Note: Vanilla never returns `undefined`
return (isInteger ? dataArray[Number(args[0])] : dataArray[0]) || null
}
}),
/** Returns the MimeType object with the specified name. */
namedItem: utils.createProxy(proto.namedItem, {
apply(target, ctx, args) {
if (!args.length) {
throw new TypeError(
`Failed to execute 'namedItem' on '${proto[Symbol.toStringTag]
}': 1 argument required, but only 0 present.`
)
}
return dataArray.find(mt => mt[itemMainProp] === args[0]) || null // Not `undefined`!
}
}),
/** Does nothing and shall return nothing */
refresh: proto.refresh
? utils.createProxy(proto.refresh, {
apply(target, ctx, args) {
return undefined
}
})
: undefined
})
fns.generateMagicArray = (utils, fns) =>
function (
dataArray = [],
proto = MimeTypeArray.prototype,
itemProto = MimeType.prototype,
itemMainProp = 'type'
) {
// Quick helper to set props with the same descriptors vanilla is using
const defineProp = (obj, prop, value) =>
Object.defineProperty(obj, prop, {
value,
writable: false,
enumerable: false, // Important for mimeTypes & plugins: `JSON.stringify(navigator.mimeTypes)`
configurable: false
})
// Loop over our fake data and construct items
const makeItem = data => {
const item = {}
for (const prop of Object.keys(data)) {
if (prop.startsWith('__')) {
continue
}
defineProp(item, prop, data[prop])
}
// We need to spoof a specific `MimeType` or `Plugin` object
return Object.create(itemProto, Object.getOwnPropertyDescriptors(item))
}
const magicArray = []
// Loop through our fake data and use that to create convincing entities
dataArray.forEach(data => {
magicArray.push(makeItem(data))
})
// Add direct property access based on types (e.g. `obj['application/pdf']`) afterwards
magicArray.forEach(entry => {
defineProp(magicArray, entry[itemMainProp], entry)
})
// This is the best way to fake the type to make sure this is false: `Array.isArray(navigator.mimeTypes)`
const magicArrayObj = Object.create(proto, {
...Object.getOwnPropertyDescriptors(magicArray),
// There's one ugly quirk we unfortunately need to take care of:
// The `MimeTypeArray` prototype has an enumerable `length` property,
// but headful Chrome will still skip it when running `Object.getOwnPropertyNames(navigator.mimeTypes)`.
// To strip it we need to make it first `configurable` and can then overlay a Proxy with an `ownKeys` trap.
length: {
value: magicArray.length,
writable: false,
enumerable: false,
configurable: true // Important to be able to use the ownKeys trap in a Proxy to strip `length`
}
})
// Generate our functional function mocks :-)
const functionMocks = fns.generateFunctionMocks(utils)(
proto,
itemMainProp,
magicArray
)
// We need to overlay our custom object with a JS Proxy
const magicArrayObjProxy = new Proxy(magicArrayObj, {
get(target, key = '') {
// Redirect function calls to our custom proxied versions mocking the vanilla behavior
if (key === 'item') {
return functionMocks.item
}
if (key === 'namedItem') {
return functionMocks.namedItem
}
if (proto === PluginArray.prototype && key === 'refresh') {
return functionMocks.refresh
}
// Everything else can pass through as normal
return utils.cache.Reflect.get(...arguments)
},
ownKeys(target) {
// There are a couple of quirks where the original property demonstrates "magical" behavior that makes no sense
// This can be witnessed when calling `Object.getOwnPropertyNames(navigator.mimeTypes)` and the absense of `length`
// My guess is that it has to do with the recent change of not allowing data enumeration and this being implemented weirdly
// For that reason we just completely fake the available property names based on our data to match what regular Chrome is doing
// Specific issues when not patching this: `length` property is available, direct `types` props (e.g. `obj['application/pdf']`) are missing
const keys = []
const typeProps = magicArray.map(mt => mt[itemMainProp])
typeProps.forEach((_, i) => keys.push(`${i}`))
typeProps.forEach(propName => keys.push(propName))
return keys
}
})
return magicArrayObjProxy
}
fns.generateMimeTypeArray = (utils, fns) => mimeTypesData => {
return fns.generateMagicArray(utils, fns)(
mimeTypesData,
MimeTypeArray.prototype,
MimeType.prototype,
'type'
)
}
const data = {
"mimeTypes": [
{
"type": "application/pdf",
"suffixes": "pdf",
"description": "",
"__pluginName": "Chrome PDF Viewer"
},
{
"type": "application/x-google-chrome-pdf",
"suffixes": "pdf",
"description": "Portable Document Format",
"__pluginName": "Chrome PDF Plugin"
},
{
"type": "application/x-nacl",
"suffixes": "",
"description": "Native Client Executable",
"__pluginName": "Native Client"
},
{
"type": "application/x-pnacl",
"suffixes": "",
"description": "Portable Native Client Executable",
"__pluginName": "Native Client"
}
],
"plugins": [
{
"name": "Chrome PDF Plugin",
"filename": "internal-pdf-viewer",
"description": "Portable Document Format",
"__mimeTypes": ["application/x-google-chrome-pdf"]
},
{
"name": "Chrome PDF Viewer",
"filename": "mhjfbmdgcfjbbpaeojofohoefgiehjai",
"description": "",
"__mimeTypes": ["application/pdf"]
},
{
"name": "Native Client",
"filename": "internal-nacl-plugin",
"description": "",
"__mimeTypes": ["application/x-nacl", "application/x-pnacl"]
}
]
};
// That means we're running headful
const hasPlugins = 'plugins' in navigator && navigator.plugins.length
if (hasPlugins) {
return // nothing to do here
}
const mimeTypes = fns.generateMimeTypeArray(utils, fns)(data.mimeTypes)
const plugins = fns.generatePluginArray(utils, fns)(data.plugins)
// Plugin and MimeType cross-reference each other, let's do that now
// Note: We're looping through `data.plugins` here, not the generated `plugins`
for (const pluginData of data.plugins) {
pluginData.__mimeTypes.forEach((type, index) => {
plugins[pluginData.name][index] = mimeTypes[type]
plugins[type] = mimeTypes[type]
Object.defineProperty(mimeTypes[type], 'enabledPlugins', {
value: JSON.parse(JSON.stringify(plugins[pluginData.name])),
writable: false,
enumerable: false, // Important: `JSON.stringify(navigator.plugins)`
configurable: false
})
})
}
const patchNavigator = (name, value) =>
utils.replaceProperty(Object.getPrototypeOf(navigator), name, {
get() {
return value
}
})
patchNavigator('mimeTypes', mimeTypes)
patchNavigator('plugins', plugins)
"""})

On https://bot.sannysoft.com/ it tells me that I have no plugins.
You're right, it fails (I haven't read the JS code)... I can't help you more.
I just tested: everything passes on https://bot.sannysoft.com/, yet it still doesn't work. I don't understand what their detection is based on. I'm trying on the Zalando site; do you have any idea?
No, and that's exactly the issue: anti-bot protections are obfuscated, there are quite a few of them, and the best ones might not be public... puppeteer-extra-plugin-stealth is supposed to have the most up-to-date evasions, until it doesn't...
It doesn't work either.
Failing that, is there no alternative? For example, a tiny window so as not to use too much of the graphics card, or at least something that lets Chrome consume less power and network?
A tiny window would make a lot of clickable elements inaccessible.
So there's nothing but headless mode?
I saw that there were RAM leaks in headless mode, even though I quit properly with driver.close() followed by driver.quit().
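One thing worth ruling out when memory keeps growing is a browser process left behind after an exception, in which case `driver.quit()` is never reached. Below is a minimal sketch of a guard for that case; `managed_driver` and `make_driver` are hypothetical names, and it makes no claim to fix leaks inside Chrome itself.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Yield a driver and always call quit() on it, even if the body raises.

    `make_driver` is any zero-argument callable returning a Selenium driver,
    for example `lambda: uc.Chrome(headless=True)` (assumed usage).
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the session and closes every associated window,
        # so a separate close() call beforehand is not required.
        driver.quit()
```

Used as `with managed_driver(lambda: uc.Chrome(headless=True)) as driver: ...`, the browser is shut down whether the block finishes normally or fails halfway.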
If you are running in a Linux environment with X server, you can use a virtual display as a workaround. https://github.com/ponty/PyVirtualDisplay https://stackoverflow.com/questions/6183276/how-do-i-run-selenium-in-xvfb
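The virtual display workaround could look roughly like this. It is a sketch that assumes PyVirtualDisplay is installed and Xvfb is available on the machine; `virtual_display` is a hypothetical helper, and its `display_factory` parameter exists only so the helper can be exercised without an X server.

```python
from contextlib import contextmanager


@contextmanager
def virtual_display(size=(1920, 1080), display_factory=None):
    """Start an invisible X display and stop it again when the block ends."""
    if display_factory is None:
        # Third-party package; imported lazily so the module loads without it.
        from pyvirtualdisplay import Display
        display_factory = Display

    # visible=0 asks for an off-screen display (Xvfb by default).
    display = display_factory(visible=0, size=size)
    display.start()
    try:
        yield display
    finally:
        display.stop()
```

Inside a `with virtual_display():` block, Chrome can then be started in regular (non-headless) mode while nothing is drawn on a physical screen. A full-size virtual screen also avoids the problem of elements becoming unreachable in a tiny window.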
This just worked for me, thanks.
Hello, I've just tried headless mode with: driver = uc.Chrome(headless=True)
But the driver is detected. Still, headless mode consumes much less energy than normal mode. Is there a way to run Chrome normally and lower its power consumption considerably?