Trainline website not scraping

goldenking0412 commented 3 years ago

My scraper is being detected by thetrainline.com.

In this issue, https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/46, I found that Chrome's remote debugging port could be a potential solution, so I implemented it like this:

options.add_argument('--remote-debugging-port=7102')

Is this wrong?

I've also added many other features, but it succeeds only 10% to 20% of the time.

Thanks

goldenking0412 commented 3 years ago

def new_driver proxy_port:, production_headless: false, rotate_useragent: false, dev_proxy: false
    maximized = rand(15) > rand(8)
    maximized = true # NOTE: overrides the random choice above, so the window is always maximized
    using_incognito = rand(30) > rand(20)
    options = Selenium::WebDriver::Chrome::Options.new

    navigator_sample = navigator_data.sample

    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--no-sandbox')
    options.add_argument("user-agent=#{navigator_sample[:user_agent]}")
    options.add_argument("--disable-web-security")
    options.add_argument("--disable-xss-auditor")
    options.add_argument('--remote-debugging-port=7102')
    if maximized
      options.add_argument("start-maximized")
    end
    if using_incognito
      options.add_argument('--incognito')
    end
    if production_mode?
      if production_headless
        options.add_argument('--headless')
      end
      options.add_argument("--proxy-server=127.0.0.1:#{proxy_port}")
    else
      if dev_proxy
        options.add_argument("--proxy-server=apisearchredirect.saveatrain.com:#{proxy_port}")
      end
    end

    options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
    options.add_option('useAutomationExtension', false)

    options.add_preference('profile.content_settings.exceptions.clipboard', {
      '*': {'setting': 1}
    })

    driver = Selenium::WebDriver.for :chrome, options: options
    driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", {
      "source": "
        Object.defineProperty(navigator, 'webdriver', {
          get: () => undefined
        });
        Object.defineProperty(navigator, 'maxTouchPoints', {
          get: () => 0
        });
        Object.defineProperty(navigator, 'languages', {
          get: () => ['en-US', 'en']
        });
        Object.defineProperty(navigator, 'cookieEnabled', {
          get: () => true
        });
        Object.defineProperty(navigator, 'deviceMemory', {
          get: () => #{[2, 4, 8, 12, 16, 24, 32, "undefined"].sample}
        });
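        // An empty sample would leave the arrow function without a body,
        // which is a syntax error that stops the whole script from running.
        Object.defineProperty(navigator, 'hardwareConcurrency', {
          get: () => #{[1, 2, 4, 8, 16].sample}
        });
        // String values are JSON-encoded so they reach the page as quoted
        // JS literals (assumes navigator_data stores plain, unquoted strings).
        Object.defineProperty(navigator, 'vendor', {
          get: () => #{navigator_sample[:vendor].to_json}
        });
        Object.defineProperty(navigator, 'platform', {
          get: () => #{navigator_sample[:platform].to_json}
        });
        Object.defineProperty(navigator, 'productSub', {
          get: () => #{navigator_sample[:product_sub].to_json}
        })"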
    })

    if !maximized
      driver.manage.window.resize_to(1200+rand(740), 900+rand(400))
      driver.manage.window.move_to(rand(300), rand(350))
    end
    driver
  end
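
One thing worth checking in the injected script above: values interpolated straight into the JavaScript source are not quoted, so a vendor such as Google Inc. would produce invalid JavaScript, and a syntax error anywhere in the source would keep every override from applying, including the navigator.webdriver one. A minimal sketch of building the same script with JSON-encoded values follows. The helper name `navigator_override_script` and the sample values are hypothetical and not part of the original scraper.

```ruby
require 'json'

# Hypothetical helper: turns a hash of navigator properties into JavaScript
# that redefines each property with a getter. Every value is JSON-encoded,
# so strings arrive quoted and numbers stay numeric.
def navigator_override_script(props)
  props.map do |name, value|
    "Object.defineProperty(navigator, #{name.to_s.to_json}, { get: () => #{value.to_json} });"
  end.join("\n")
end

# Sample values are illustrative assumptions only.
script = navigator_override_script(
  vendor: 'Google Inc.',
  platform: 'Win32',
  productSub: '20030107',
  hardwareConcurrency: 8
)
puts script
```

The resulting string could then be passed as the "source" parameter of Page.addScriptToEvaluateOnNewDocument in place of the hand-written heredoc.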

In principle we have everything needed to bypass Trainline, and with your help we did bypass botdetect and DataDome. But for some reason it doesn't work on Trainline.

Thanks

goldenking0412 commented 3 years ago

@czoins, can you check this thread and help me? Thanks

ultrafunkamsterdam commented 3 years ago

Hodl on, hodl on... What's wrong with using just:

import undetected_chromedriver.v2 as uc
driver = uc.Chrome()
with driver:
    driver.get('https://yourhighlysecuredsite.xyz')

?

goldenking0412 commented 3 years ago

We're writing it in Ruby, not Python, so I checked your code and implemented your logic in our scraper.

ultrafunkamsterdam commented 3 years ago

You'd better check v2.py in that case. Oh, and please don't raise an issue here just to discuss this.
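
For anyone porting this to Ruby: as far as I can tell, the idea discussed in issue 46 is to start Chrome yourself with a remote debugging port and then have the driver attach to that running browser, rather than only adding the flag to a browser that chromedriver launches. The sketch below shows that launch-then-attach shape under stated assumptions. The helper names, the `google-chrome` binary name, and the profile directory are hypothetical, and the attach step is left commented out because the exact option call may differ between selenium-webdriver versions.

```ruby
require 'tmpdir'

# Hypothetical helper: argv for starting Chrome as an independent process
# that listens for DevTools connections on the given port.
def chrome_launch_command(binary:, port:, profile_dir:)
  [
    binary,
    "--remote-debugging-port=#{port}",
    "--user-data-dir=#{profile_dir}",
    '--no-first-run',
    '--no-default-browser-check'
  ]
end

# Hypothetical helper: host:port string that the driver would attach to.
def debugger_address(port, host: '127.0.0.1')
  "#{host}:#{port}"
end

cmd = chrome_launch_command(
  binary: 'google-chrome', # assumption: adjust to the local Chrome binary
  port: 7102,
  profile_dir: File.join(Dir.tmpdir, 'scraper-profile')
)
puts cmd.join(' ')

# Attach step (requires selenium-webdriver and a running Chrome):
# pid = Process.spawn(*cmd)
# options = Selenium::WebDriver::Chrome::Options.new
# options.add_option('debuggerAddress', debugger_address(7102))
# driver = Selenium::WebDriver.for :chrome, options: options
```

This only covers the attach part. Anything else v2.py does would need to be ported separately after reading that file.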

quangpham commented 2 years ago

@goldenking0412 did you get your issue solved? I'm writing a similar crawler in Ruby and I'm hitting the same problem :) Would you mind sharing some of your experience getting it to work in Ruby? Cheers!

ultrafunkamsterdam / undetected-chromedriver

Trainline website not scraping #102