Issue with Capybara Gem - Scraping Blocked on Indeed Site

Problem: My Rails application scrapes data from several sites using the Capybara gem. Scraping works fine for most sites, but I'm running into a problem specifically with Indeed.

Description: When I attempt to scrape data from Indeed with the headless option set to true, I get blocked. However, when I set the headless option to false, the scraping works fine. The screenshot generated by @session.save_screenshot clearly shows that I've been blocked.

[Screenshot: capybara-202401171752145506410800]

Steps to Reproduce:

1. Set headless: true in browser options.
2. Attempt to scrape data from Indeed.
3. Observe the blocking issue.

Expected Behavior:

Scraping should work seamlessly with headless mode enabled, just as it does for other sites.
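
For reference, here is a minimal sketch of the kind of setup the steps to reproduce describe. It assumes the Selenium driver with Chrome, which this report does not actually specify; the helper names (chrome_arguments, register_scraper_driver), the driver name :scraper_chrome, and the window size are hypothetical and only illustrate where the headless toggle sits.

```ruby
# Hypothetical helper: builds the Chrome command-line flags for the scraper.
# Only the headless flag changes between the working and the blocked runs.
def chrome_arguments(headless:)
  args = ["--window-size=1366,768"] # assumed window size, not from the report
  args.unshift("--headless=new") if headless
  args
end

# Hypothetical helper: registers a Capybara driver using the flags above.
# The gems are required lazily so the flag builder can be used on its own.
def register_scraper_driver(headless:)
  require "capybara"
  require "selenium-webdriver"

  Capybara.register_driver(:scraper_chrome) do |app|
    options = Selenium::WebDriver::Chrome::Options.new
    chrome_arguments(headless: headless).each { |arg| options.add_argument(arg) }
    Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
  end
end

# Usage (requires the capybara and selenium-webdriver gems plus Chrome):
#   register_scraper_driver(headless: true)
#   session = Capybara::Session.new(:scraper_chrome)
#   session.visit("https://www.indeed.com")
#   session.save_screenshot
```

With a setup along these lines, switching between register_scraper_driver(headless: true) and register_scraper_driver(headless: false) is the only difference between the blocked and the working runs.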

Environment:

Rails Version: 7
Capybara Version: 3.39.2
Nokogiri Version: 1.15.4-x86_64-linux

Additional Information:

Adding a proxy service did not resolve the issue. The problem seems specific to the interaction between Indeed and Capybara with headless mode.

Workaround:

Setting headless: false resolves the blocking issue, but this is not an ideal solution.

Request for Assistance: I'm seeking guidance on potential solutions or workarounds to enable headless scraping for Indeed without being blocked. Any insights or recommendations would be greatly appreciated.

Thank you for your assistance!

teamcapybara / capybara

Issue with Capybara Gem - Scraping Blocked on Indeed Site #2735