kaliiiiiiiiii / Selenium-Profiles

undetected Selenium using chromedriver and emulation / device profiles

"download.default_directory" option doesn't work #47

Closed juanfrilla closed 1 year ago

juanfrilla commented 1 year ago

Hello, I'm trying to download a PDF file with: self.driver.get(pdf_url)

Before that I have this code:

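# Imports omitted in the original post; assumed from the selenium-profiles README.
import os
from selenium.webdriver import ChromeOptions
from selenium_profiles.webdriver import Chrome
from selenium_profiles.profiles import profiles

profile = profiles.Windows()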
options = ChromeOptions()
current_directory = os.getcwd()
download_directory = f"{current_directory}/temp"
prefs = {
    "download.default_directory": download_directory,
    "download.prompt_for_download": False,  # To auto download the file
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True,  # Do not show the PDF directly in Chrome
}
options.add_experimental_option("prefs", prefs)
options.add_argument("--headless")

mydriver = Chrome(profile, options=options, uc_driver=False)

driver = mydriver.start()

if not os.path.exists(download_directory):
    os.makedirs(download_directory)

It does not store the file in the temp folder but in the directory where the script is located. Why is that?

kaliiiiiiiiii commented 1 year ago

Thanks for raising this nicely formatted issue :) I'll have a look at what's really happening when I find some time.

juanfrilla commented 1 year ago

Thanks @kaliiiiiiiiii. I'm also noticing that when I try to download a PDF by combining selenium-profiles with Scrapy from a page behind Cloudflare, the PDFs are sometimes downloaded and sometimes not. I don't know why this happens.

kaliiiiiiiiii commented 1 year ago

  1. Do other prefs work?
  2. Is the issue resolved with bare Selenium?

juanfrilla commented 1 year ago

I did some research, and the issue of the file not being saved in the directory is solved with:

params = {"behavior": "allow", "downloadPath": download_directory}
mydriver.execute_cdp_cmd("Page.setDownloadBehavior", params)

But when I combine it with Scrapy, only the first PDF is downloaded and the rest are not. This second issue could be related to Scrapy or to some security measure of Cloudflare.
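
A minimal sketch of the directory fix above, assuming the download path should be absolute and should already exist before the first download starts. The helper name download_behavior_params is hypothetical; Page.setDownloadBehavior and execute_cdp_cmd are the calls already used in this comment.

```python
from pathlib import Path


def download_behavior_params(directory):
    """Build the parameter dict for the CDP command Page.setDownloadBehavior."""
    target = Path(directory)
    # Create the folder first so Chrome has an existing place to write to.
    target.mkdir(parents=True, exist_ok=True)
    # Resolve to an absolute path rendered with the platform's separators.
    return {"behavior": "allow", "downloadPath": str(target.resolve())}


# Usage with a started driver (needs a running Chrome, hence commented out):
# params = download_behavior_params("temp")
# driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

Building the dict in a helper keeps the directory creation and the CDP call next to each other, so the folder cannot be forgotten or created too late.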

kaliiiiiiiiii commented 1 year ago

> I did some research, and the issue of the file not being saved in the directory is solved with:
>
> params = {"behavior": "allow", "downloadPath": download_directory}
> mydriver.execute_cdp_cmd("Page.setDownloadBehavior", params)

seems great to me => issue resolved? => close?

Note: On my platform (Windows), I needed to change f"{current_directory}/temp" to f"{current_directory}\\temp".
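
The separator difference noted above can be sidestepped by letting the standard library build the path. This is a small sketch, not part of the original code; the variable name download_directory_alt is made up for illustration.

```python
import os
from pathlib import Path

# os.path.join inserts the separator of the current platform
# ("\\" on Windows, "/" on Linux and macOS).
download_directory = os.path.join(os.getcwd(), "temp")

# pathlib equivalent; str() renders the path with native separators.
download_directory_alt = str(Path.cwd() / "temp")
```

Either form can be passed as "download.default_directory" or "downloadPath" without hard-coding "/" or "\\".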

> But when I combine it with Scrapy, only the first PDF is downloaded and the rest are not. This second issue could be related to Scrapy or to some security measure of Cloudflare.

Hmm, this might be related to TLS (SSL) fingerprinting. You could try the following instead of Scrapy, which makes Python-based requests. It indirectly uses the JavaScript fetch API and should therefore have the same fingerprint as the browser itself:

# Start Driver
profile = profiles.Windows()
options = ChromeOptions()

mydriver = Chrome(profile, options=options, uc_driver=False)
mydriver.options.extend_arguments(["--disable-web-security"]) # we don't want CORS :|
driver = mydriver.start()

pdf_url = "https://www.orimi.com/pdf-test.pdf"

domain = "/".join(pdf_url.split("/")[:3])
# driver.get(domain) # CORS, instead of "--disable-web-security"

file_as_bytes = driver.profiles.fetch(pdf_url)["content"]

Note: This doesn't work with big files because of the driver.execute_async_script timeout (I think 60 sec. max.).
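
Once the bytes are in hand, they still have to be written to disk. The following is a minimal sketch, assuming that the "content" value returned by driver.profiles.fetch is a bytes object (as the variable name file_as_bytes suggests). save_pdf is a hypothetical helper, not part of selenium-profiles. It also checks for the %PDF- header, since a blocked request would presumably return an HTML page rather than a PDF.

```python
import os
from urllib.parse import urlparse


def save_pdf(content, url, directory):
    """Write fetched bytes to `directory`, named after the last URL path segment."""
    if not content.startswith(b"%PDF-"):
        # A challenge or error page would be HTML, not a PDF.
        raise ValueError("response does not look like a PDF")
    os.makedirs(directory, exist_ok=True)
    # Fall back to a generic name when the URL path has no file name.
    filename = os.path.basename(urlparse(url).path) or "download.pdf"
    path = os.path.join(directory, filename)
    with open(path, "wb") as fh:
        fh.write(content)
    return path


# Usage with the snippet above (needs a started driver, hence commented out):
# saved_to = save_pdf(file_as_bytes, pdf_url, download_directory)
```

Raising on a missing header makes the "sometimes downloads, sometimes not" case visible instead of silently saving a non-PDF response.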

juanfrilla commented 1 year ago

Thanks for the reply. I'm going to close the issue.