"download.default_directory" option doesn't work #47

juanfrilla closed this issue 1 year ago

juanfrilla commented 1 year ago

Hello, I'm trying to download a pdf file with: self.driver.get(pdf_url)

Before that I have this code:

profile = profiles.Windows()
options = ChromeOptions()
current_directory = os.getcwd()
download_directory = f"{current_directory}/temp"
prefs = {
    "download.default_directory": download_directory,
    "download.prompt_for_download": False,  # To auto download the file
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True,
}  # plugins.always_open_pdf_externally: do not show the PDF directly in Chrome
options.add_experimental_option("prefs", prefs)

mydriver = Chrome(profile, options=options, uc_driver=False)

driver = mydriver.start()

if not os.path.exists(download_directory):

It does not store the file in the temp folder but in the same directory as the script. Why is that?

kaliiiiiiiiii commented 1 year ago

Thanks for raising this nicely formatted issue :) I'll take a look at what's really happening when I get some time.

juanfrilla commented 1 year ago

Thanks @kaliiiiiiiiii. I'm also noticing that when I try to download a PDF by combining selenium-profiles with Scrapy from a page behind Cloudflare, the PDFs are sometimes downloaded and sometimes not. I don't know why that happens.

kaliiiiiiiiii commented 1 year ago

  1. Do other prefs work?
  2. Is the issue resolved with bare selenium?

juanfrilla commented 1 year ago

I did some research, and the issue of the file not being saved in the directory is solved with:

params = {"behavior": "allow", "downloadPath": download_directory}
mydriver.execute_cdp_cmd("Page.setDownloadBehavior", params)
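
The workaround above could be wrapped in a small helper. This is only a minimal sketch: `download_behavior_params` and `enable_downloads` are hypothetical names, not part of selenium-profiles, and it assumes the driver object exposes Selenium's `execute_cdp_cmd` as in the snippet above. It also creates the directory first and passes an absolute path, which is a cautious choice rather than something confirmed in this thread.

```python
import os


def download_behavior_params(download_directory: str) -> dict:
    """Build the parameters for the CDP command Page.setDownloadBehavior."""
    absolute = os.path.abspath(download_directory)
    # Create the target folder up front so the browser has somewhere to write.
    os.makedirs(absolute, exist_ok=True)
    return {"behavior": "allow", "downloadPath": absolute}


def enable_downloads(driver, download_directory: str) -> dict:
    """Send the CDP command through any driver that has execute_cdp_cmd (assumed)."""
    params = download_behavior_params(download_directory)
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

Usage would be something like `enable_downloads(mydriver, download_directory)` before calling `get(pdf_url)`.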

However, when I combine it with Scrapy, the first run downloads only the first PDF; the remaining PDFs are not downloaded. This second issue could be related to Scrapy or to Cloudflare's security measures.

kaliiiiiiiiii commented 1 year ago

seems great to me => issue resolved? => close?

Note: On my Platform (Windows) I needed to change f"{current_directory}/temp" to f"{current_directory}\\temp"
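
One way to avoid hard-coding the separator is to let the standard library join the path. A minimal sketch; `build_download_directory` is a hypothetical helper name used only for illustration:

```python
import os
from pathlib import Path


def build_download_directory(base: str, name: str = "temp") -> str:
    """Join base and name with the separator of the current platform."""
    # Path picks "\\" on Windows and "/" elsewhere; resolve() makes it absolute.
    return str(Path(base).resolve() / name)


download_directory = build_download_directory(os.getcwd())
```

The resulting string can be passed as the "download.default_directory" pref or as "downloadPath" without editing it per platform.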

Mhh, this might be related to TLS (SSL) fingerprinting. You could try the following instead of Scrapy (Python-based requests). It indirectly uses the JavaScript fetch API and should have the same fingerprint as the browser itself:

# Start Driver
profile = profiles.Windows()
options = ChromeOptions()

mydriver = Chrome(profile, options=options, uc_driver=False)
mydriver.options.extend_arguments(["--disable-web-security"]) # we don't want CORS :|
driver = mydriver.start()

pdf_url = "https://www.orimi.com/pdf-test.pdf"

domain = "/".join(pdf_url.split("/")[:3])
# driver.get(domain) # CORS, instead of "--disable-web-security"

file_as_bytes = driver.profiles.fetch(pdf_url)["content"]

Note: This doesn't work with big files because of the driver.execute_async_script timeout (I think 60 sec. max.)
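
To keep the fetched file, the bytes still have to be written to disk. A minimal sketch, assuming the "content" value really is a `bytes` object as the variable name suggests; `origin_of` and `save_bytes` are hypothetical helper names, and `origin_of` is just a standard-library alternative to the string splitting used for `domain` above.

```python
import os
from urllib.parse import unquote, urlsplit


def origin_of(url: str) -> str:
    """Return scheme and host, e.g. for a driver.get() call before fetching."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def save_bytes(content: bytes, url: str, directory: str) -> str:
    """Write content into directory, naming the file after the URL path."""
    os.makedirs(directory, exist_ok=True)
    # Fall back to a generic name when the URL path has no file component.
    filename = os.path.basename(unquote(urlsplit(url).path)) or "download.pdf"
    target = os.path.join(directory, filename)
    with open(target, "wb") as handle:
        handle.write(content)
    return target
```

With the names from the snippet above, this would be called as `save_bytes(file_as_bytes, pdf_url, download_directory)`.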

juanfrilla commented 1 year ago

Thanks for the reply. I'm going to close the issue.