Closed juanfrilla closed 1 year ago
Thanks for raising this nicely formatted issue :) I'll take a look at what's really happening when I have some time.
Thanks @kaliiiiiiiiii. I'm also noticing that when I combine selenium-profiles with Scrapy to download a PDF from a page behind Cloudflare, the PDF sometimes downloads and sometimes doesn't. I don't know why that happens.
I did some research and the issue of not saving the file in the directory is solved with
params = {"behavior": "allow", "downloadPath": download_directory}
mydriver.execute_cdp_cmd("Page.setDownloadBehavior", params)
However, when I combine it with Scrapy, only the first PDF is downloaded; the rest are not. This second issue could be related to Scrapy or to some Cloudflare security measure.
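A minimal sketch of the fix above, wrapped in a helper. The names `build_download_params` and `enable_downloads` are hypothetical (they are not part of selenium-profiles); the only real calls assumed are Selenium's `execute_cdp_cmd` and the CDP method `Page.setDownloadBehavior` used above. Resolving the directory to an absolute path and creating it up front is an assumption about what keeps the setting predictable, not something confirmed in this thread.

```python
import os


def build_download_params(download_directory):
    """Build the params dict for the CDP call Page.setDownloadBehavior."""
    # Assumption: an absolute path avoids ambiguity about the working directory.
    absolute_path = os.path.abspath(download_directory)
    return {"behavior": "allow", "downloadPath": absolute_path}


def enable_downloads(driver, download_directory):
    """Create the directory if needed, then point Chrome's downloads at it."""
    params = build_download_params(download_directory)
    os.makedirs(params["downloadPath"], exist_ok=True)
    # execute_cdp_cmd is the same driver method used in the snippet above.
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called once after the driver is started and before the first `driver.get(pdf_url)`, for example `enable_downloads(mydriver, "temp")`.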
I did some research and the issue of not saving the file in the directory is solved with
params = {"behavior": "allow", "downloadPath": download_directory}
mydriver.execute_cdp_cmd("Page.setDownloadBehavior", params)
seems great to me => issue resolved? => close?
Note: On my Platform (Windows) I needed to change f"{current_directory}/temp"
to f"{current_directory}\\temp"
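Rather than hard-coding a separator per platform, the path could also be built with `pathlib` (or `os.path.join`), which picks the separator for the current OS. A small sketch; `temp_download_dir` is a hypothetical helper name, and `current_directory` mirrors the variable in the note above.

```python
from pathlib import Path


def temp_download_dir(current_directory):
    """Return the 'temp' subdirectory as a string with the OS-native separator."""
    return str(Path(current_directory) / "temp")
```

The result can be passed as `downloadPath` on both Windows and Linux without editing the string by hand.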
However, when I combine it with Scrapy, only the first PDF is downloaded; the rest are not. This second issue could be related to Scrapy or to some Cloudflare security measure.
Hmm, it might be related to some SSL/TLS fingerprinting. You could try the following instead of Scrapy (Python-based requests). It indirectly uses the JavaScript fetch API, and should have the same fingerprint as the browser itself.
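# Imports as shown in the selenium-profiles README; exact paths may differ between versions
from selenium.webdriver import ChromeOptions
from selenium_profiles.webdriver import Chrome
from selenium_profiles.profiles import profiles

# Start Driver
profile = profiles.Windows()
options = ChromeOptions()
mydriver = Chrome(profile, options=options, uc_driver=False)
mydriver.options.extend_arguments(["--disable-web-security"]) # we don't want CORS :|
driver = mydriver.start()

pdf_url = "https://www.orimi.com/pdf-test.pdf"
domain = "/".join(pdf_url.split("/")[:3]) # scheme + host, e.g. "https://www.orimi.com"
# driver.get(domain) # same-origin alternative for CORS, instead of "--disable-web-security"
file_as_bytes = driver.profiles.fetch(pdf_url)["content"] # fetched inside the browser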
Note: This doesn't work with big files because of the driver.execute_async_script timeout (I think 60 sec. max.).
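To get the fetched bytes onto disk, something like the following could work. It is only a sketch: `save_pdf_bytes` is a hypothetical helper, and it assumes `file_as_bytes` really is a `bytes` object, as the variable name above suggests. The `%PDF-` check is a cheap sanity test that the response is a PDF rather than, say, an HTML challenge page.

```python
import os


def save_pdf_bytes(file_as_bytes, directory, filename):
    """Write fetched bytes to directory/filename and return the full path."""
    # PDF files start with the marker "%PDF-"; anything else is probably
    # an error or challenge page rather than the document itself.
    if not file_as_bytes.startswith(b"%PDF-"):
        raise ValueError("content does not look like a PDF")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "wb") as handle:
        handle.write(file_as_bytes)
    return path
```

For example, `save_pdf_bytes(file_as_bytes, "temp", "pdf-test.pdf")` would store the file in the temp folder regardless of the browser's download settings.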
Thanks for the reply, I'm going to close it
Hello, I'm trying to download a pdf file with:
self.driver.get(pdf_url)
Before that I have this code:
And it does not store the file in the temp folder but in the same directory as the script. Why is that?