Closed wizpresso-steve-cy-fan closed 1 year ago
Regarding the preference args, that seems to be a separate issue:
"C:\Program Files\Google\Chrome\Application\chrome.exe" --disable-field-trial-config --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-back-forward-cache --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --no-default-browser-check --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --no-sandbox "--initial-preferences-file=\"C:\Users\SteveFan\AppData\Local\Temp\tmpd3j4weyc.json\"" --user-data-dir=C:\Users\SteveFan\AppData\Local\Temp\playwright_chromiumdev_profile-Aw5szN --remote-debugging-pipe --no-startup-window
Why don't you do just the following?
response = await page.request.get("https://www1.hkexnews.hk/listedco/listconews/gem/2023/0209/2023020900150_c.pdf")
print(response.status)
@mxschmitt This is just an example; later I will try a download triggered by a button click. If I click a button to download the file, it also uses goto behind the scenes, so I think both should behave the same. I just wanted to simplify the case.
So far, this seems to be working:
import asyncio
import json

from aiofiles.tempfile import TemporaryDirectory
from anyio import Path
from playwright.async_api import async_playwright

# Chromium profile preference: download PDFs instead of opening the built-in viewer.
preference = {
    "plugins": {
        "always_open_pdf_externally": True,
    },
}


async def handle(route):
    response = await route.fetch()
    # Copy the headers so the added entry is what fulfill() actually receives.
    headers = dict(response.headers)
    if headers.get("content-type") == "application/pdf":
        headers["Content-Disposition"] = "attachment"
    await route.fulfill(response=response, headers=headers)


async def main():
    async with TemporaryDirectory() as d:
        # Seed the profile with the preference before Chromium starts.
        preference_dir = Path(d) / "Default"
        await preference_dir.mkdir(0o777, parents=True, exist_ok=True)  # octal mode, not decimal 777
        await (preference_dir / "Preferences").write_text(json.dumps(preference))
        async with async_playwright() as p:
            context = await p.chromium.launch_persistent_context(d, headless=False, accept_downloads=True)
            try:
                # "**/*" matches every URL; a single "*" does not match across "/".
                await context.route("**/*", handle)
                page = await context.new_page()
                async with page.expect_download() as download_info:
                    try:
                        await page.goto("https://www1.hkexnews.hk/listedco/listconews/gem/2023/0209/2023020900150_c.pdf")
                    except Exception:
                        # goto() fails once the navigation turns into a download; that is expected here.
                        pass
                download = await download_info.value
                print(await download.path())
            finally:
                await context.close()


asyncio.run(main())
This combines the tricks from https://github.com/microsoft/playwright/issues/3509#issuecomment-675441299 and https://stackoverflow.com/a/75201448/3289081
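The header half of the trick can be isolated into a small pure function, which makes it easier to check on its own. This is only a sketch: `force_pdf_attachment` is a hypothetical helper name, not part of Playwright, and it assumes the headers arrive as a plain name-to-value mapping.

```python
from typing import Mapping


def force_pdf_attachment(headers: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of `headers`; PDF responses get Content-Disposition: attachment."""
    # Header names are case-insensitive, so normalise them to lower case.
    result = {name.lower(): value for name, value in headers.items()}
    # Ignore parameters such as "; charset=binary" when comparing the media type.
    media_type = result.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        result["content-disposition"] = "attachment"
    return result
```

Inside a route handler it could be used as `await route.fulfill(response=response, headers=force_pdf_attachment(response.headers))`. Returning a new dict avoids relying on in-place changes to the response's headers being picked up.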
My end goal is to capture the PDF download and send the file stream into stdout/remote pipe.
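For the stdout part of that goal, a minimal sketch could copy the downloaded file to a binary stream in chunks. `stream_file` is a hypothetical helper name, and the sketch assumes the download has already been saved to a local path (for example the one returned by `download.path()`).

```python
from pathlib import Path
from typing import BinaryIO


def stream_file(path: Path, out: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy the file at `path` to the binary stream `out`; return the bytes written."""
    written = 0
    with open(path, "rb") as src:
        # Read fixed-size chunks so large PDFs are never held in memory at once.
        while chunk := src.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    out.flush()
    return written
```

To send the PDF to stdout, call `stream_file(Path(await download.path()), sys.stdout.buffer)` after `import sys`. Any diagnostic output, such as the printed download path, would then need to go to stderr so it does not get mixed into the PDF bytes.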
N.B. In headless mode I can trigger the PDF download without a persistent context, but that apparently does not behave well in non-headless mode, so the suggestion at https://github.com/microsoft/playwright/issues/3509#issuecomment-1369299639 does not work for me.
Closing this as it seems like posting on the Python repo would be better.
Describe the bug
I want to capture the PDF download, so I tested it by navigating directly to the PDF URL, but it does not work as expected.
Also, I cannot use --initial-preferences-file to apply my config in non-headless mode. The internal PDF viewer still opens.