miyakogi / pyppeteer

Headless chrome/chromium automation library (unofficial port of puppeteer)

Set Download Location in Pyppeteer in headless mode #77

Open: naqushab opened this issue 6 years ago

naqushab commented 6 years ago

In puppeteer there is a way to store downloaded files in headless mode, as described in this SO answer:

await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: './myAwesomeDownloadFolder'});

Does a similar method exist in pyppeteer?

miyakogi commented 6 years ago

Doesn't the same method work in pyppeteer? That method uses a Chromium feature, so I guess it should also work in pyppeteer.
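A minimal sketch of the direct translation, assuming pyppeteer's `Page` keeps its DevTools connection in the private `_client` attribute the same way puppeteer does. Because `_client` is private, it may change between versions. The helper name `allow_downloads` is hypothetical, and the absolute path is a precaution rather than a documented requirement.

```python
import os


async def allow_downloads(page, download_dir):
    """Ask Chromium to save downloads from this page into download_dir.

    Assumption: page._client is pyppeteer's private CDP session for the page,
    mirroring puppeteer's page._client.send(...) call. Not public API.
    """
    # Resolve to an absolute path so the browser process does not have to
    # interpret a path relative to its own working directory.
    path = os.path.abspath(download_dir)
    os.makedirs(path, exist_ok=True)
    # Python dict keys must be quoted strings, unlike the JS object literal.
    await page._client.send(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": path},
    )
    return path


# Usage with a real browser (not executed here):
#   browser = await pyppeteer.launch()
#   page = await browser.newPage()
#   await allow_downloads(page, "./myAwesomeDownloadFolder")
```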

EmadMokhtar commented 6 years ago

It is indeed a Chrome feature: it sets the default download folder. Have you tried getting the Response body as binary and saving it to a file on disk?

response = await page.goto("http://www.irs.gov/pub/irs-pdf/f1040.pdf")
body = await response.buffer()  # the response body as bytes
with open("f1040.pdf", "wb") as new_file:
    new_file.write(body)

Note: I haven't verified that this works; I'm just suggesting an approach.
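A slightly more defensive version of the same idea, as a sketch assuming pyppeteer's `Response.ok`, `Response.status` and `Response.buffer()`. The helper name `save_response_body` is hypothetical. Note that this reads the navigation's own response; it does not change how Chromium handles downloads.

```python
async def save_response_body(page, url, file_path):
    """Navigate to url and write the main response body to file_path.

    Returns the number of bytes written.
    """
    response = await page.goto(url)
    # goto() can return None (e.g. for same-document navigations),
    # so check before reading the body.
    if response is None:
        raise RuntimeError(f"No response received for {url}")
    if not response.ok:
        raise RuntimeError(f"{url} returned HTTP {response.status}")
    body = await response.buffer()  # raw bytes of the response body
    with open(file_path, "wb") as new_file:
        new_file.write(body)
    return len(body)


# Usage (not executed here):
#   await save_response_body(page, "http://www.irs.gov/pub/irs-pdf/f1040.pdf", "f1040.pdf")
```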

tpoulton commented 5 years ago

It's been a while since this was asked, but I think what's covered in this feature will get you what you need by interfacing with the DevTools Protocol directly: https://github.com/GoogleChrome/puppeteer/pull/1770
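For context on what "interfacing with the DevTools Protocol directly" means: in the Chrome DevTools Protocol, each command is a JSON object with an integer `id`, a `method` name and an optional `params` object, and the browser's reply carries the same `id`. The sketch below only builds such a message; pyppeteer's `send` constructs and routes these for you, so you would not normally write them by hand. The `cdp_command` helper and the module-level counter are hypothetical.

```python
import itertools
import json

# Each command needs a unique integer id so its reply can be matched to it.
_message_ids = itertools.count(1)


def cdp_command(method, params=None):
    """Serialize a DevTools Protocol command as a JSON text message."""
    message = {"id": next(_message_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


frame = cdp_command(
    "Page.setDownloadBehavior",
    {"behavior": "allow", "downloadPath": "/tmp/downloads"},
)
print(frame)
```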

sandou78 commented 5 years ago

After some debugging, I found that this works:

cdp = await page.target.createCDPSession()
await cdp.send('Page.setDownloadBehavior', {'behavior': 'allow', 'downloadPath': '/temp/'})

The reason is that Python has a different syntax: you can't use a bare behavior as a key in a Python dict; you need to quote it as 'behavior'.
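To illustrate the syntax difference: in a JavaScript object literal an unquoted key is just a name, but in a Python dict literal an unquoted name is evaluated as a variable. The helper names below are hypothetical, chosen only for this demonstration.

```python
def download_params(download_path):
    # Keys are string literals, as Python requires.
    return {"behavior": "allow", "downloadPath": download_path}


def unquoted_key_raises():
    """Show what happens with the JavaScript-style literal in Python."""
    try:
        # Python looks up a variable named `behavior`, which is undefined
        # here, so evaluating this dict raises NameError.
        {behavior: "allow"}  # noqa: F821
    except NameError:
        return True
    return False


# dict() with keyword arguments builds the same mapping without quotes.
assert download_params("/temp/") == dict(behavior="allow", downloadPath="/temp/")
assert unquoted_key_raises()
```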

justinr1234 commented 4 years ago

I converted the JavaScript function to Python: https://github.com/puppeteer/puppeteer/issues/299#issuecomment-474435547

It isn't 1-to-1: the generated random directory name is different, though it has the same number of characters. I had trouble figuring out how to convert JavaScript's Number.toString function to Python.

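import asyncio
import os
import random


def base36encode(number, alphabet="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Convert an integer to a base-36 string (digits 0-9, then A-Z)."""
    base36 = ""
    sign = ""

    if number < 0:
        sign = "-"
        number = -number

    if 0 <= number < len(alphabet):
        return sign + alphabet[number]

    while number != 0:
        number, i = divmod(number, len(alphabet))
        base36 = alphabet[i] + base36

    return sign + base36


async def download_file(page, f):
    """Run the coroutine function f (which should trigger a download) and
    return the path of the downloaded file."""
    # Turn the digits after "0." of a random float into a base-36 name.
    # Fixed-point formatting avoids str() producing e.g. "1e-05".
    randNum = random.random()
    intPart = int(f"{randNum:.17f}"[2:])
    base36num = base36encode(intPart)
    downloadPath = os.path.join(os.getcwd(), f"download-{base36num}")
    try:
        os.mkdir(downloadPath)
    except OSError as err:
        print(f"Creation of directory {downloadPath} failed: {err}")
        raise
    else:
        print(f"Successfully created download directory: {downloadPath}")

    # Tell Chromium to save downloads from this page into downloadPath.
    cdp = await page.target.createCDPSession()
    await cdp.send(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": downloadPath},
    )

    await f()

    # Poll until a file appears whose name no longer ends in .crdownload
    # (Chromium's suffix for in-progress downloads).
    print("Downloading...")
    fileName = ""
    theList = os.listdir(downloadPath)
    if theList:
        fileName = theList[0]
    while not fileName or fileName.endswith(".crdownload"):
        # asyncio.sleep, not time.sleep, so the event loop (and pyppeteer's
        # connection to the browser) keeps running while we wait.
        await asyncio.sleep(0.1)
        theList = os.listdir(downloadPath)
        if theList:
            fileName = theList[0]

    filePath = os.path.join(downloadPath, fileName)
    print(f"Downloaded file: {filePath}")
    return filePath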
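The polling loop in `download_file` above waits forever if no download ever starts. A sketch of the same wait with a timeout, under these assumptions: Chromium marks in-progress downloads with a `.crdownload` suffix (as the code above already assumes), and the swapped-in `tempfile.mkdtemp` creates the uniquely named directory, which sidesteps the hand-rolled base-36 name entirely. `wait_for_download` and `make_download_dir` are hypothetical helper names.

```python
import asyncio
import os
import tempfile
import time


def make_download_dir(parent=None):
    # mkdtemp creates a new, uniquely named directory; pass os.getcwd()
    # as parent to create it next to the script, as the original code does.
    return tempfile.mkdtemp(prefix="download-", dir=parent)


async def wait_for_download(download_dir, timeout=60.0, poll_interval=0.1):
    """Poll download_dir until a finished (non-.crdownload) file appears.

    Returns the full path of that file, or raises asyncio.TimeoutError
    once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            name for name in os.listdir(download_dir)
            if not name.endswith(".crdownload")
        ]
        if finished:
            return os.path.join(download_dir, sorted(finished)[0])
        if time.monotonic() > deadline:
            raise asyncio.TimeoutError(f"No finished download in {download_dir}")
        # Yield to the event loop so pyppeteer keeps processing messages.
        await asyncio.sleep(poll_interval)
```

Usage with pyppeteer would look roughly like this (not executed here; the selector is hypothetical):

    download_dir = make_download_dir(os.getcwd())
    cdp = await page.target.createCDPSession()
    await cdp.send("Page.setDownloadBehavior", {"behavior": "allow", "downloadPath": download_dir})
    await page.click("#download-link")
    path = await wait_for_download(download_dir, timeout=30)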