microlinkhq / html-get

Get the HTML from any website, using prerendering when necessary.
MIT License

Throwing TypeError: browser.createIncognitoBrowserContext is not a function for some URLs #195

Closed: athrvk closed this issue 8 months ago

athrvk commented 8 months ago

I am trying to fetch the HTML content for this URL: चंद्रयान-3 (Chandrayaan-3)

The URL contains characters from a language other than English. When copied, it looks like this:

https://hi.wikipedia.org/wiki/%E0%A4%9A%E0%A4%82%E0%A4%A6%E0%A5%8D%E0%A4%B0%E0%A4%AF%E0%A4%BE%E0%A4%A8-3
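The copied form is the UTF-8 percent-encoding of the same address, so both spellings should identify the same page. A quick check in Node.js, a sketch using only the built-in `encodeURI`, `decodeURI` and WHATWG `URL`. The page title is written with Unicode escapes here so the exact code points are unambiguous:

```javascript
// चंद्रयान-3 spelled with Unicode escapes (Devanagari block, U+0900..U+097F)
const title = '\u091A\u0902\u0926\u094D\u0930\u092F\u093E\u0928-3'
const readable = `https://hi.wikipedia.org/wiki/${title}`
const copied =
  'https://hi.wikipedia.org/wiki/%E0%A4%9A%E0%A4%82%E0%A4%A6%E0%A5%8D%E0%A4%B0%E0%A4%AF%E0%A4%BE%E0%A4%A8-3'

// encodeURI percent-encodes each non-ASCII character as its UTF-8 bytes
const encoded = encodeURI(readable)
// The WHATWG URL parser percent-encodes the path the same way
const normalized = new URL(readable).href
// decodeURI reverses the encoding
const decoded = decodeURI(copied)

console.log(encoded === copied)    // true
console.log(normalized === copied) // true
console.log(decoded === readable)  // true
```

This suggests the non-English characters by themselves are not what breaks the request.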

Here is my code:

import createBrowserless from 'browserless'
import getHTML from 'html-get'

export let browserlessFactory

// Kill the Chromium process when Node.js exits
// ('exit' listeners run synchronously, so the async close() may not finish)
process.on('exit', () => {
    console.log('Closing browser!')
    browserlessFactory && browserlessFactory.close()
})

function initializeBrowserless() {
    console.log('Creating browserless...')
    browserlessFactory = createBrowserless()
}

const getContent = async url => {
    // Spawn the Chromium process once
    if (!browserlessFactory) {
        initializeBrowserless()
    }

    // Create a browser context inside the Chromium process
    const browserContext = browserlessFactory.createContext()
    const getBrowserless = () => browserContext

    let result
    try {
        result = await getHTML(url, { getBrowserless, rewriteUrls: true })
    } finally {
        // Close the browser context after it's used, even if getHTML throws
        const browserless = await browserContext
        await browserless.destroyContext()
    }

    if (!result) {
        throw new Error('Failed to get HTML content')
    }
    if (result.statusCode !== 200) {
        throw new Error(`Failed to get HTML content. Status code: ${result.statusCode}`)
    }

    return result.html
}

export { getContent }

This throws the following error for that URL:

<path>/node_modules/browserless/src/index.js:62
    getBrowser().then(browser => browser.createIncognitoBrowserContext(contextOpts))
                                         ^

TypeError: browser.createIncognitoBrowserContext is not a function
    at <path>/node_modules/browserless/src/index.js:62:42
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Can somebody help me with this? Am I doing something wrong?
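The stack trace says the `browser` object resolved by `getBrowser()` has no `createIncognitoBrowserContext` method at all, which points at the Puppeteer API in use rather than at the page being fetched. A minimal reproduction without Chromium; `fakeBrowser` is a hypothetical stand-in that, like a puppeteer@22 `Browser`, only offers `createBrowserContext`:

```javascript
// Hypothetical stand-in for a Browser that lacks createIncognitoBrowserContext
const fakeBrowser = {
  createBrowserContext: async () => ({ kind: 'context' })
}

const getBrowser = () => Promise.resolve(fakeBrowser)

// Same shape as the call shown in the stack trace
const contextPromise = getBrowser().then(browser =>
  browser.createIncognitoBrowserContext({})
)

// The TypeError thrown inside .then() becomes a rejected promise
contextPromise.catch(error => {
  console.log(error instanceof TypeError) // true
  console.log(error.message) // browser.createIncognitoBrowserContext is not a function
})
```

When nothing handles such a rejection, recent Node.js versions report it and exit, which would match the output above.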

Kikobeats commented 8 months ago

Hello, this is because you need to use puppeteer@21.

I will prepare this library for using puppeteer@22 this week 🙂
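For context: puppeteer@22 removed `Browser.createIncognitoBrowserContext()` in favour of `Browser.createBrowserContext()`, so code written against puppeteer@21 fails in exactly this way. Pinning `puppeteer@21`, as suggested, is the straightforward fix until browserless supports v22. If you call Puppeteer directly in your own code, a minimal feature-detection sketch that works with either major version could look like this. `createContextCompat` is a hypothetical helper, not part of browserless or Puppeteer, and the mock objects only stand in for real `Browser` instances:

```javascript
// Hypothetical helper: prefer the puppeteer@22+ method name and fall back to
// the puppeteer@21 one, so the same call works against either version.
function createContextCompat (browser, contextOpts = {}) {
  if (typeof browser.createBrowserContext === 'function') {
    return browser.createBrowserContext(contextOpts) // puppeteer >= 22
  }
  if (typeof browser.createIncognitoBrowserContext === 'function') {
    return browser.createIncognitoBrowserContext(contextOpts) // puppeteer <= 21
  }
  return Promise.reject(
    new TypeError('browser has neither createBrowserContext nor createIncognitoBrowserContext')
  )
}

// Mock browsers standing in for the two Puppeteer versions
const browserV21 = { createIncognitoBrowserContext: async () => 'incognito context' }
const browserV22 = { createBrowserContext: async () => 'browser context' }

createContextCompat(browserV21).then(ctx => console.log(ctx)) // incognito context
createContextCompat(browserV22).then(ctx => console.log(ctx)) // browser context
```

Feature detection avoids parsing version strings and keeps working if the method is backported or the package is aliased.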