microlinkhq / browserless

The headless Chrome/Chromium driver on top of Puppeteer.
https://browserless.js.org
MIT License

'which' is not recognized as an internal or external command, operable program or batch file. #585

Closed vanarebane closed 3 months ago

vanarebane commented 3 months ago

Prerequisites

Subject of the issue

I always get this message in the console, whatever I do: 'which' is not recognized as an internal or external command, operable program or batch file. The sample code works anyway.

Steps to reproduce

Run the sample code from the readme. The OS is Windows 10 Enterprise 64-bit.

I get the same message in the console when I type which on its own. So somewhere in the bazillion dependencies in node_modules, some script tries to run the shell command which for no real need. I've been digging around the code trying to comment out any line that might cause this, but I still get the message.

Expected behaviour

The console message should never appear.

Actual behaviour

Getting console message: 'which' is not recognized as an internal or external command, operable program or batch file.

Kikobeats commented 3 months ago

Hmm, there is nothing related to which in the project: https://github.com/search?q=repo%3Amicrolinkhq%2Fbrowserless+%22which%22&type=code

Can you paste the code you are running so it's possible to reproduce it?

vanarebane commented 3 months ago

@Kikobeats, no, I'm sorry. I mentioned the wrong readme. Here is the full story of what I did:

  1. Start with an empty folder

  2. Take the exact sample code from the Metascraper readme, that uses browserless

  3. I commented out the metascraper-clearbit require and added some helper console.logs:

//test.js
const getHTML = require('html-get')

/**
 * `browserless` will be passed to `html-get`
 * as driver for getting the rendered HTML.
 */
const browserless = require('browserless')()

const getContent = async url => {
  // create a browser context inside the main Chromium process
  const browserContext = browserless.createContext()
  console.log("Between here")
  const promise = getHTML(url, { getBrowserless: () => browserContext })
  console.log("And here")
  // close browser resources before return the result
  promise.then(() => browserContext).then(browser => browser.destroyContext())
  return promise
}

/**
 * `metascraper` is a collection of tiny packages,
 * so you can just use what you actually need.
 */
const metascraper = require('metascraper')([
  require('metascraper-author')(),
  require('metascraper-date')(),
  require('metascraper-description')(),
  require('metascraper-image')(),
  require('metascraper-logo')(),
  // require('metascraper-clearbit')(),
  require('metascraper-publisher')(),
  require('metascraper-title')(),
  require('metascraper-url')()
])

/**
 * The main logic
 */
getContent('https://microlink.io')
  .then(metascraper)
  .then(metadata => console.log(metadata))
  .then(browserless.close)
  .then(process.exit)

  4. Run this command in the folder: npm install browserless puppeteer --save

  5. Then add metascraper and html-get to the dependencies in package.json and run npm i to make sure everything is installed. Here's my package.json:

{
  "dependencies": {
    "browserless": "^10.5.1",
    "html-get": "^2.16.7",
    "metascraper": "^5.45.10",
    "metascraper-author": "^5.45.10",
    "metascraper-date": "^5.45.10",
    "metascraper-description": "^5.45.10",
    "metascraper-image": "^5.45.10",
    "metascraper-logo": "^5.45.10",
    "metascraper-publisher": "^5.45.10",
    "metascraper-title": "^5.45.10",
    "metascraper-url": "^5.45.10",
    "puppeteer": "^22.11.0"
  }
}

  6. Then run node test. The full output is:

PS C:\code\Test123> node test
Between here
'which' is not recognized as an internal or external command,
operable program or batch file.
And here
{
  author: 'Microlink HQ',
  date: '2024-06-14T11:44:23.000Z',
  description: 'Enter a URL, receive information. Normalize metadata. Get HTML markup. Take a screenshot. Identify tech stack. Generate a PDF. Automate web scraping. Run Lighthouse.',
  image: 'https://cdn.microlink.io/logo/banner.jpeg',
  logo: 'https://cdn.microlink.io/logo/trim.png',
  publisher: 'Microlink',
  title: 'Turns websites into data — Microlink',
  url: 'https://microlink.io/'
}
node:internal/bootstrap/node:123
        validateInteger(value, 'code');
        ^

TypeError [ERR_INVALID_ARG_TYPE]: The "code" argument must be of type number. Received an instance of Object
    at process.set [as exitCode] (node:internal/bootstrap/node:123:9)
    at exit (node:internal/process/per_thread:180:24) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Node.js v20.14.0
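
The TypeError at the end of that output looks separate from the which message. `.then(process.exit)` hands whatever the previous step resolved with to process.exit as the exit code, and the log says it received an Object rather than a number. A minimal sketch of one way around it, assuming the intent is simply to exit cleanly once the browser is closed; exitWith is a hypothetical helper name, not part of browserless or metascraper:

```javascript
// Hypothetical helper (not part of browserless or metascraper): builds a
// `.then()` handler that ignores the resolved value and exits with a fixed
// numeric code. `exit` is injectable only so the sketch can be exercised
// without actually ending the process.
const exitWith = (code, exit = process.exit) => () => exit(code)

// In the script above, the last two steps would then read:
//   .then(browserless.close)
//   .then(exitWith(0))
```

Writing `.then(() => process.exit(0))` inline does the same thing; the point is only that the resolved value never reaches process.exit.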

Kikobeats commented 3 months ago

I can't reproduce it because I don't have Windows, but I think the error could come from html-get:

https://github.com/microlinkhq/html-get/blob/fb7844a81a9b60756fcdb17fc538708d3e0f685d/src/index.js#L194

which is a bit weird, since it's wrapped in a try/catch block 🤔.

Can you continue debugging and remove everything that is not necessary to reproduce the issue?
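
One possible explanation for the message surviving a try/catch, assuming the linked line shells out to which through child_process (not verified here): the catch only handles the thrown error, while the shell's own complaint goes to stderr, which execSync forwards to the parent's stderr unless stdio is set. A minimal sketch of a quieter lookup under that assumption; findExecutable is a hypothetical helper, not html-get's API:

```javascript
const { execSync } = require('node:child_process')

// Hypothetical helper: returns the first path found for `name`, or null.
// Assumption: Windows has no `which`, so `where` is used there instead.
function findExecutable (name) {
  const lookup = process.platform === 'win32' ? 'where' : 'which'
  try {
    const output = execSync(`${lookup} ${name}`, {
      encoding: 'utf8',
      // stdin ignored, stdout captured, stderr discarded so the shell's
      // "not recognized" / "not found" text never reaches the console.
      stdio: ['ignore', 'pipe', 'ignore']
    })
    // `where` can print several matches, one per line; keep the first.
    return output.split(/\r?\n/)[0].trim() || null
  } catch {
    // Non-zero exit code: the command (or the lookup tool itself) is missing.
    return null
  }
}
```

With stderr discarded, a missing binary just yields null and nothing is printed, whichever shell runs the command.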