jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License
668 stars, 105 forks

Constant error when running from a local server #220

Closed netgfx closed 5 months ago

netgfx commented 5 months ago

Describe the bug: I'm running a simple script to scrape OG tags from my local Node.js server, but I constantly get:

{
  error: true,
  result: {
    success: false,
    requestUrl: 'https://ogp.me/',
    error: 'terminated',
    errorDetails: TypeError: terminated
        at Fetch.onAborted (/home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:2031:49)
        at Fetch.emit (node:events:513:28)
        at Fetch.terminate (/home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:93:10)
        at /home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:503:30
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
      [cause]: [TypeError]
    }
  },
  response: undefined,
  html: undefined
}

I have tried many websites with the same result.

The server is running on Ubuntu from WSL (Node 19.0)

To Reproduce: Run an Express server with this inside an API route handler:

const ogs = require('open-graph-scraper');

// Runs inside an Express route handler; `url` and `res` come from that handler.
// The promise chain's .catch() handles failures, so no surrounding try block is needed.
const options = { url: url };
ogs(options).then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains the response from the Fetch API

    if (error) {
        console.log(error);
        return res.status(500).json({ error: `Error scraping metadata, ${JSON.stringify(error)}` });
    }
    res.status(200).json(result);
}).catch((error) => {
    console.log(error);
    return res.status(500).json({ error: `Error scraping metadata, ${JSON.stringify(error)}` });
});

Expected behavior: Results are returned instead of an error.

Actual behavior: The error shown above is returned.
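
The logged payload collapses the underlying reason to `[cause]: [TypeError]`, so the root error is hidden. A minimal sketch for printing every level of the chain, assuming the error object sits at `result.errorDetails` as in the output above; `causeChain` is a hypothetical helper, not part of open-graph-scraper:

```javascript
// Hypothetical helper: walk the `cause` chain of an error and collect
// one "Name [code]: message" line per level, outermost first.
function causeChain(err, maxDepth = 10) {
  const lines = [];
  let current = err;
  while (current instanceof Error && lines.length < maxDepth) {
    // Node system errors carry a `code` such as ERR_INVALID_ARG_TYPE.
    const code = current.code ? ` [${current.code}]` : '';
    lines.push(`${current.name}${code}: ${current.message}`);
    current = current.cause;
  }
  return lines;
}
```

In the handler above, this could be called as `causeChain(error.result.errorDetails)` inside the `.catch()` callback, assuming the rejected value has the shape shown in the log.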

jshemas commented 5 months ago

I'm not able to reproduce the issue using Node 19.0. Are you able to use an LTS version of Node?

Can you try running the code below? This might give you a clearer error.

const { fetch } = require('undici');

// Fetch the page directly with undici, bypassing OGS, to surface the underlying error.
const getHtml = async () => {
  try {
    const response = await fetch('https://ogp.me/');
    const html = await response.text();
    console.log('html:', html);
  } catch (error) {
    console.log('error:', error);
  }
};

getHtml();

It might also be worth trying to use got or axios to make a basic request just to see if there is some kind of network issue on your server.
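
A basic request of that kind can also be made without installing anything. The sketch below uses Node's built-in `http`/`https` modules as a dependency-free stand-in for got or axios; `basicGet` is a hypothetical helper name. If this succeeds while the undici `fetch` call fails, the network itself is probably not the problem.

```javascript
const http = require('node:http');
const https = require('node:https');

// Hypothetical helper: perform a plain GET without undici's fetch and
// resolve with the status code and the length of the decoded body.
function basicGet(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https:') ? https : http;
    const req = client.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, length: body.length }));
      res.on('error', reject);
    });
    req.on('error', reject);
  });
}
```

For example: `basicGet('https://ogp.me/').then(console.log, console.error);`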

netgfx commented 5 months ago

The error with the fetch code is:

error: TypeError: terminated
    at Fetch.onAborted (/home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:2031:49)
    at Fetch.emit (node:events:513:28)
    at Fetch.terminate (/home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:93:10)
    at /home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:503:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  [cause]: TypeError [ERR_INVALID_ARG_TYPE]: The "stream" argument must be an instance of Stream. Received an instance of ReadableStream
      at new NodeError (node:internal/errors:393:5)
      at eos (node:internal/streams/end-of-stream:65:11)
      at fetchFinale (/home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:1090:5)
      at mainFetch (/home/mike/projects/metascraper/node_modules/undici/lib/web/fetch/index.js:757:5)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
    code: 'ERR_INVALID_ARG_TYPE'
  }
}

By the way, I was able to run my previous code successfully when I deployed it to render.com, so something in my local environment must be preventing it from working.
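
The `[cause]` in the trace above comes from Node's internal `eos` (the implementation behind `stream.finished`) rejecting a web `ReadableStream`, which hints at the local Node build rather than the network. A minimal sketch for checking this on a given machine, assuming that reading of the trace is right; `finishedAcceptsWebStreams` is a hypothetical helper name:

```javascript
const { finished } = require('node:stream');

// Hypothetical check: does this Node build let stream.finished() accept a
// web ReadableStream? The trace above shows it throwing ERR_INVALID_ARG_TYPE.
function finishedAcceptsWebStreams() {
  if (typeof ReadableStream === 'undefined') {
    return false; // no global web streams on this Node version
  }
  const webStream = new ReadableStream();
  try {
    const cleanup = finished(webStream, () => {});
    if (typeof cleanup === 'function') cleanup();
    return true;
  } catch (err) {
    if (err && err.code === 'ERR_INVALID_ARG_TYPE') return false;
    throw err; // anything else is unexpected, so surface it
  }
}

console.log('node version:', process.version);
console.log('finished() accepts web streams:', finishedAcceptsWebStreams());
```

If this prints `false`, trying an LTS release of Node, as suggested earlier in the thread, would be a reasonable next step.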

jshemas commented 5 months ago

Interesting. I think if you open an issue at https://github.com/nodejs/undici/issues, they might be able to help debug your network problems. I'm going to close this issue since this doesn't seem to be a problem with OGS.