jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License

Twitter / X.com Open graph tags not working anymore #194

Closed NadirBelhaj closed 9 months ago

NadirBelhaj commented 11 months ago

TL;DR: I'm trying to read the Open Graph tags of tweets, but since Twitter was rebranded to X it no longer works.

What I did so far:

Steps to reproduce the problem:

Run: npm install open-graph-scraper --save

Open example.js and change url to: https://twitter.com/SpaceX/status/1687551185509888000?s=20

Then run node example.js, which shows the following error:

node:internal/process/promises:289 triggerUncaughtException(err, true /* fromPromise */);

I corrected the issue by wrapping the code inside an async function so I could use await, like this:

const ogs = require('open-graph-scraper');

const options = { url: 'https://twitter.com/SpaceX/status/1687551185509888000?s=20' };

(async () => {
  try {
    // Await the promise directly instead of mixing await with .then()
    const { error, result, response } = await ogs(options);
    console.log('error:', error); // true or false; true if there was an error. The error itself is inside the result object.
    console.log('result:', result); // Contains all of the Open Graph results
    console.log('response:', response); // Contains the HTML of the page
  } catch (error) {
    // A rejected promise lands here instead of crashing the process
    console.error('Error:', error);
  }
})();

Then when I run I get the following error:

error: 'redirect count exceeded', errorDetails: Error: redirect count exceeded

estebanabaroa commented 10 months ago

I'm also having issues with Twitter. Other apps like Telegram seem to be able to fetch Twitter thumbnails, so I'm not sure why we can't.

quentingrchr commented 10 months ago

I encountered a similar issue in a project where I needed to fetch data from Twitter. The fetching process abruptly stopped working, resulting in the error: 'redirect count exceeded', errorDetails: Error: redirect count exceeded.

Based on my observations, it appears that Twitter might have recently implemented more stringent policies or rules against web scraping by bots. Although I can't be certain if this is directly related to your issue with the open-graph-scraper, it's worth considering.

In my case, I managed to resolve the problem quite easily. I found that adding a specific header to my fetch requests made a difference:

fetch(url, {
    headers: {
      'User-Agent': 'mycompanyname-bot/1.0',
    },
  })

This change seemed to satisfy whatever new restrictions Twitter may be applying to scraping bots. I wasn't using the openGraphScraper library, though, so this solution might not directly address your problem. My guess is that openGraphScraper has not yet been updated to account for these restrictions.

jshemas commented 10 months ago

Hello,

It looks like that URL leads to a redirect loop, which causes fetch to return the redirect count exceeded error. @quentingrchr is right that major websites like Twitter are trying to block web scraping. Setting the User-Agent doesn't seem to solve the problem. If this is a use case you need to support, you can try using a proxy, or do the request yourself and then pass the HTML into open-graph-scraper.
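
As a rough illustration of the "do the request yourself" route, here is a minimal sketch that follows redirects by hand, so a redirect loop is reported as such instead of surfacing only as a generic redirect-limit error. The function name fetchHtml and its fetchImpl, maxRedirects and headers options are hypothetical names chosen for this example, not part of open-graph-scraper. It assumes Node 18+ (global fetch) and that the fetch implementation exposes the status and Location header when called with redirect: 'manual'. It is not expected to get past Twitter's blocking on its own.

```javascript
// Sketch only: fetchHtml and its options are illustrative names, not a library API.
// fetchImpl is injectable so the logic can be exercised without network access.
async function fetchHtml(startUrl, { fetchImpl = fetch, maxRedirects = 5, headers = {} } = {}) {
  const seen = new Set(); // every URL requested so far
  let url = startUrl;

  for (let hop = 0; hop <= maxRedirects; hop += 1) {
    if (seen.has(url)) {
      throw new Error(`redirect loop detected at ${url}`);
    }
    seen.add(url);

    // redirect: 'manual' asks fetch not to follow redirects itself.
    const res = await fetchImpl(url, { redirect: 'manual', headers });

    if (res.status >= 300 && res.status < 400) {
      const location = res.headers.get('location');
      if (!location) {
        throw new Error(`redirect without Location header at ${url}`);
      }
      url = new URL(location, url).toString(); // resolve relative redirects
      continue;
    }

    if (!res.ok) {
      throw new Error(`request failed with status ${res.status}`);
    }
    return res.text(); // the HTML body
  }

  throw new Error('redirect count exceeded');
}
```

If the request succeeds, the returned string could be handed to open-graph-scraper through its html option, for example ogs({ html }), in place of the url option.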

jshemas commented 9 months ago

I don't think this is going to be something I can fix. Even if you use got and do something like got('https://twitter.com/SpaceX/status/1687551185509888000?s=20') it will return an ERR_TOO_MANY_REDIRECTS error.