Closed adarhef closed 1 year ago
Hello.
Normally I would say this is a proxy/headers issue, since most big sites try to block scrapers. In this case, though, it looks like there is something wrong with the request being made.
If you use a package like got or node-fetch and pass in just the Facebook URL, it will send back a page that looks something like...
But if you use Node's fetch API, await fetch('https://www.facebook.com/'), Facebook returns an error page instead.
I'm guessing one (or more) of the default options undici uses is causing Facebook to return an error page.
Even something like the following leads to an error page:
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36';
const headers = new Headers({
'user-agent': userAgent,
});
const request = await fetch('https://www.facebook.com/', { credentials: 'omit', redirect: 'follow', headers });
const html = await request.text();
console.log('html:', html);
Actually, I had one last idea.
const request = await fetch('http://www.facebook.com/', { referrer: 'http://www.facebook.com' });
const html = await request.text();
console.log('html:', html);
Setting the referrer to the same site you are requesting seems to fix the issue.
So to get this working in OGS, you would do the following:
ogs({ url: 'https://www.facebook.com/', fetchOptions: { referrer: 'https://www.facebook.com' } })
Not sure if I want to do this by default in OGS, but this should unblock your current issue.
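For anyone applying this workaround to more than one URL, here is a minimal sketch of deriving the referrer from the URL being scraped. The helper name withSameSiteReferrer is hypothetical and not part of open-graph-scraper; it only builds the options object shown above, and the commented-out ogs call assumes the v6 options shape used in this thread.

```javascript
// Hypothetical helper (not part of open-graph-scraper): builds an options
// object whose fetchOptions.referrer is the origin of the URL being scraped.
function withSameSiteReferrer(url, extraFetchOptions = {}) {
  // new URL(...).origin is scheme + host, without a trailing slash.
  const { origin } = new URL(url);
  return {
    url,
    // Caller-supplied fetch options win over the default referrer.
    fetchOptions: { referrer: origin, ...extraFetchOptions },
  };
}

const options = withSameSiteReferrer('https://www.facebook.com/');
console.log(options);

// Usage, assuming open-graph-scraper v6 is installed:
//   const { result } = await ogs(options);
//   console.log(result.ogImage);
```

Spreading extraFetchOptions last means an explicit referrer passed by the caller replaces the derived one.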
I actually haven't checked the final request that was sent in this case. I had to revert to 5.2.3 and it'll be a while before I can experiment again. Does got send a referer at all? If so, what was it? Maybe it was hardcoded to something. What if I were to set the fetch referer to something else (like the website I'm actually referring from, instead of Facebook)? I imagine other websites might reject the fetch API for similar reasons, but I haven't done extensive testing.
Hello, this should be fixed in open-graph-scraper@6.1.0.
Short answer: It looks like fetch always sets the sec-fetch-mode header, and there doesn't seem to be a way to remove it. Facebook errors out when this header is set and the referrer/origin header is null, so for now I'm going to default the origin header to the request URL. Users can overwrite this header if needed.
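The defaulting behavior described above could look roughly like the sketch below. This is an illustration only, not the actual open-graph-scraper source: the function name withDefaultOrigin is hypothetical, and it assumes the global Headers class available in Node 18+.

```javascript
// Sketch only, not the library's real implementation.
// Defaults the `origin` header to the request URL unless the caller set one.
function withDefaultOrigin(url, fetchOptions = {}) {
  // Headers accepts a plain object, another Headers instance, or undefined.
  const headers = new Headers(fetchOptions.headers);
  if (!headers.has('origin')) {
    headers.set('origin', url);
  }
  return { ...fetchOptions, headers };
}

const defaulted = withDefaultOrigin('https://www.facebook.com/');
console.log(defaulted.headers.get('origin'));

const overridden = withDefaultOrigin('https://www.facebook.com/', {
  headers: { origin: 'https://example.com' },
});
console.log(overridden.headers.get('origin'));
```

Because header names are case-insensitive, a caller passing Origin or origin is detected either way and left untouched.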
Sounds great! Thank you!
Describe the bug
Regression from 5.2.3: fetching https://facebook.com on 6.0.1 yields no image. On 5.2.3 it does.

To Reproduce
Try fetching the aforementioned link.

Expected behavior
Expecting to see a non-null ogImage array.