jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License

Fixed OGP acquisition failure due to Content Negotiation #204

Closed: KentarouTakeda closed this pull request 5 months ago

KentarouTakeda commented 5 months ago

Summary

This pull request fixes a case where OGP data could not be scraped from URLs that support content negotiation. The problem has existed since version 6; versions 5 and earlier were not affected.

I suspect it was introduced by the version 6 change "Replace GOT with fetch!".

Details and Solution

The cause is that fetch() is called on the target URL without an Accept header, so the server may return a response whose Content-Type is not text/html. In that case the request succeeds and no error is raised, but every OGP field that should be returned is empty.

For example, consider the URL of this repository's README.md:

$ curl https://github.com/jshemas/openGraphScraper/blob/master/README.md
{"payload":{"allShortcutsEnabled":false,"fileTree":{ ... }}}

The example above uses curl, but calling fetch() without an Accept header returns JSON in the same way, so OGP acquisition fails.
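A related, optional safeguard (not part of this pull request) is to check the response's Content-Type before parsing, so that a non-HTML response fails loudly instead of silently producing empty OGP fields. The minimal TypeScript sketch below assumes that approach; `isHtmlContentType` and `assertHtmlResponse` are hypothetical names, not open-graph-scraper's API, and the response is typed structurally so that the built-in fetch `Response` fits it.

```typescript
// Hypothetical helper (not part of open-graph-scraper): returns true when a
// Content-Type header value names an HTML media type.
function isHtmlContentType(contentType: string | null): boolean {
  if (contentType === null) return false;
  // Drop parameters such as "; charset=utf-8" and normalize case,
  // because media types are case-insensitive.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html" || mediaType === "application/xhtml+xml";
}

// Minimal structural type: anything with headers.get(), e.g. a fetch Response.
interface ResponseLike {
  headers: { get(name: string): string | null };
}

// Hypothetical guard: throw instead of letting a JSON (or other non-HTML)
// response flow into the parser and yield empty OGP results.
function assertHtmlResponse(res: ResponseLike): void {
  const type = res.headers.get("content-type");
  if (!isHtmlContentType(type)) {
    throw new Error(
      `Expected an HTML response but got Content-Type: ${type ?? "(none)"}`,
    );
  }
}
```

With a check like this, the README.md request without an Accept header would raise an error rather than appearing to succeed with no OGP data.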

Because OGP data is read from HTML, the Content-Type we need is text/html; requesting it explicitly in the Accept header solves the problem:

$ curl -H 'Accept: text/html' https://github.com/jshemas/openGraphScraper/blob/master/README.md
<!DOCTYPE html>
<html
   lang="en"

   data-color-mode="auto" data-light-theme="light" data-dark-theme="dark"
   data-a11y-animated-images="system" data-a11y-link-underlines="true"
   >
...
</html>

This pull request makes the same change to the fetch() call: the request now sends an Accept: text/html header.
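To make the change concrete, here is a minimal TypeScript sketch of a fetch() call that sends Accept: text/html. It is an illustration under stated assumptions, not the code from this pull request: `fetchHtml`, `buildHeaders`, and the `FetchLike` type are hypothetical names, and the fetch implementation is passed in so it can be replaced by a stub in tests (on Node.js 18 and later the built-in global `fetch` can be passed directly).

```typescript
// Structural type for the subset of fetch() used here; the built-in
// global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Hypothetical helper: default to Accept: text/html, then apply any
// caller-supplied headers. Names are lowercased because HTTP header names
// are case-insensitive, so a caller's "Accept" replaces the default
// instead of producing a second Accept entry.
function buildHeaders(extra: Record<string, string> = {}): Record<string, string> {
  const headers: Record<string, string> = { accept: "text/html" };
  for (const name of Object.keys(extra)) {
    headers[name.toLowerCase()] = extra[name];
  }
  return headers;
}

// Hypothetical helper: fetch a page while explicitly asking for HTML, so a
// server that performs content negotiation returns markup instead of JSON.
async function fetchHtml(
  url: string,
  fetchImpl: FetchLike,
  extraHeaders: Record<string, string> = {},
): Promise<string> {
  const res = await fetchImpl(url, { headers: buildHeaders(extraHeaders) });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}
```

For example, `await fetchHtml("https://github.com/jshemas/openGraphScraper/blob/master/README.md", fetch)` would be expected, assuming GitHub still negotiates as described, to return HTML like the curl output above rather than JSON.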

Note

The CI tests appear to be failing on the master branch I branched from. Because the same failures occur both before and after this change, this pull request does not attempt to fix them.

jshemas commented 5 months ago

Hello, thanks for the fix and the write-up on the issue. I will update the tests and push out this fix.

jshemas commented 5 months ago

Fix is in open-graph-scraper@6.3.3

KentarouTakeda commented 5 months ago

Thanks!