hapijs / hapi


Hapijs server not being scraped by Facebook behind CloudFront #3132

Closed: ScottDowne closed this issue 8 years ago

ScottDowne commented 8 years ago

I wonder if anyone has hit this?

I have this site: https://donate.mozilla.org/en-US/

It's a hapi server. In this case it's serving a static HTML file: https://github.com/mozilla/donate.mozilla.org/blob/master/server.js#L352-L358

server.route([{
  method: 'GET',
  path: '/{params*}',
  handler: {
    directory: {
      // Serve static files from ./public
      path: Path.join(__dirname, 'public')
    }
  }
}]);

It seems to work fine as a file server. However, when it interacts with CloudFront and Facebook's scraper, something breaks. I don't fully understand what's happening, but here's what I can piece together:

The hapi server sends the file contents with Transfer-Encoding: chunked.

CloudFront's docs then say: "If the viewer makes a Range GET request and the origin returns Transfer-Encoding: chunked, CloudFront returns the entire object to the viewer instead of the requested range." (from http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RangeGETs.html)

Facebook's scraper then chokes because the response it gets back isn't the size it expects for the range it asked for.
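To make the mismatch concrete, here is a minimal sketch of what a client sending a single byte range would normally expect back. The `resolveRange` helper and the object sizes are hypothetical and only for illustration; it is not code from hapi, CloudFront, or Facebook, and it ignores suffix ranges (`bytes=-500`) and multi-range requests.

```javascript
// Hypothetical helper (not part of hapi, inert, or CloudFront): given a
// Range header and the full object size, describe the response a
// range-aware server would normally send for a single byte range.
function resolveRange(rangeHeader, size) {
  const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader || '');
  if (!match) {
    // No usable Range header: the whole object with a plain 200.
    return { status: 200, length: size };
  }
  const start = Number(match[1]);
  if (match[2] !== '' && Number(match[2]) < start) {
    // Last byte before first byte is invalid, so the header is ignored.
    return { status: 200, length: size };
  }
  if (start >= size) {
    // The range starts past the end of the object.
    return { status: 416, length: 0, contentRange: `bytes */${size}` };
  }
  // An open-ended or oversized end position is clamped to the last byte.
  const end = match[2] === '' ? size - 1 : Math.min(Number(match[2]), size - 1);
  return {
    status: 206,
    length: end - start + 1,
    contentRange: `bytes ${start}-${end}/${size}`,
  };
}

// The scraper's range against two made-up object sizes.
console.log(resolveRange('bytes=0-500000', 12000));
console.log(resolveRange('bytes=0-500000', 600000));
```

A client that asked for a range is prepared for a 206 with a Content-Range header. Per the CloudFront docs quoted above, a chunked origin response means the viewer gets the entire object instead, which is a different shape of response.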

You can test that here: https://developers.facebook.com/tools/debug/og/object/

Paste in: http://donate.mozilla.org/en-US/thunderbird/

Then click "fetch new scrape information"

Facebook has provided me with a curl command that simulates what their scraper does:

curl -G -v --compressed -H "Range: bytes=0-500000" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://donate.mozilla.org/en-US/thunderbird/share/"

It responds with curl: (18) transfer closed with 4317 bytes remaining to read

It also doesn't respond with Transfer-Encoding: chunked

If I curl directly to the server without cloudfront:

curl -G -vv --compressed -H "Range: bytes=0-500000" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://donate-mozilla-org-us-prod.herokuapp.com/en-US/"

I get back Transfer-Encoding: chunked, so I think CloudFront then returns something different to Facebook, which Facebook doesn't expect.

Thoughts? Can I just turn off Transfer-Encoding: chunked, and if so, how would I do that with the static directory server?

hueniverse commented 8 years ago

It's probably chunked because it pipes the response vs reading it in full into memory and serving it with a known length. Probably worth moving this issue to the hapijs/inert repo where @kanongil can help.
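The difference between piping and serving with a known length can be seen with Node's core `http` module alone. This is only a sketch under stated assumptions: the demo server, its `/piped` and `/buffered` paths, and the helper names are made up for illustration, and it does not show how hapi or inert are implemented.

```javascript
const http = require('http');
const { Readable } = require('stream');

// Stand-in for the static HTML file (hypothetical content).
const body = Buffer.from('<html><head><title>demo</title></head></html>');

// Hypothetical demo server built on Node core only, not hapi or inert.
function createDemoServer() {
  return http.createServer((req, res) => {
    res.setHeader('Content-Type', 'text/html');
    if (req.url === '/buffered') {
      // The full body is in memory, so its length can be declared up front.
      res.setHeader('Content-Length', body.length);
      res.end(body);
    } else {
      // Piping a stream with no declared length: for HTTP/1.1 clients
      // Node sends Transfer-Encoding: chunked.
      Readable.from([body]).pipe(res);
    }
  });
}

// Fetch a path and resolve with the response headers once the body is read.
function fetchHeaders(port, path) {
  return new Promise((resolve, reject) => {
    const req = http.get(
      { host: '127.0.0.1', port, path, headers: { Connection: 'close' } },
      (res) => {
        res.resume(); // discard the body, only the headers matter here
        res.on('end', () => resolve(res.headers));
      }
    );
    req.on('error', reject);
  });
}

// Start the demo server on a free port, hit both paths, then shut down.
async function compare() {
  const server = createDemoServer();
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();
  try {
    return {
      piped: await fetchHeaders(port, '/piped'),
      buffered: await fetchHeaders(port, '/buffered'),
    };
  } finally {
    server.close();
  }
}

compare().then((headers) => console.log(headers));
```

Buffering trades memory for a known length, which is why a file server that streams from disk tends to end up chunked unless it sets Content-Length from the file size itself.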

ScottDowne commented 8 years ago

Cool I'll try there. Thanks. :tada: