hapijs / inert

Static file and directory handlers for hapi.js

hapi.js directory route not being scraped by Facebook behind CloudFront #60

Closed: ScottDowne closed this issue 8 years ago

ScottDowne commented 8 years ago

I initially filed this at https://github.com/hapijs/hapi/issues/3132 and was directed here.

Has anyone else hit this?

I run this site: https://donate.mozilla.org/en-US/

It is a hapi server. In this case it serves static HTML files through this route: https://github.com/mozilla/donate.mozilla.org/blob/master/server.js#L352-L358

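    const Path = require('path');

    // Serve any file under ./public (the inert plugin must already be registered)
    server.route([{
        method: 'GET',
        path: '/{params*}',
        handler: {
            directory: {
                path: Path.join(__dirname, 'public')
            }
        }
    }]);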

It works fine as a file server. However, when requests go through CloudFront to Facebook's scraper, something breaks. I don't fully understand what's happening, but here is what I can piece together:

1. The hapi server sends the file contents with Transfer-Encoding: chunked.
2. CloudFront's documentation (http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RangeGETs.html) says: "If the viewer makes a Range GET request and the origin returns Transfer-Encoding: chunked, CloudFront returns the entire object to the viewer instead of the requested range."
3. Facebook's scraper then chokes because the response is not the size it expected for the range it requested.
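For comparison, RFC 7233 defines what a well-formed answer to the scraper's Range: bytes=0-500000 looks like when a server does honour the range: a 206 whose Content-Range states exactly which bytes are being sent and out of how many. The sketch below computes those values from a file size. It is a hypothetical helper (resolveRange is not an inert, hapi, or CloudFront API) and only handles a single "bytes=first-last" range; suffix ranges and multi-range requests are left out.

```javascript
// Hypothetical helper: resolve a single "bytes=first-last" Range header
// against a representation of `size` bytes, following RFC 7233.
// Suffix ranges ("bytes=-500") and multiple ranges are not handled.
function resolveRange(rangeHeader, size) {
  const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader || '');
  const first = match ? Number(match[1]) : NaN;
  const requestedLast = match && match[2] !== '' ? Number(match[2]) : Infinity;

  // Missing, unsupported or inverted ranges: ignore the header and send
  // everything (ignoring Range is always permitted by RFC 7233).
  if (!match || requestedLast < first) {
    return { status: 200, contentLength: size };
  }

  // A range that starts past the end cannot be satisfied.
  if (first >= size) {
    return { status: 416, contentRange: `bytes */${size}` };
  }

  // An open-ended or over-long range is clamped to the final byte.
  const last = Math.min(requestedLast, size - 1);
  return {
    status: 206,
    contentRange: `bytes ${first}-${last}/${size}`,
    contentLength: last - first + 1,
  };
}
```

For a hypothetical 20,000-byte page, resolveRange('bytes=0-500000', 20000) yields a 206 with Content-Range "bytes 0-19999/20000" and a Content-Length of 20000, so the client knows precisely how many bytes to read.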

You can test that here: https://developers.facebook.com/tools/debug/og/object/

Paste in: http://donate.mozilla.org/en-US/thunderbird/

Then click "fetch new scrape information"

Facebook has provided me with a curl command that simulates what their scraper does:

curl -G -v --compressed -H "Range: bytes=0-500000" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://donate.mozilla.org/en-US/thunderbird/share/"

It fails with: curl: (18) transfer closed with 4317 bytes remaining to read

The response also doesn't use Transfer-Encoding: chunked.

If I curl the origin server directly, bypassing CloudFront:

curl -G -vv --compressed -H "Range: bytes=0-500000" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://donate-mozilla-org-us-prod.herokuapp.com/en-US/"

I get back Transfer-Encoding: chunked, which I think causes CloudFront to return something to Facebook that Facebook doesn't expect.
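To compare the CloudFront and origin paths from a script rather than by reading curl's verbose output, here is a minimal Node.js sketch that sends the same headers as Facebook's curl command and reports the framing headers, plus whether the body came up short of its declared Content-Length (the condition curl reports as error 18). The helper names scraperHeaders, summarize and probe are hypothetical, and "gzip, deflate" is an assumption standing in for whatever encodings curl's --compressed flag advertises.

```javascript
const https = require('https');

// Headers mirroring Facebook's curl command. "gzip, deflate" stands in for
// curl's --compressed (assumption: the exact list depends on the curl build).
function scraperHeaders() {
  return {
    'Range': 'bytes=0-500000',
    'Connection': 'close',
    'Accept-Encoding': 'gzip, deflate',
    'User-Agent': 'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)',
  };
}

// Reduce a response to the details that matter here: status, framing
// headers, and whether fewer bytes arrived than Content-Length promised.
function summarize(statusCode, headers, bytesReceived) {
  const declared = headers['content-length'] !== undefined
    ? Number(headers['content-length'])
    : null;
  return {
    statusCode,
    transferEncoding: headers['transfer-encoding'] || null,
    contentEncoding: headers['content-encoding'] || null,
    contentRange: headers['content-range'] || null,
    declaredLength: declared,
    bytesReceived,
    truncated: declared !== null && bytesReceived < declared,
  };
}

// Hypothetical helper: request `url` the way the scraper does and resolve
// with a summary. Node does not decompress automatically, so the bytes
// counted are the encoded bytes that Content-Length describes.
function probe(url) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, { headers: scraperHeaders() }, (res) => {
      let bytes = 0;
      let failure = null;
      res.on('data', (chunk) => { bytes += chunk.length; });
      // A connection closed early can surface as an 'error' (e.g. "aborted");
      // record it and report everything once the response closes.
      res.on('error', (err) => { failure = err.message; });
      res.on('close', () => {
        resolve({ ...summarize(res.statusCode, res.headers, bytes), complete: res.complete, failure });
      });
    });
    req.on('error', reject);
  });
}
```

Running probe(url).then(console.log, console.error) once with https://donate.mozilla.org/en-US/thunderbird/share/ and once with https://donate-mozilla-org-us-prod.herokuapp.com/en-US/ puts the status, Transfer-Encoding, Content-Range and truncation results for both paths side by side.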

Thoughts? Can I just turn off Transfer-Encoding: chunked, and if so, how would I do that with the static directory handler?

kanongil commented 8 years ago

Thanks for the detailed report. I have investigated further, and it appears that you have encountered a bug in how inert handles range requests for compressed responses.

The response to the request should be a plain 200 without any range shenanigans. I will look into a fix.
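The rule behind this diagnosis is that on-the-fly compression and byte ranges don't mix: the compressed body's length, and therefore its byte offsets, is not known until compression finishes. A minimal sketch of that decision follows, assuming gzip is the only on-the-fly encoding and that a file type is simply compressible or not. acceptsGzip and planResponse are hypothetical names, not inert or hapi APIs, and this is not how inert 3.2.1 actually implements its fix.

```javascript
// Hypothetical helper: does an Accept-Encoding value allow gzip?
// Simplified: honours q-values but ignores "*" wildcards.
function acceptsGzip(acceptEncoding) {
  return acceptEncoding.split(',').some((part) => {
    const [name, ...params] = part.split(';').map((s) => s.trim().toLowerCase());
    if (name !== 'gzip') return false;
    const q = params.find((p) => p.startsWith('q='));
    return q === undefined || Number(q.slice(2)) > 0;
  });
}

// Hypothetical model of the behaviour described above.
// honourRange === false means "ignore Range and send a plain 200 with the
// whole body".
function planResponse({ rangeHeader, acceptEncoding = '', compressible }) {
  if (compressible && acceptsGzip(acceptEncoding)) {
    // Compressed on the fly: the gzip output's size is unknown up front,
    // so the requested byte range is ignored.
    return { contentEncoding: 'gzip', honourRange: false };
  }
  // Identity encoding: the bytes on the wire are the file's bytes, so
  // ordinary range handling (206, or 416 when unsatisfiable) can apply.
  return { contentEncoding: 'identity', honourRange: Boolean(rangeHeader) };
}
```

Serving the uncompressed file and honouring the range would also be a valid answer; the sketch follows the plain-200 behaviour described above, which RFC 7233 permits because a server may always ignore the Range header.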

hueniverse commented 8 years ago

@kanongil anything to do here?

kanongil commented 8 years ago

I have published inert@3.2.1, which contains a fix for range requests on compressible content. This should fix the issue. Let me know how it goes.

hueniverse commented 8 years ago

No milestone?

ScottDowne commented 8 years ago

Fantastic! 🎉 I pulled in the changes today and it worked like a charm! Thanks very much for the quick response!

It hasn't shipped to our prod site yet, but I'll do that tomorrow.

lock[bot] commented 4 years ago

This thread has been automatically locked due to inactivity. Please open a new issue for related bugs or questions following the new issue template instructions.