mozilla / petitions.mozilla.org

Petitions for Mozilla Advocacy
https://petitions.mozilla.org
Mozilla Public License 2.0
1 star, 9 forks

Facebook unable to scrape pages #113

Closed jpetto closed 8 years ago

jpetto commented 8 years ago

While testing the Facebook share button on the newly pushed /stay-secure/thank-you/ page, I noticed that no Open Graph information is populated on the resulting Facebook share page.

I tried to debug the URL using Facebook's Open Graph object debugger, which returns no Open Graph information about the page at all. The "See exactly what our scraper sees for your URL" link results in a blank page. I tested https://petitions.mozilla.org/data-retention/ as well, with the same results. :disappointed:

Sharing is an important part of the Data Privacy Day campaign launching today.

@jbuck @ScottDowne - Any ideas here?

jbuck commented 8 years ago

@alicoding did we ever figure this out?

jbuck commented 8 years ago

@jpetto is it as simple as there is no opengraph data on that page?

alicoding commented 8 years ago

@jbuck no, but our staging still seems to be broken, while our prod for donate.mozilla.org works perfectly. I think this has something to do with the CDN, but I'm not sure.

jpetto commented 8 years ago

There is definitely open graph data on the pages. I put a static version of the page up yesterday for testing, which Facebook's debugger scrapes successfully.

jbuck commented 8 years ago

I'm going to run an invalidation on the CDN just to be sure it didn't cache anything it shouldn't have...

jbuck commented 8 years ago

Man, I have no frigging clue why it's doing this.

jpetto commented 8 years ago

One curious piece of data from the Facebook debugger is that the response code (for both /data-retention/ and /stay-secure/) is 206 (partial content).

jpetto commented 8 years ago

Doing a little more digging, I see the following header included when running curl -I https://petitions.mozilla.org/stay-secure/:

Accept-Ranges: bytes

Based on a whole 10 minutes of research, this header and a 206 response code go hand-in-hand. Also, it seems this header is generally used for streaming large content (e.g. videos). Is there a specific reason to include this header? Would it be difficult to (possibly temporarily) turn it off?
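
For context, a 206 is what a server returns when a client sends a Range header and the server honors it, and Accept-Ranges: bytes is how the server advertises that it will. The Python sketch below is a toy model of that relationship, not the petitions server or hapi's actual implementation; the `respond` function and its parameters are hypothetical. Whether Facebook's scraper really sent a Range header is an assumption that the 206 suggests but nothing in this thread confirms.

```python
import re
from typing import Dict, Optional, Tuple


def respond(body: bytes,
            range_header: Optional[str] = None,
            accept_ranges: bool = True) -> Tuple[int, Dict[str, str], bytes]:
    """Toy model of a GET response that may honor one byte range.

    Returns (status, headers, payload). Hypothetical helper for illustration.
    """
    total = len(body)
    headers: Dict[str, str] = {}
    if accept_ranges:
        # Advertise range support, as seen in the `curl -I` output.
        headers["Accept-Ranges"] = "bytes"

    match = None
    if accept_ranges and range_header is not None:
        # Only the simple "bytes=start-end" / "bytes=start-" forms are modeled.
        match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())

    if match is not None:
        start = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else None
        # A range whose end precedes its start is invalid and gets ignored.
        if last is None or last >= start:
            if start >= total:
                # Range starts past the end of the body: not satisfiable.
                headers["Content-Range"] = f"bytes */{total}"
                headers["Content-Length"] = "0"
                return 416, headers, b""
            end = total - 1 if last is None else min(last, total - 1)
            payload = body[start:end + 1]
            headers["Content-Range"] = f"bytes {start}-{end}/{total}"
            headers["Content-Length"] = str(len(payload))
            return 206, headers, payload

    # No usable Range header, or ranges disabled: send the whole body.
    headers["Content-Length"] = str(total)
    return 200, headers, body
```

In this model, turning `accept_ranges` off makes the same ranged request come back as a plain 200 with the full body, which is the behavior the question about removing the header is aiming for.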

jpetto commented 8 years ago

Final observation before lunch - donate.mozilla.org (which @alicoding says is fine in this regard) does not appear to have the Accept-Ranges header set.
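
The same comparison can be scripted rather than eyeballed from `curl -I` output. This is a minimal sketch using only the Python standard library; the helper names `fetch_headers` and `advertises_byte_ranges` are made up for illustration, and the result depends on whatever the two sites return at the time it is run.

```python
from typing import Dict
from urllib.request import Request, urlopen


def fetch_headers(url: str, timeout: float = 10.0) -> Dict[str, str]:
    """Send a HEAD request (like `curl -I`) and return lowercased headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def advertises_byte_ranges(headers: Dict[str, str]) -> bool:
    """True if the response headers include `Accept-Ranges: bytes`."""
    return headers.get("accept-ranges", "").strip().lower() == "bytes"


# Example usage (requires network access):
#
#   for url in ("https://petitions.mozilla.org/stay-secure/",
#               "https://donate.mozilla.org/"):
#       print(url, advertises_byte_ranges(fetch_headers(url)))
```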

jbuck commented 8 years ago

@jpetto fascinating! Okay, I'll see if I can figure out why that is.

jpetto commented 8 years ago

@jgmize found a related bug. Might be of some help, but I still would like to see what happens when that Accept-Ranges header is removed.

jpetto commented 8 years ago

Just checking in - any update?

jbuck commented 8 years ago

@jpetto um, I don't know why, but it appears to be working better now: https://developers.facebook.com/tools/debug/og/object/?q=https%3A%2F%2Fpetitions.mozilla.org%2Fstay-secure%2F

Should we add the FB share button back?

jpetto commented 8 years ago

Well, that certainly is curious. :confused: I'd love to know why it's suddenly working...

Anyway, yeah, I'll make a PR to re-add the FB button.

jpetto commented 8 years ago

@jbuck Should we leave this issue open until we know why things weren't working and then suddenly were? Or just chalk it up to Facebook being a black box and close? Your call.

jbuck commented 8 years ago

Facebook is a black box alright :/

jbuck commented 8 years ago

There's a fix in Hapi 13 for this now: https://github.com/hapijs/hapi/commit/0c8109e1b573528bfebea771da6a83d1f5bf7043