nfl / react-helmet

A document head manager for React
MIT License
17.38k stars 661 forks

Helmet not working with Facebook scraper. #26

Closed sampurcell93 closed 9 years ago

sampurcell93 commented 9 years ago

Hi there, I've just started using Helmet to implement meta tags, so apps like Facebook and Twitter can get some useful information about my page and put it into previews. However, I notice that there is a delay between when I render a page, and when the meta tags/title get set on the page. This seems to be causing Facebook and Twitter to miss the tags, as they are not present at the instant of page load. I'm wondering if I am doing something wrong, or there is a fix I am not aware of.

I am currently using the recommended server-side technique - IE

str = React.renderToString(componentwithhelmet);
Helmet.rewind();
res.send(str)

I'm using express. And I have confirmed that when I hard code in the description (that is, it is helmet-independent), FB picks up the data.

I really like the package, but if it doesn't take care of this kind of thing, I am afraid I'll have to roll my own architecture for tags like these. Any insight would be appreciated!

Thanks, Sam

potench commented 9 years ago

When you "View Source" on your app, you should see the og:meta tags in the source. In this case str should contain the og:meta tags; can you check what str contains? Does it contain any og:meta data, or perhaps the wrong og:meta data?

sampurcell93 commented 9 years ago

It actually does not contain any at render time. It takes a moment for the tags to be injected; I know this because I see the tab title flicker from the default to the one set by Helmet after a split second.

doctyper commented 9 years ago

@sampurcell93 This may be an implementation issue. We use Helmet in production and Helmet does successfully pre-render data (view source).

In your example, you are calling Helmet.rewind() but are not ingesting its payload. As shown below, the call returns the stringified payload necessary to prerender your data:

React.renderToString(<Handler />);
let head = Helmet.rewind();
// head = { title, meta, link }

Which you can then send as the server response (pseudo-code):

res.send(`
<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8" />
        ${head.meta}
        ${head.link}
        <title>${head.title}</title>
    </head>
    <body>
    </body>
</html>
`);
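That pseudo-code can be fleshed out as a small, dependency-free template helper. A sketch (names are illustrative); note that in react-helmet the rewind() payload parts stringify to complete tags via their toString() methods, which template literals invoke implicitly, so head.title is not wrapped in another <title> element here:

```javascript
// Assemble a full HTML document from a Helmet.rewind()-style payload.
// `head` mirrors { title, meta, link }; each part is assumed to
// stringify to the corresponding HTML tag(s).
function renderPage(head, appMarkup) {
  return `<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8" />
        ${head.meta}
        ${head.link}
        ${head.title}
    </head>
    <body>
        <div id="root">${appMarkup}</div>
    </body>
</html>`;
}
```

Because the crawler never executes JavaScript, everything it needs must already be in this string when res.send() is called.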
PaulieScanlon commented 7 years ago

Hi, I'm also seeing problems with the Facebook scraper not picking up the og tags. For the moment I have hard-coded them on my index page, but my posts use Helmet to inject the image and title. Is this an issue with Helmet or React?

The site in question is pauliescanlon.io; the posts page (after you view a portfolio item) is my current problem.

rus-yurchenko commented 7 years ago

+1

PaulieScanlon commented 7 years ago

@RuslanYurchenko The issue is not with Helmet. Crawlers can only read meta tags that are hard-coded in the HTML they receive. Using Helmet to inject this data at run time doesn't make it crawlable... unless it's server-side rendered.

If like me you’re doing this on the client it won’t work. Shame but never mind ay!

rus-yurchenko commented 7 years ago

@PaulieScanlon yeah, I realized this a little later. Sorry!

lipenco commented 7 years ago

Is there any other solution I can consider to solve the FB problem when using react-helmet on the front end?

cjimmy commented 7 years ago

@lipenco For those coming here from Google: after much research, it seems there are a few options. This assumes you've built a client-side React app (with Create React App, for example), and social media crawlers (Open Graph, Twitter Cards) can't see your meta data because they are served the bare index.html.

1. Convert your app to server-side rendering. This is the most obvious solution but the most onerous. You won't be able to use client-side globals like window in your JS. If you're using React Router, you'll have to find a way to mirror the routes between server and client. If you're like me, you might be serverless, and running a server would be a lot more work. On the other hand, your page will likely load faster, and crawlers will see what your users would. This is a non-exhaustive list of tradeoffs.

2. Use a pre-rendering service. Prerender.io, Rendertron, and Prerender.cloud, to name a few, give you a way to server-side render only when the user agent is a bot. Some CDNs like Netlify and Roast.io do this for you, so you don't have to run your own server. The downside is that it's yet another service to pay for. EDIT: Netlify is free, and prerendering is one click, albeit in beta.

3. Pre-render on your own. A couple of packages exist for rendering your React app statically. Graphcool's Prep, react-snap, and react-snapshot are the ones I found; they all essentially run a local server to render the site and save the HTML files. The files won't be pretty, but if all you're looking for is the <head> generated by React Helmet, this will do.
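The pre-rendering-service route hinges on recognizing crawler traffic by user agent and routing it to the prerenderer. A minimal sketch of that check (the pattern list is illustrative, not exhaustive; real services maintain much longer lists):

```javascript
// Common social-media crawler user-agent substrings.
const CRAWLER_UA = /facebookexternalhit|twitterbot|linkedinbot|slackbot|telegrambot|whatsapp/i;

// Decide whether a request should be served prerendered HTML
// instead of the client-side React shell.
function isCrawler(userAgent) {
  return CRAWLER_UA.test(userAgent || '');
}
```

In an Express middleware you would branch on isCrawler(req.headers['user-agent']) and proxy matching requests to the prerendering service.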

In the end, I used react-snap to render static files, and it hardly changed my build workflow. That was sufficient for me!
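For reference, react-snap's documented wiring for a Create React App project is a one-line postbuild hook in package.json (script names per react-snap's README):

```json
{
  "scripts": {
    "build": "react-scripts build",
    "postbuild": "react-snap"
  }
}
```

After a normal build, react-snap crawls the local build output and writes out HTML files with the Helmet-generated <head> baked in.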

Any others I'm missing?

Edit 07/20/18: I've recently started hosting on Netlify (which has an option to turn on prerendering) to remove react-snap. It was causing an unsightly flash of unstyled content when loading the page.

riccardolardi commented 6 years ago

Be sure to check the Facebook Crawler Docs & the debugger to get more insight on how it crawls the content. Helped me better understand what's going on: https://developers.facebook.com/docs/sharing/webmasters/crawler

ChristiaanScheermeijer commented 5 years ago

In our case, a GTM trigger caused an <iframe> to be inserted directly after the <html> tag. This made the Netlify prerenderer (or the Facebook crawler) move the <body> tag up before the iframe, which put all of our meta tags inside the body. Apparently the Facebook crawler ignores some of them there.

This didn't scrape all og tags: 😭

<!DOCTYPE html>
<html lang="en" class="wf-montserrat-n5-inactive wf-montserrat-n7-inactive wf-averiasanslibre-n5-inactive wf-averiasanslibre-n7-inactive wf-sansserif-n4-inactive wf-inactive">
<body>
<iframe height="0" width="0" style="display: none; visibility: hidden;" src=""></iframe>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style>
...
</style>
<meta property="og:image" content="http://...." data-react-helmet="true">

After removing the trigger that added the iframe, the prerendered HTML was valid again. All og tags were being scraped again. 🚀

<!DOCTYPE html>
<html lang="en" class="wf-montserrat-n5-inactive wf-montserrat-n7-inactive wf-averiasanslibre-n5-inactive wf-averiasanslibre-n7-inactive wf-sansserif-n4-inactive wf-inactive">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style>
...
</style>
<meta property="og:image" content="http://...." data-react-helmet="true">
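One way to catch this failure mode automatically is to scan the prerendered HTML and check that every og: meta tag appears before the opening <body> tag. A hypothetical helper (not part of react-helmet or Netlify), e.g. for a smoke test against the deployed page:

```javascript
// Returns true if every og: meta tag occurs before the first <body> tag,
// i.e. the tags ended up in <head> where crawlers expect them.
function ogTagsInHead(html) {
  const bodyIndex = html.search(/<body[\s>]/i);
  const ogRegex = /<meta[^>]*property="og:[^"]*"[^>]*>/gi;
  let match;
  while ((match = ogRegex.exec(html)) !== null) {
    if (bodyIndex !== -1 && match.index > bodyIndex) return false;
  }
  return true;
}
```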
ghost commented 5 years ago

@ChristiaanScheermeijer We are experiencing the same issue. Can I ask what the GTM trigger was and if you modified your implementation of it or just completely removed it? I can't see an option where I can just remove our page view events from happening as that is really the only thing being triggered. Any insight would be greatly appreciated.

ChristiaanScheermeijer commented 5 years ago

Hi @smschick,

At first, we removed the trigger completely. However, since we really needed it, we enabled it again and temporarily (until Netlify fixes the bug) removed the <!DOCTYPE html> declaration.

But beware of unpredictable side effects when doing so, especially in older browsers.

In our case it was a Floodlight trigger, but it could be caused by any trigger that adds an iframe to the page.

sophylee commented 2 years ago

Hi, I can confirm that this is still an issue. Facebook's developer tools see only the following when crawling our home page:

<!DOCTYPE html>
<html lang="en">
<head>
<script async src="...">
<title>Pledge Ukraine</title>
<script ...>
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
</body>
</html>

Opening the inspector in a browser shows all of the meta tags present, yet the crawler doesn't see them. Our not-ideal workaround was to hard-code our og and twitter meta tags into index.html.
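For anyone taking the same route, the workaround amounts to static tags in the served index.html. A sketch (the title comes from the snippet above; the other values are placeholders):

```html
<!-- Hard-coded fallback in public/index.html: crawlers that do not
     execute JavaScript see these instead of the Helmet-injected tags. -->
<meta property="og:title" content="Pledge Ukraine" />
<meta property="og:description" content="..." />
<meta property="og:image" content="https://example.com/preview.png" />
<meta name="twitter:card" content="summary_large_image" />
```

The trade-off is that every route then shares the same preview data, which is exactly the per-page variation Helmet was meant to provide.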