netlify / prerender

Automatically rendering JS-driven pages for crawlers and social sharing
MIT License

Prerender takes 80 seconds #31

Open tommedema opened 4 years ago

tommedema commented 4 years ago

When running our app locally, window.prerenderReady = true is set in about 3-5 seconds.

However, when the page is rendered through the prerender service, it takes about 80 seconds:

    2020-06-07T20:25:39.483Z getting https://app.usebubbles.com/fHJhWgxnWdKDHFzT3hiQWj
    2020-06-07T20:26:55.145Z S3 GET failed error="MissingRequiredParameter: Missing required key 'Bucket' in params" url="https://app.usebubbles.com/fHJhWgxnWdKDHFzT3hiQWj"
    2020-06-07T20:26:59.611Z got 200 in 80128ms for https://app.usebubbles.com/fHJhWgxnWdKDHFzT3hiQWj
    2020-06-07T20:26:59.613Z method=GET status=200 url=/https://app.usebubbles.com/fHJhWgxnWdKDHFzT3hiQWj prerender_url=https://app.usebubbles.com/fHJhWgxnWdKDHFzT3hiQWj cache_type=CACHE_MISS timing=80126 referrer="-" user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" request_id=unknown

This seems to cause Slack, Facebook Messenger, and other services to sometimes give up on Netlify's prerendering service and fail to show the unfurled Open Graph preview of our links.

What can we do to debug this?

fool commented 4 years ago

Cf. https://netlify.zendesk.com/agent/tickets/30964 in the helpdesk.

tommedema commented 4 years ago

The problem seems to be that firePluginEvent waits for every plugin to handle the request, including the S3 cache plugin, which has no bucket configured by default on a local install (hence the "Missing required key 'Bucket'" error in the log above).

Commenting out the line server.use(require("./lib/plugins/s3HtmlCache")); in server.js brings the time down from 80 seconds to 3 seconds.
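Rather than commenting the line out, a local setup could register the S3 cache only when a bucket is actually configured. This is a minimal sketch that assumes a hypothetical S3_BUCKET_NAME environment variable; the fork may read its bucket setting under a different name.

```javascript
// Sketch: decide whether to register the S3 HTML cache plugin.
// S3_BUCKET_NAME is a hypothetical variable name; check how the fork
// actually reads its bucket configuration before relying on it.
function shouldUseS3Cache(env) {
  // Without a bucket, the cache's S3 GET/PUT calls fail with
  // "Missing required key 'Bucket'", so skip the plugin entirely.
  return typeof env.S3_BUCKET_NAME === 'string' && env.S3_BUCKET_NAME.length > 0;
}

// In server.js, instead of commenting the line out:
//   if (shouldUseS3Cache(process.env)) {
//     server.use(require("./lib/plugins/s3HtmlCache"));
//   }
```

Taking the environment as a parameter keeps the check testable without touching process.env.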

However, this makes me wonder why Slack and Facebook Messenger often give up on unfurling our links through Netlify's production prerendering service. For example, create a new link with our Chrome extension at usebubbles.com (it takes only a minute) and paste it into a Slack channel or a private message to yourself: it takes at least 30 seconds to unfurl.

I've created a screen recording of this here

tommedema commented 4 years ago

One potential issue might be this line:

https://github.com/netlify/prerender/blob/master/lib/browsers/chrome.js#L496

In our code, we set window.prerenderReady = false at startup and then window.prerenderReady = true as soon as our Open Graph meta tags are set, which is all we need from prerendering.

However, line 496 also waits for all in-flight requests to finish. This seems redundant: we have explicitly signalled that the page is ready, so prerender should not wait for any pending requests.

Since our service streams video or large images, it makes no sense to wait for all HTTP requests to finish.
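The difference can be shown with a simplified model of the completion check. It follows the behaviour described in this thread rather than the exact code at line 496; the state fields are illustrative, and real prerender has further conditions (such as the page load timeout) that are left out here.

```javascript
// Simplified model of prerender's "is the page done?" decision, as described
// in this thread. Not the fork's actual code; field names are illustrative.
function noRequestsInFlight(state) {
  return state.numRequestsInFlight <= 0;
}

// Current behaviour: an explicit prerenderReady flag is an extra condition,
// so the page still waits for every tracked request to finish.
function isDoneCurrent(state) {
  if (typeof state.prerenderReady === 'boolean') {
    return state.prerenderReady && noRequestsInFlight(state);
  }
  return noRequestsInFlight(state);
}

// Proposed behaviour: once the page sets prerenderReady, trust it.
function isDoneProposed(state) {
  if (typeof state.prerenderReady === 'boolean') {
    return state.prerenderReady;
  }
  return noRequestsInFlight(state);
}

// OG tags are set, but a video stream is still downloading.
const streaming = { prerenderReady: true, numRequestsInFlight: 1 };
console.log(isDoneCurrent(streaming));  // false: waits until the timeout
console.log(isDoneProposed(streaming)); // true: renders immediately
```

Pages that never set window.prerenderReady keep the request-based behaviour under both versions, so the change would only affect sites that opt in.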

dyelax commented 3 years ago

Hi @tommedema, I'm facing this same issue. Have you found a solution or workaround?

tommedema commented 3 years ago

I worked around it by making sure no requests are left in flight myself, but this required extra logic and isn't ideal.

Unfortunately, I've found Netlify's support lacking: my digging into the code above has basically been ignored.

dyelax commented 3 years ago

Thanks. FYI, I also posted about this in the support forum, so hopefully some sort of solution comes out of that: https://community.netlify.com/t/prerender-setting-window-prerenderready-true-before-all-http-requests-done/21961

I'm also checking out other (paid) prerendering services, prerender.io and prerender.cloud, to see whether either supports this. I'll let you know if I find a clean solution.

tommedema commented 3 years ago

The real fix is to remove the redundant doneLoading check when window.prerenderReady = true. Only Netlify can make this change in their fork of prerender.

benjaminrancourt commented 3 years ago

After facing the same issue, I disabled Netlify's prerendering service, and Googlebot is now able to index my website's pages correctly. I hope the Netlify team fixes the bug one day so that I can re-enable it.


dyelax commented 3 years ago

For what it's worth, I dug into the prerender server code to figure out which requests were taking so long: all Instagram requests from react-instagram-embed were hanging indefinitely (for some reason, this only happens during prerendering). I set const IS_PRERENDER = /HeadlessChrome/.test(window.navigator.userAgent); and used it as a check to skip rendering the Instagram embeds, which solved my issue.
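The user-agent check above can be wrapped in small helpers so that individual components opt out during prerendering. This is a sketch; shouldRenderInstagramEmbeds is a hypothetical name, and anything hidden this way will also be missing from the prerendered HTML that crawlers see.

```javascript
// User-agent check from the comment above: the prerender service drives
// headless Chrome, whose user agent contains "HeadlessChrome".
function isPrerender(userAgent) {
  return /HeadlessChrome/.test(userAgent);
}

// Hypothetical gate for components whose requests never settle while
// prerendering (the Instagram embeds in this case).
function shouldRenderInstagramEmbeds(userAgent) {
  return !isPrerender(userAgent);
}

// In the browser: shouldRenderInstagramEmbeds(window.navigator.userAgent)
```

Passing the user agent in, rather than reading window.navigator inside the helper, keeps it usable outside a browser.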

If you really need to prerender early while requests are still ongoing, the https://prerender.cloud service supports this through its window.prerendercloudReady property (I emailed their support, and they confirmed it works as desired). You can set it up on Netlify by contacting them with the API token, similarly to prerender.io.

arimus commented 3 years ago

@fool, we are now seeing this in our prerendering too. I grabbed Netlify's prerender fork, as support recommended, and also started seeing these errors:

    2021-06-17T17:02:06.396Z getting https://www.civicfs.com/private-money-lender
    2021-06-17T17:03:01.695Z S3 PUT failed error="MissingRequiredParameter: Missing required key 'Bucket' in params" url="https://www.civicfs.com/private-money-lender"
    2021-06-17T17:03:01.696Z S3 GET failed error="MissingRequiredParameter: Missing required key 'Bucket' in params" url="https://www.civicfs.com/private-money-lender"
    2021-06-17T17:03:07.002Z got 200 in 60606ms for https://www.civicfs.com/private-money-lender
    2021-06-17T17:03:07.007Z method=GET status=200 url=/https://www.civicfs.com/private-money-lender prerender_url=https://www.civicfs.com/private-money-lender cache_type=CACHE_MISS timing=60609 referrer="-" user_agent="curl/7.64.1" request_id=unknown
    2021-06-17T17:04:22.585Z S3 PUT failed error="MissingRequiredParameter: Missing required key 'Bucket' in params" url="https://www.civicfs.com/private-money-lender"

And indeed, commenting out the s3HtmlCache plugin resolved the issue. We would love some help resolving this properly.

tommedema commented 3 years ago

The real fix is to remove the redundant doneLoading check when window.prerenderReady = true. Only Netlify can make this change in their fork of prerender.

Just want to bring this up again. This is a simple fix for this issue.

takacskalman commented 2 years ago

Any updates on this?

tommedema commented 2 years ago

I would not expect any work on this. Even though it's a one-line change and I have already done the investigative work, Netlify's team does not seem to consider it worth the time: it has been open for 1.5 years.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in 1 year. It will be closed in 7 days if no further activity occurs. Thanks!

tommedema commented 1 year ago

This is still an issue.

convenient commented 1 year ago

In my case, I debugged the issue as follows.

First, enable request logging (https://github.com/netlify/prerender#logrequests) in an environment only you will be using, so that the logs are easier to follow.

This produces a lot of output: a line whenever a request starts or finishes. The logs look like this:

    2023-06-27T15:41:07.726Z  + 1   https://foobar.com/baz.js
    2023-06-27T15:41:08.063Z  + 2   https://foobar.com/qux.js
    2023-06-27T15:41:09.063Z  - 1   https://foobar.com/baz.js
    2023-06-27T15:41:10.063Z  - 0   https://foobar.com/qux.js

The number going 1 -> 2 -> 1 -> 0 is the total number of requests in flight. The + entry for https://foobar.com/baz.js shows that the request started, and the - entry for the same URL shows that it finished; the same goes for qux.js. When every request that starts is also marked as completed, the count is incremented and decremented once per URL and returns to 0.

https://github.com/netlify/prerender/blob/58eeaaf4eb1a2463721e02a40412083d51ae8dd9/lib/browsers/chrome.js#L232-L233

This count is used when deciding whether the page has finished loading:

https://github.com/netlify/prerender/blob/58eeaaf4eb1a2463721e02a40412083d51ae8dd9/lib/browsers/chrome.js#L493-L494
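The bookkeeping described above can be modelled as a map from request ID to URL. This is a sketch rather than the code in chrome.js (which, judging by the patch in this thread, keeps a separate numRequestsInFlight counter next to a requests object); InFlightTracker and its methods are hypothetical, and the log lines only mimic the format shown earlier, printing the count after it changes.

```javascript
// Sketch of in-flight request bookkeeping, keyed by request ID.
class InFlightTracker {
  constructor(log = console.log) {
    this.requests = new Map(); // requestId -> url
    this.log = log;
  }

  // Call when a request starts (cf. the CDP Network.requestWillBeSent event).
  started(requestId, url) {
    this.requests.set(requestId, url);
    this.log('+', this.requests.size, url);
  }

  // Call on success and on failure (cf. Network.loadingFinished and
  // Network.loadingFailed); a request that reports neither stays counted.
  finished(requestId) {
    const url = this.requests.get(requestId);
    if (url === undefined) return; // unknown or untracked request: ignore
    this.requests.delete(requestId);
    this.log('-', this.requests.size, url);
  }

  get count() {
    return this.requests.size;
  }
}

// Replaying the first log above: the count goes 1 -> 2 -> 1 -> 0.
const t = new InFlightTracker();
t.started('a', 'https://foobar.com/baz.js');
t.started('b', 'https://foobar.com/qux.js');
t.finished('a');
t.finished('b');
```

Deriving the count from the map means it cannot drift from the set of tracked URLs, and whatever is left in the map names the requests still holding up the render.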

In my case, I was firing off an AJAX request to a third party. I know it completes and responds successfully, but for some reason the chrome.js logic does not register it as completed. So my logs looked more like this:

    2023-06-27T15:41:07.726Z  + 1   https://foobar.com/baz.js
    2023-06-27T15:41:08.063Z  + 2   https://foobar.com/qux.js
    2023-06-27T15:41:08.563Z  + 3   https://example.com/fail.js
    2023-06-27T15:41:09.063Z  - 2   https://foobar.com/baz.js
    2023-06-27T15:41:10.063Z  - 1   https://foobar.com/qux.js

The tool would then just spin, waiting until it hit the page load timeout: https://github.com/netlify/prerender/blob/58eeaaf4eb1a2463721e02a40412083d51ae8dd9/lib/server.js#L9
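To find which request is stuck in a long log, the + and - entries can be paired by URL. The parser below is a hypothetical helper that assumes the whitespace-separated layout shown above (timestamp, sign, count, URL) and counts occurrences, so a URL requested more than once is handled.

```javascript
// Return URLs that have more "+" entries than "-" entries in a request log.
function unfinishedRequests(logText) {
  const open = new Map(); // url -> number of unmatched starts
  for (const line of logText.split('\n')) {
    const parts = line.trim().split(/\s+/);
    if (parts.length < 4) continue; // skip blank or unrelated lines
    const [, sign, , url] = parts;
    if (sign === '+') open.set(url, (open.get(url) || 0) + 1);
    else if (sign === '-') open.set(url, (open.get(url) || 0) - 1);
  }
  return [...open].filter(([, n]) => n > 0).map(([url]) => url);
}

const log = `
2023-06-27T15:41:07.726Z  + 1   https://foobar.com/baz.js
2023-06-27T15:41:08.063Z  + 2   https://foobar.com/qux.js
2023-06-27T15:41:08.563Z  + 3   https://example.com/fail.js
2023-06-27T15:41:09.063Z  - 2   https://foobar.com/baz.js
2023-06-27T15:41:10.063Z  - 1   https://foobar.com/qux.js`;

console.log(unfinishedRequests(log)); // [ 'https://example.com/fail.js' ]
```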

I haven't yet gotten to the bottom of what is actually broken with the request to https://example.com/fail.js, or why it is not tracked as completed, but this is the workaround I put in:

    diff --git a/node_modules/prerender/lib/browsers/chrome.js b/node_modules/prerender/lib/browsers/chrome.js
    index f4cc212..dbcf2f4 100644
    --- a/node_modules/prerender/lib/browsers/chrome.js
    +++ b/node_modules/prerender/lib/browsers/chrome.js
    @@ -260,6 +260,12 @@ chrome.setUpEvents = async function (tab) {
         });
     
         Network.requestWillBeSent((params) => {
    +        const urlRegex = /example\.com/;
    +
    +        if (urlRegex.test(params?.request?.url)) {
    +            return;
    +        }
    +
             tab.prerender.numRequestsInFlight++;
             tab.prerender.requests[params.requestId] = params.request.url;
             if (tab.prerender.logRequests || this.options.logRequests) util.log('+', tab.prerender.numRequestsInFlight, params.request.url);

In this case, the request is not vital to loading the page, as it was only some analytics. I'm not actually sure whether modifying requestWillBeSent like this prevents the request from being sent or simply stops prerender from tracking it, but it's better than hitting the 20s timeout anyway.
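Network.requestWillBeSent is an event that Chrome emits through the DevTools Protocol, so returning early from its handler only stops prerender from counting the request; the request itself is still sent. Actually blocking it would take a separate step, such as the protocol's Network.setBlockedURLs command. The single regex in the patch can be generalised into an ignore list; IGNORED_REQUEST_PATTERNS and shouldTrackRequest are hypothetical names, and the patterns are placeholders.

```javascript
// Hypothetical ignore list: requests matching these patterns are not counted
// as in flight, so they cannot hold up rendering. Only list non-essential
// traffic such as analytics. The patterns below are placeholders.
const IGNORED_REQUEST_PATTERNS = [/example\.com/, /analytics\./];

function shouldTrackRequest(url, patterns = IGNORED_REQUEST_PATTERNS) {
  if (typeof url !== 'string') return true; // keep tracking anything we cannot inspect
  return !patterns.some((re) => re.test(url));
}

// Inside the requestWillBeSent handler from the patch above:
//   if (!shouldTrackRequest(params?.request?.url)) return;
```

Avoid the g flag on these patterns: RegExp.prototype.test with g is stateful (it advances lastIndex), so repeated calls on different URLs could give inconsistent results.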

tommedema commented 1 year ago

The real fix is to remove the redundant doneLoading check when window.prerenderReady = true. Only Netlify can make this change in their fork of prerender.

See https://github.com/netlify/prerender/blob/master/lib/browsers/chrome.js#L496

It makes no sense for the library to wait for requests to finish loading when the client has already indicated that it is ready for prerendering. This is a one-line change, and it has not been acted on for 3 years.