Closed rokerkony closed 5 years ago
I am facing similar kind of issue with puppeteer@1.12.2
where its not able to detect the URL moved from HTTP to HTTPS.
When I load on headful chrome the URL loads fine and I see 301 Moved Permanently in the response however in the headless chrome it just throws Navigation Timeout Exceeded: 30000ms exceeded.
this is a dupe of #4030, check out this comment
LT;DR: the request.continue({url})
doesn't redirect to a given URL - instead, it loads data from the given url as if it was served from the original one.
In order to do a real redirect, use request.respond
.
@aslushnikov that is actually what I was already trying but it doesn't help. Please look into a following example:
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
let isRedirectStatus = false;
page.on('request', (request) => {
if (isRedirectStatus !== false) {
return request.respond({
status: isRedirectStatus,
headers: {
location: 'https://www.erento.com'
},
});
}
return request.continue({url: request.url()});
});
page.on('response', (response) => {
if (response.request().resourceType() === 'document') {
const status = response.status();
if ([301, 302].includes(status)) {
isRedirectStatus = status;
}
}
});
await page.goto('http://www.erento.de');
await browser.close();
})();
it ends with ERR_TOO_MANY_REDIRECTS
and the expected behaviour would be to return the object in .respond()
method (in example the url is hardcoded just for simplicity)
@rokerkony What your code does is redirecting all requests to 'https://www.erento.com' once there's at least one redirect response from server. Given that http://www.erento.de
immediately redirects to HTTPS, all your future requests are redirected to the same URL and you naturally get ERR_TOO_MANY_REDIRECTS
.
Are you trying to redirect just the navigation request? In this case, use the request.isNavigationRequest()
and check for its target URL before issuing a redirect.
@aslushnikov What I'm trying to accomplish is to get the same behavior as in version 1.6.2 and lower, I hope it will be still possible somehow.
We are using pptr for our prerendering and we would love to return a response with 301 and location header when google goes to e.g.: erento.de
. When we would return a content from followed redirects it would be treated as a duplicate content instead of redirect.
I hope it is well demostrated by a first post in this issue where we used fake http status 399 and pptr returned 301 with location header (see test for it).
@rokerkony so what does Browser.get
do under the hood? I'm still not sure what's the end goal here - can you please elaborate more? Maybe I'd be able to suggest something.
Hi all, we are using pptr for prerendering our SPA. One of the URLs which needs to be prerendered is
example.de
which should end with 301 with a header location pointing toexample.com
. We have a working solution below withpptr@1.6.2
but if we update to>=1.7.0
it stopped working because it is returning the fake status code.Steps to reproduce
Tell us about your environment:
What steps will reproduce the problem?
The code working on 1.6.2
Response before update:
Response after update:
The unit test which is not working anymore:
It is important for crawlers to return 301 if necessary though. Is there any way in new pptr versions to not follow redirects but to return the redirect response?