laurilehmijoki / s3_website

Manage an S3 website: sync, deliver via CloudFront, benefit from advanced S3 website features.

301 Redirect from /page to /page/ #233

Open adampetrie opened 8 years ago

adampetrie commented 8 years ago

I'm not sure if I've misunderstood something, but I am attempting to use the redirect directive to create a 301 redirect from the /content version of a URL to the /content/ version that S3 will serve when it finds the resource at /content/index.html.

I know that by default s3 will serve a 302 and I'd like to make it a 301 for SEO purposes. Here's a snippet from my config:

redirects:
  /content: /content/

When I push the site the output says that the redirect is created but I cannot find anything confirming the redirect and when I curl the original URL the response is still a 302 redirect.
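For anyone who wants to script the same check instead of using curl, here is a minimal Python sketch that reads only the first response and does not follow the redirect. The function name `first_response` is made up for illustration, and it assumes the endpoint answers HEAD requests.

```python
import http.client
from urllib.parse import urlsplit


def first_response(url):
    """Return (status, Location header) of the first response, without following redirects."""
    parts = urlsplit(url)
    # Pick the connection class from the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        # http.client never follows redirects on its own, so this is the raw answer.
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `first_response("http://<your-website-endpoint>/content")` (the endpoint is a placeholder) returns the status code and the Location header, so a 302 versus a 301 is visible directly.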

Am I missing something, or is it perhaps not possible to change the default 302 to a 301?

Any input is helpful. Thanks!

derekperkins commented 8 years ago

I've been trying to figure out how to do that and haven't been able to figure it out.

vitaly commented 7 years ago

This is definitely supported by Amazon: http://docs.aws.amazon.com/AmazonS3/latest/dev/IndexDocumentSupport.html

When you access /foo, it first looks for an object /foo, and only if it's not found does it look for /foo/index.html, which, if found, produces a 302 redirect /foo -> /foo/
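To make the lookup order described above concrete, here is a minimal Python sketch of it. This is a simplified model and not AWS code: `resolve`, its parameters, and the in-memory set of keys are hypothetical names for illustration, and error documents and routing rules are ignored.

```python
def resolve(keys, path, index_document="index.html"):
    """Model how a website endpoint answers `path`, given the set of object keys.

    Returns (status, value): the served key for 200, the redirect target for 302,
    or None for 404.
    """
    key = path.lstrip("/")

    # Folder-style request (/ or /foo/): serve the index document if it exists.
    if key == "" or key.endswith("/"):
        index_key = key + index_document
        return (200, index_key) if index_key in keys else (404, None)

    # First look for an object with exactly this key.
    if key in keys:
        return (200, key)

    # Only if that is missing, look for <key>/index.html and redirect with a 302.
    if f"{key}/{index_document}" in keys:
        return (302, f"/{key}/")

    return (404, None)
```

With keys `{"foo/index.html"}`, `resolve(keys, "/foo")` gives `(302, "/foo/")`, which is the default behavior this issue is about.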

iloveip commented 7 years ago

Hi @adampetrie, have you figured out how to make 301 redirects? I have the same problem. But as the docs say, the leading forward slash should be omitted in the requested page:

redirects:
  content: /content/

ghost commented 7 years ago

@iloveip did you see @vitaly's comment?

iloveip commented 7 years ago

@jhabdas Yes, sure. But he is talking about 302 redirects, and I need to make 301 redirects for SEO purposes.

ghost commented 7 years ago

Did you use rel=canonical? If so you should be fine. But don't take my word for it. Test.

iloveip commented 7 years ago

I'm trying to make redirects through the s3_website.yml file, instead of creating new pages. As far as I understand, it is possible, but for some reason it doesn't seem to work. I created a new issue #280 with my settings.

ghost commented 7 years ago

If you're hosting on S3 the redirect happens automatically for existing pages (if configured). So if you're using canonicalization you should be good without the 301 from an SEO standpoint.

But the only way to know for sure is to A/B test. If you can come back with some solid metrics this issue might get some traction. Otherwise it's really up to the crawler, and crawlers know AWS does this (or at least I would hope so).

ghost commented 7 years ago

Hope that makes sense. It's a stumbling block for sure. I hit it. I've seen others hit it. And so I'm just telling you what I know. And I know DMAIC is the only way to be certain.

iloveip commented 7 years ago

How can I use canonicalization with redirects set in s3_website.yml file? Should I create these pages manually to set rel=canonical? If so, how can I add the proper metadata to them?

ghost commented 7 years ago

Just a sec. I'm digging. To validate my understanding, is it true when I say you're attempting to use the redirects object in the s3 config file to perform site-wide redirects for pretty URLs to ensure they have a trailing / (disregarding how the redirect happens)?

ghost commented 7 years ago

Yep. Just re-reviewed the docs I referenced above (for folders) and looked a level up to see if anything for static websites has changed at AWS recently with regard to how the folder -> folder/index.html redirects occur, and it's still a 302.

So if my assumption above is correct, and you're attempting to use folders with an index.html sitting in each one and achieve a 301 you're in the right place, and AWS doesn't handle that (therefore this lib couldn't either as far as I can be certain).

How can I use canonicalization with redirects set in s3_website.yml file?

You don't. You use the redirects in the s3 config for redirects and live with the 302 for folders redirecting to a nested index.html file. Your alternatives would be:

If you add the canonicalization, bots will know where your pages live and should not penalize you. That said, you'll still want to ensure your internal link structure points to the / version of the links to prevent an unnecessary redirect.

Beyond that, and I'm curious too, if you want to know what a 301 would do you could try A/B testing once your organic traffic stabilizes (if it does) and try measuring 301 using some other means.

iloveip commented 7 years ago

So do you mean that AWS can't handle redirects from page.html to /page/, because /page/ has an index.html file in it and it already uses a 302 redirect?

ghost commented 7 years ago

Interesting. So it looks like you've already started using ugly URLs and are attempting to use the redirects object to prettify them. As far as I know the object doesn't support globbing or wildcards so if you're shooting to do this for more than a handful of pages it's going to hurt your deployment speed (redirects seem to be the slowest part of the deployment process).

But since your use case is a little different from the original case mentioned when this post was opened I'm not entirely certain - though if you're having issues with it my gut tells me it's because of the special way S3 handles folders. But guts can be wrong.

I use redirects on my site to hide certain URL shaping I'm doing, e.g.

redirects:
  gear: /gear/southeast-asia-carry-on-packing-list/

That works fine. But it doesn't have a file extension. So if your scenario with an extension isn't working - and it works with the regular AWS CLI - you've either hit a bug or identified an enhancement opportunity for this library.
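The redirects entries quoted in this thread differ in one detail: the first one uses a key with a leading slash (/content), while the later ones omit it (content, gear). A small Python sketch of a pre-deploy check for that rule follows. `lint_redirects` is a hypothetical helper, not part of s3_website, and it takes the mapping as an already-parsed dict.

```python
def lint_redirects(redirects):
    """Return warnings for redirect source keys that start with a slash.

    `redirects` maps the requested key (e.g. "content") to its target (e.g. "/content/").
    """
    warnings = []
    for source in redirects:
        if source.startswith("/"):
            # Per the comments above, the source key should be written without the leading slash.
            warnings.append(
                f"{source!r}: drop the leading slash (use {source.lstrip('/')!r})"
            )
    return warnings
```

For example, `lint_redirects({"/content": "/content/"})` flags the key from the original post, while `lint_redirects({"gear": "/gear/southeast-asia-carry-on-packing-list/"})` returns an empty list.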

Hope that makes sense. 👍 👎

iloveip commented 7 years ago

Actually, I don't know if I need an extension for the requested page at all. For example, I have only one page, /kb/page/. But the search engine somehow indexed this page as /kb/page, and now it reports /kb/page/ as a duplicate. So I want to create a 301 redirect for this page to point to /kb/page/.

iloveip commented 7 years ago

@jhabdas Thank you very much for your help! I've deleted .html extensions for requested pages and now everything works :)

laurilehmijoki commented 7 years ago

@iloveip glad to hear that your problem was solved! Please close this issue if you consider this case closed.

ghost commented 7 years ago

@iloveip pleasure to be at your service. @laurilehmijoki her issue was a little different than the OP. that said this can be closed as in my mind it's functioning as expected.

iloveip commented 7 years ago

@laurilehmijoki I'm not the author, so I can't close it)

ghost commented 7 years ago

@iloveip you might find this interesting: https://moz.com/blog/meta-referrer-tag. the suggestion is even a 301 is thought to cause about a 15% loss in equity of a link. haven't seen any research on what a 302 might do, but if your URLs are well structured they'll always end in a / and anytime someone copies or bookmarks them they'll get the whole link (unless they're using their mouse, in which case they should be shot).

iloveip commented 7 years ago

Hi @jhabdas

Thank you very much for the link! We are right on the point of moving our site from http to https.

nabilfreeman commented 6 years ago

Hi guys - I also encountered this problem when moving my site over to Jekyll.

We already had loads of indexed pages without trailing slashes, so I used AWS Lambda@Edge (linked to CloudFront) to rewrite the URLs to have a trailing slash - WITH A 301 CODE! 🎉

It's quite complicated, but this article is pretty good to start: https://read.acloud.guru/supercharging-a-static-site-with-lambda-edge-da5a1314238b
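For readers who want a rough idea of what such a function looks like, here is a minimal Python sketch of a Lambda@Edge handler that answers slash-less URLs with a 301. It is not the code from the comment above or from the linked article. It assumes the function is attached to a CloudFront viewer-request or origin-request trigger, and the rule "a last path segment containing a dot is a file" is a simplifying assumption.

```python
def handler(event, context):
    """Redirect /page to /page/ with a 301; pass every other request through."""
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    # Assumption: a last segment with a dot (e.g. site.css) is a file, not a page.
    last_segment = uri.rsplit("/", 1)[-1]
    if uri.endswith("/") or "." in last_segment:
        return request  # Unchanged request continues to the cache or origin.

    # Keep the query string, if any, on the redirect target.
    location = uri + "/"
    if request.get("querystring"):
        location += "?" + request["querystring"]

    # Returning a response object makes CloudFront answer directly.
    return {
        "status": "301",
        "statusDescription": "Moved Permanently",
        "headers": {"location": [{"key": "Location", "value": location}]},
    }
```

A request for /kb/page would get a 301 to /kb/page/, while /kb/page/ and /css/site.css are passed through untouched.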

To be honest, if I were to do this again I probably wouldn't use AWS. I chose it because I'm familiar and also because my company has free credits. But it shouldn't be this hard.

Edit - more info:

ghost commented 6 years ago

@nabilfreeman thanks for sharing. please keep us informed of any changes in organic traffic you do or don't receive as a result of this change so we know whether or not a 301 here is worth the trouble.

my suspicion is the 301 from /someurl to /someurl/ is not going to have a noticeable impact so long as users and crawlers are hitting the correct URLs to begin with, and the tech debt taken on will detract value from your project. i've been wrong before, but data doesn't lie (usually)