Problem invalidating CloudFront item

bakura10 commented 11 years ago

Hi,

I may be configuring my project badly, but here is the problem: I set up a max age of 86400 seconds (1 day) for all my HTML documents.

When I use the s3_website push command, it automatically invalidates the item, but the path it invalidates is, for instance, "/blog/some-key/article/index.html". However, most pages are accessed using the URL WITHOUT the index.html. It seems that CloudFront keeps two copies of the object, so if I go to "/blog/some-key/article" I am still served the old article, while if I add /index.html (i.e. /article/index.html) I am served the updated version.

Is there a way to force s3_jekyll to invalidate the URL WITHOUT the /index.html suffix? Or does this come from a bad configuration of my S3 + CloudFront setup?

Thanks!

laurilehmijoki commented 11 years ago

Unless my knowledge of Cloudfront is outdated, it should be possible to access only exact resources (e.g. /article/index.html) via the CDN. This means that HTTP GET /article should not work.

The only exception is the root resource, for which you can set a default document in your Cloudfront config (GET / -> /index.html).

laurilehmijoki commented 11 years ago

To confirm, Cloudfront supports default document only for the root resource: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/DefaultRootObject.html.

bakura10 commented 11 years ago

Hi,

My blog is hosted on CloudFront (michaelgallego.fr) and none of its pages need the index.html ;-). I asked CloudFront support, and each URL is considered a different resource.

As a consequence, even www.example.com/about/ is considered a different resource from www.example.com/about (note the trailing slash). So, since I access my pages without the index.html, invalidating the index.html paths does nothing.

I also realized two other things: with S3 website hosting enabled, if you visit "www.example.com/about", S3 responds with a 302 redirect to www.example.com/about/ (once again, trailing slash). Behind CloudFront, this means that CloudFront will also cache the redirect.

So the best approach is to configure your URLs (with Jekyll, for instance) so that a trailing slash is always added at the end, which avoids the useless redirect, AND to invalidate the URL WITHOUT the index.html.

bakura10 commented 11 years ago

@laurilehmijoki, this is only the case if, when you create your CloudFront distribution, you choose the S3 bucket that Amazon auto-completes for you. But this is the wrong way to do it. In fact, you must copy-paste the S3 website endpoint instead of using the value that Amazon auto-completes. I wrote a blog post about it today (http://www.michaelgallego.fr/blog/2013/08/27/static-website-on-s3-cloudfront-and-route-53-the-right-way/).

This way, you also have the nice advantage of being able to use the 404 error page even in CloudFront :).

laurilehmijoki commented 11 years ago

This is interesting information. Thanks for the pointers, @bakura10!

bakura10 commented 11 years ago

No problem. Maybe adding an option to s3_website to include (or not) index.html would be awesome :D.

laurilehmijoki commented 11 years ago

A brute-force solution would be to invalidate all three versions (plain, slashed and index.html'ed).

In this case, if you have the document /article/index.html on your local file system, s3_website push would invalidate three objects on Cloudfront. The objects would be

/article
/article/
/article/index.html
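
To make the mapping concrete, here is a minimal sketch of how one local file could expand into those three paths. It is written in Python purely for illustration and is not the gem's Ruby code; the function name invalidation_variants is hypothetical, and treating the site root as having only two forms is an assumption of this sketch.

```python
from pathlib import PurePosixPath


def invalidation_variants(local_path: str) -> list[str]:
    """Return the CloudFront paths to invalidate for one local file.

    Hypothetical helper: a directory index such as "article/index.html"
    expands to its plain, slashed and index.html forms; any other file
    maps to a single path.
    """
    path = PurePosixPath("/") / local_path.lstrip("/")
    if path.name != "index.html":
        return [str(path)]
    parent = str(path.parent)  # "/article", or "/" for the site root
    if parent == "/":
        # Assumption: the root has no slash-less form to invalidate.
        return ["/", "/index.html"]
    return [parent, parent + "/", str(path)]


print(invalidation_variants("article/index.html"))
# ['/article', '/article/', '/article/index.html']
```

Files that are not directory indexes (for example css/site.css) come back as a single path, so only index.html documents would cost extra invalidations under this scheme.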

How does this sound to you?

What kind of problems do you see in this solution?

Ps. thanks for pointing out the deficiency in the S3 Cloudfront origin. At the moment the command s3_website cfg apply creates a Cloudfront dist that marks the origin as an S3 bucket.

bakura10 commented 11 years ago

It sounds good to me. The only problem is that a user who has written a lot of articles may reach the limit of 1000 free invalidations. Or, if you decide to change the permalink format, it will invalidate a lot of things.

But I think it's a sane trade off :).
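
To put that concern in numbers, here is a rough sketch of the arithmetic. It is written in Python for illustration only; the figure of 1000 is the one quoted in this thread (check current AWS pricing before relying on it), and both function names are hypothetical.

```python
# Free invalidation allowance as quoted in this thread; an assumption,
# not a value read from AWS.
FREE_PATHS = 1000


def paths_needed(changed_pages: int, variants_per_page: int = 3) -> int:
    """Invalidation paths consumed when each changed page expands to N variants."""
    return changed_pages * variants_per_page


def max_pages_within_allowance(variants_per_page: int = 3) -> int:
    """Largest number of changed pages that still fits in the free allowance."""
    return FREE_PATHS // variants_per_page


print(max_pages_within_allowance())   # 333
print(paths_needed(400))              # 1200, over the allowance
print(max_pages_within_allowance(1))  # 1000 when only one path per page is invalidated
```

So with three variants per page, a permalink change touching more than 333 pages would already exceed the allowance in one push.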

laurilehmijoki commented 11 years ago

I assume only a small part of the s3_website users are affected by this invalidation issue. If we agree that it is better to serve most users well (invalidate only the /article/index.html resources), we should not resort to the above-mentioned brute-force solution.

An alternative solution would be to add an _s3website.yml setting, where you could express that you want to invalidate all the three possible versions of the document (/article, /article/ and /article/index.html). The option could be cloudfront_brute_force_invalidate: true.

Are you able to come up with a more elegant solution?

Ps. I welcome a pull request on this issue, once we agree on the solution. I'm not very likely to have the time to implement it.

bakura10 commented 11 years ago

There is no need to invalidate the "/article" from my understanding, because S3 returns a 302 redirect to the url with "/".

Maybe we could have an option like "invalidate_root". If true => "/article/", otherwise "/article/index.html" ?
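
A minimal sketch of that rule, assuming the option is a simple boolean switch. It is written in Python for illustration and is not the gem's implementation; the function name invalidation_path is hypothetical.

```python
def invalidation_path(s3_key: str, invalidate_root: bool) -> str:
    """Pick the single CloudFront path to invalidate for an uploaded key.

    With invalidate_root=True, a directory index such as
    "article/index.html" is invalidated as "/article/"; otherwise the
    full "/article/index.html" path is used. Other files are unaffected.
    """
    path = "/" + s3_key.lstrip("/")
    suffix = "index.html"
    if invalidate_root and path.endswith("/" + suffix):
        return path[: -len(suffix)]  # drop "index.html", keep the trailing slash
    return path


print(invalidation_path("article/index.html", invalidate_root=True))   # /article/
print(invalidation_path("article/index.html", invalidate_root=False))  # /article/index.html
```

Unlike the brute-force approach, this keeps the cost at one invalidation path per changed document.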

laurilehmijoki commented 11 years ago

Option cloudfront_invalidate_root: true sounds good. It's unambiguous.

Let's add the "cloudfront" word into the option name, because the other Cloudfront-related options are also prefixed by that.

bakura10 commented 11 years ago

Ok. I'm not a Ruby developer, but I'll try to come up with something tomorrow.

laurilehmijoki commented 11 years ago

I was able to come up with a rather simple implementation. It's released in version 1.4.0.

Please let me know if you spot any anomalies.

The commit is here: https://github.com/laurilehmijoki/s3_website/commit/9e0dd1ed6bfb85fb18e562c87af03f03c7d17cd1.

laurilehmijoki commented 11 years ago

I created issue #30, because it is part of the solution to the problem that @bakura10 has described in this issue.

bakura10 commented 11 years ago

The fix looks perfect ;-).

laurilehmijoki / s3_website

Problem invalidating CloudFront item #29