greengerong / Prerender_asp_mvc

Prerende client for asp.net mvc
http://prerender.io/
86 stars 39 forks source link

Adding support for prerender token, fixing compatibility with later versions of prerender #1

Closed mgoodfellow closed 10 years ago

mgoodfellow commented 10 years ago

Adding support for prerender token, fixing compatibility with later versions of prerender so the status code (and headers) returned from prerender service are correctly put onto the response back to the crawler, and the pipeline is terminated. Also improved crawler detection by using contains rather than exact case insensitive match. Finally, added support for AWS load balancers with use X-Forwarded-Proto headers

greengerong commented 10 years ago

thanks

mgoodfellow commented 9 years ago

@brgrz I believe it is to do with how AWS do their "Elastic Beanstalk" containers - they have rewrite rules before the application is reached to swap out the Application path etc. (because of the load balancers in front), which causes issues in that use case (Which is what I was using when I was making these changes). It also works fine on dedicated instances which I use now, as we never have application paths in the URL.

Feel free to commit a change to fix it for your use case. I would suggest it might be worth making it controllable, seeing as it is really a hack to make it compatible with the strange deployment container options in AWS Elastic Beanstalk, so maybe an option in the prerender ConfigurationSection would be best?

brgrz commented 9 years ago

ok I guess we should discuss this further, I do not know all the use cases, if I add a PR which removes those lines then it won't work for you, so we need a universal solution

mgoodfellow commented 9 years ago

I would suggest a configuration option in PrerenderConfigSection, called something like "StripApplicationName" which maybe has a default value of false (as this would work either way on my dedicated instances) - I believe the stripping is a special case which was only required (as far as I can see) for AWS Elastic Beanstalk containers, so this should only be enabled if needed in specials cases.

brgrz commented 9 years ago

ok, I will add that now

thoop commented 9 years ago

@mgoodfellow is correct, AWS can terminate SSL at the load balancer, so this middleware would get a URL like http://www.example.com even if a crawler accessed it over https. That would cause Prerender.io to request http://www.example.com and get a 301 redirect to the https site. So basically, a crawler would access https://www.example.com and get a 301 redirect to the same url of https://www.example.com. This would happen on any platform that terminates SSL at the load balancer and uses http from there to your server.

So that behavior is recommended and should not be turned off. What problem were you seeing that caused you to need to remove that?

thoop commented 9 years ago

Oh nevermind, I now see that you commented on the following method, not the X-Forwarded-Proto one:

+            if (!string.IsNullOrEmpty(request.ApplicationPath) && request.ApplicationPath != "/")
+            {
+                // http://test.com/MyApp/?_escape_=/somewhere
+
+                url = url.Replace(request.ApplicationPath, string.Empty);
+            }

Yeah, I'm not sure about that one.

mgoodfellow commented 9 years ago

It's to do with how IIS can use application domains - kind of like apache and virtual hosting

I strip the application name from the URL because the AWS version of windows server for elastic beanstalk does URL rewrite trickery and it was to work around that.

My solution will only work with dedicated IIS hosting and hosting on AWS elastic beanstalk containers.

However, the default should be to not strip the application name which would make it work with virtual app hosting and dedicated hosting, but to have the option to enable a "workaround" for the AWS container URL rewriting rules

thoop commented 9 years ago

ah ok :)