mozmeao / infra

Mozilla Marketing Engineering and Operations Infrastructure
https://mozilla.github.io/meao/
Mozilla Public License 2.0

Analysis of current SCL3 Apache config, and possible conversion to Python #180

Closed by bookshelfdave 7 years ago

bookshelfdave commented 7 years ago

Parent issue

Once we have the SCL3 Apache config, determine:

I'll need more info on the Python libs that we use for similar tasks.

bookshelfdave commented 7 years ago

I spent a day with the existing SCL3 Apache config.

Here are some crude (low level) notes on my investigation: https://gist.github.com/metadave/f7f84d62e9aac10ca1d116e57b40a0f0

Summary

The existing Apache config is complex, and running it in Docker and Kubernetes has proven to be extremely difficult. Even running a partial Apache config for redirects and rewrites presents several technical challenges, such as config conversion to support our AWS infrastructure, module compilation against an older version of Apache (2.2), and lots of table flipping.

I recommend migrating the important redirects and rewrites to Django/Python, as I believe this path will take less time than running the existing Apache config in AWS (either in Kubernetes or standalone on EC2 instances). We already have a Jenkins multibranch pipeline that pushes our demos out to Kubernetes, so we won't need to bring the SCL3 deployment scripts/tooling over to AWS/Kubernetes.

Web/App tiers

The existing Apache config runs both Apache and Django on the same instance. This makes packaging to run in Docker significantly more difficult, as we'd either have to run Apache and Django/Kuma from a single Docker image (resulting in a compilation nightmare and huge images), or set up named pipes between multiple containers so Apache can connect to Django. Kubernetes Services and Deployments do this for us "for free".

As Kuma is already running in demo mode in Kubernetes, we can take advantage of the new setup and either have Django handle all incoming requests (w/ ELB terminating TLS), or run nginx pods to do some more web lifting for us.
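If Django ends up handling requests directly behind an ELB that terminates TLS, Django needs to be told that the original request was HTTPS. A minimal settings sketch, assuming the ELB sets the `X-Forwarded-Proto` header on every request (this is an assumption about the listener setup, not something taken from the existing config):

```python
# Django settings fragment (sketch, not the Kuma settings file).
# Assumption: every request reaches Django through the ELB, which sets
# X-Forwarded-Proto. If clients can reach Django directly, they could
# spoof this header, so the setting is only safe behind the proxy.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```

With this in place, `request.is_secure()` reflects the protocol the client used to reach the ELB rather than the plain HTTP hop to the pod.
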

You can see my attempt to compile some of the additional Apache modules in the context of a new Docker image here.

Rewrites and Redirects

I recommend implementing redirects/rewrites in Python and/or nginx to keep things simple.

@pmac is planning to implement some common redirect functionality in Bedrock that we can leverage.

EDIT: @pmac has released this lib: https://github.com/pmac/django-redirect-urls

Grepping through the Apache config, these are the rewrites and redirects:

Rewrites:

mozilla/defaults.conf
10:    RewriteEngine On
11:    RewriteCond %{REQUEST_URI} !^/server-status$
12:    RewriteCond %{REQUEST_URI} !^/server-info$
13:    RewriteCond %{REQUEST_URI} !^/apc.php$
14:    RewriteRule .* - [F]

mozilla/domains/developer.cdn.mozilla.net.conf
26:    RewriteEngine on
27:    RewriteRule ^/media/(redesign/)?img(.*) /static/img$2 [L,R=301]

mozilla/domains/developer.mozilla.org.conf
22:    RewriteEngine On
25:    RewriteCond %{SERVER_NAME} ^developer\.mozilla\.com$
26:    RewriteRule (.*) http://developer.mozilla.org$1 [R=301,L]
66:    RewriteRule ^/media/(redesign/)?css/(.*)-min.css$ /static/build/styles/$2.css [L,R=301]
67:    RewriteRule ^/media/(redesign/)?js/(.*)-min.js$ /static/build/js/$2.js [L,R=301]
69:    RewriteRule ^/media/(redesign/)?img(.*) /static/img$2 [L,R=301]
70:    RewriteRule ^/media/(redesign/)?css(.*) /static/styles$2 [L,R=301]
71:    RewriteRule ^/media/(redesign/)?js(.*) /static/js$2 [L,R=301]
79:    RewriteRule ^/Special:UserLogin\??(.*) /index.php?title=Special:UserLogin&$1 [R]
82:    RewriteRule ^/media/(redesign/)?fonts(.*) /static/fonts$2 [L,R=301]
87:    RewriteRule ^(.*)//(.*)//(.*)$ $1_$2_$3 [R=301,L,NC]
88:    RewriteRule ^(.*)//(.*)$ $1_$2 [R=301,L,NC]
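To give a feel for how the /media rewrites above might look in Python, here is a rough, framework-agnostic sketch using only the standard `re` module. It does not use the django-redirect-urls API; the names `REWRITES` and `resolve_rewrite` are made up for illustration, and the patterns are copied verbatim from developer.mozilla.org.conf (including the unescaped dots).

```python
import re

# (pattern, replacement, status), in the same order as the Apache config.
# Every rule carries [L], so the first matching rule wins.
REWRITES = [
    (r"^/media/(redesign/)?css/(.*)-min.css$", r"/static/build/styles/\2.css", 301),
    (r"^/media/(redesign/)?js/(.*)-min.js$", r"/static/build/js/\2.js", 301),
    (r"^/media/(redesign/)?img(.*)", r"/static/img\2", 301),
    (r"^/media/(redesign/)?css(.*)", r"/static/styles\2", 301),
    (r"^/media/(redesign/)?js(.*)", r"/static/js\2", 301),
    (r"^/media/(redesign/)?fonts(.*)", r"/static/fonts\2", 301),
]
COMPILED = [(re.compile(p), repl, status) for p, repl, status in REWRITES]


def resolve_rewrite(path):
    """Return (status, new_path) for the first matching rule, else None."""
    for pattern, replacement, status in COMPILED:
        match = pattern.search(path)
        if match:
            # Apache's $2 corresponds to \2 in a Python replacement template.
            return status, match.expand(replacement)
    return None
```

For example, `resolve_rewrite("/media/redesign/css/main-min.css")` yields `(301, "/static/build/styles/main.css")`, and a path outside /media yields `None`. The rule order matters: the `-min` rules have to be tried before the generic css/js rules.
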

Redirects:

conf/httpd.conf
600:# Redirect allows you to tell clients about documents which used to exist in
604:# Redirect permanent /foo http://www.example.com/bar

conf/httpd.conf.rpmnew
590:# Redirect allows you to tell clients about documents which used to exist in
594:# Redirect permanent /foo http://www.example.com/bar

mozilla/domains/developer.cdn.mozilla.net.conf
32:    RedirectMatch 302 /media/uploads/demos/(.*)$ https://developer.mozilla.org/docs/Web/Demos_of_open_web_technologies/

mozilla/domains/developer.mozilla.org.conf
4:    Redirect permanent / https://developer.mozilla.org/
10:    Redirect permanent / https://developer.mozilla.org/
16:    Redirect permanent / https://developer.mozilla.org/
85:    RedirectMatch 302 /media/uploads/demos/(.*)$ https://developer.mozilla.org/docs/Web/Demos_of_open_web_technologies/
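The two kinds of redirect above behave differently: `RedirectMatch` sends every matching path to one fixed URL, while `Redirect permanent /` keeps the rest of the requested path. A small sketch of both in plain Python (function and constant names are hypothetical, and which virtual hosts the three `Redirect permanent` lines belong to is not shown in the grep output):

```python
import re

MDN = "https://developer.mozilla.org"
DEMOS_TARGET = MDN + "/docs/Web/Demos_of_open_web_technologies/"
DEMOS_PATTERN = re.compile(r"/media/uploads/demos/(.*)$")


def redirect_match_demos(path):
    """RedirectMatch 302: any demo upload path goes to one fixed page."""
    if DEMOS_PATTERN.search(path):
        return 302, DEMOS_TARGET
    return None


def redirect_permanent_root(path):
    """Redirect permanent / <target>: the path after the prefix is appended."""
    if not path.startswith("/"):
        return None
    return 301, MDN + path
```

So `/media/uploads/demos/a/b.zip` lands on the demos page with a 302, while `/en-US/docs/Web` on one of the redirected hosts becomes a 301 to the same path on developer.mozilla.org.
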

Serving static content

Kuma serves several kinds of content other than images, CSS, and JS from the /data directory. We can use a shared EFS mount to distribute this content to each running Kubernetes pod. Since EFS looks just like a standard filesystem, we can organize dev/stage/prod files into an appropriate directory structure.

Additional Apache modules

The existing Apache config uses some additional modules to handle some log rewriting for incoming Cloudflare request addresses, and communications from Apache to Python:

Misc Apache config

Below are some things that we may have to address when moving to the new config. I'm sure there are additional Important Details that need to be extracted from the Apache config as well.

Custom headers:

# Security headers 1297878
Header set X-Content-Type-Options "nosniff"
Header set X-XSS-Protection "1; mode=block"
Header set Strict-Transport-Security "max-age=63072000"

Additional file types:

AddType application/ogg .ogx
AddType audio/ogg .ogg .spx
AddType video/ogg .ogv
AddType application/json .json
AddEncoding x-gzip .jsonz
AddType application/octet-stream .dump
AddType x-java-archive .jar
AddType image/svg+xml .svg
AddType application/x-xpinstall .xpi
AddType video/webm .webm
AddType text/cache-manifest .appcache

Log format:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    ErrorLog "|/usr/sbin/rotatelogs -L /var/log/httpd/developer.cdn.mozilla.net/error_log /var/log/httpd/developer.cdn.mozilla.net/error_log_%Y-%m-%d-%H 3600 -0"
    CustomLog "|/usr/sbin/rotatelogs -L /var/log/httpd/developer.cdn.mozilla.net/access_log /var/log/httpd/developer.cdn.mozilla.net/access_%Y-%m-%d-%H 3600 -0" combined

File upload caching:

<Directory /data/www/developer.mozilla.org/kuma/media/uploads>
    AllowOverride All
    AddType application/x-data .data

    # Set far-future Expires headers for uploads. bug 842751
    ExpiresActive On
    ExpiresDefault "access plus 12 hours"
    ExpiresByType  text/css "access plus 12 hours"
    ExpiresByType  text/javascript "access plus 12 hours"
    ExpiresByType  image/png "access plus 12 hours"
    ExpiresByType  image/jpeg "access plus 12 hours"
    ExpiresByType  image/gif "access plus 12 hours"
    ExpiresByType  image/vnd.microsoft.icon "access plus 12 hours"
    ExpiresByType  video/webm "access plus 12 hours"
    ExpiresByType  video/ogg "access plus 12 hours"
    ExpiresByType  video/x-flv "access plus 12 hours"
    ExpiresByType  application/x-shockwave-flash "access plus 12 hours"
    FileETag MTime
</Directory>
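
As a rough idea of what carrying these settings over to Python could involve, the sketch below builds response headers for a file from the security headers, a representative subset of the extra file types, and the 12-hour upload expiry. It uses only the standard library; `build_headers` and its `is_upload` flag are hypothetical names, and expressing the expiry as `Cache-Control: max-age` is an assumption about how we'd want to represent "access plus 12 hours".

```python
import mimetypes

# Copied from the "Custom headers" block of the Apache config.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-XSS-Protection": "1; mode=block",
    "Strict-Transport-Security": "max-age=63072000",
}

# A subset of the AddType lines; the rest would be added the same way.
EXTRA_TYPES = {
    ".ogx": "application/ogg",
    ".ogv": "video/ogg",
    ".svg": "image/svg+xml",
    ".xpi": "application/x-xpinstall",
    ".webm": "video/webm",
    ".appcache": "text/cache-manifest",
}

UPLOAD_MAX_AGE = 12 * 60 * 60  # "access plus 12 hours", in seconds

# Private registry so the process-wide mimetypes tables are left alone.
_types = mimetypes.MimeTypes()
for ext, content_type in EXTRA_TYPES.items():
    _types.add_type(content_type, ext)


def build_headers(filename, is_upload=False):
    """Return the response headers for a file as a plain dict."""
    headers = dict(SECURITY_HEADERS)
    content_type, _encoding = _types.guess_type(filename)
    headers["Content-Type"] = content_type or "application/octet-stream"
    if is_upload:
        headers["Cache-Control"] = "max-age=%d" % UPLOAD_MAX_AGE
    return headers
```

In a Django deployment the same data would more likely live in middleware or in the static file server's config; the point here is only that the Apache directives reduce to a few small lookup tables.
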

bookshelfdave commented 7 years ago

Additional aliases we may have to implement:

conf/httpd.conf.rpmnew:551:Alias /icons/ "/var/www/icons/"
conf/httpd.conf.rpmnew:576:ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
conf/httpd.conf.rpmnew:855:Alias /error/ "/var/www/error/"
conf/httpd.conf:561:Alias /icons/ "/var/www/icons/"
conf/httpd.conf:586:ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
conf/httpd.conf:873:Alias /error/ "/var/www/error/"
mozilla/domains/developer.mozilla.org.conf:21:    ServerAlias developer-local developer.mozilla.com
mozilla/domains/developer.mozilla.org.conf:72:    Alias /media /data/www/developer.mozilla.org/kuma/media
mozilla/domains/developer.mozilla.org.conf:73:    Alias /admin-media /data/www/developer.mozilla.org/kuma/vendor/src/django/django/contrib/admin/media
mozilla/domains/developer.mozilla.org.conf:75:    Alias /presentations /data/www/presentations
mozilla/domains/developer.mozilla.org.conf:76:    Alias /samples /data/www/samples
mozilla/domains/developer.mozilla.org.conf:77:    Alias /diagrams /data/www/diagrams
mozilla/domains/developer.mozilla.org.conf:90:    WSGIScriptAlias /mwsgi /data/www/developer.mozilla.org/kuma/wsgi/kuma.wsgi
mozilla/domains/developer.cdn.mozilla.net.conf:3:    ServerAlias developer-origin.cdn.mozilla.net
mozilla/domains/developer.cdn.mozilla.net.conf:28:    Alias /media /data/www/developer.mozilla.org/kuma/media
mozilla/domains/developer.cdn.mozilla.net.conf:29:    Alias /admin-media /data/www/developer.mozilla.org/kuma/vendor/src/django/django/contrib/admin/media
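
For the content aliases (as opposed to the stock httpd.conf ones), the mapping from URL prefix to directory is simple enough to sketch in a few lines. This is an illustration only: `ALIASES` lists four of the aliases above, `resolve_alias` is a hypothetical helper, and where these directories would actually live on the new infrastructure is still open.

```python
import posixpath

# URL prefix -> filesystem root, taken from the Alias lines above.
ALIASES = {
    "/media": "/data/www/developer.mozilla.org/kuma/media",
    "/presentations": "/data/www/presentations",
    "/samples": "/data/www/samples",
    "/diagrams": "/data/www/diagrams",
}


def resolve_alias(url_path):
    """Map a URL path to a filesystem path, or None if no alias applies."""
    # Collapse "." and ".." first so a request cannot escape the alias root.
    clean = posixpath.normpath(url_path)
    for prefix, root in ALIASES.items():
        # Match on whole path segments, so "/mediafoo" does not hit "/media".
        if clean == prefix or clean.startswith(prefix + "/"):
            return root + clean[len(prefix):]
    return None
```

`resolve_alias("/samples/canvas/demo.html")` gives `/data/www/samples/canvas/demo.html`, while a traversal attempt such as `/samples/../../etc/passwd` normalizes to a path outside every alias and returns `None`.
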

bookshelfdave commented 7 years ago

I spent some time trying to get https://github.com/pmac/django-redirect-urls to work as a replacement for some of the redirects, but I could use a ~30-minute (time-boxed!) assist from @escattone, @pmac, or @jgmize.

escattone commented 7 years ago

Wow, thanks for all of this detailed work @metadave ! I would be happy to assist as much as I am able.

jgmize commented 7 years ago

I'm available tomorrow, will ping you on IRC.

bookshelfdave commented 7 years ago

related: discussion of serving legacy samples from EFS/Kuma here

bookshelfdave commented 7 years ago

I'm excluding the following rewrite rule, which may be left over from DekiWiki:

RewriteRule ^/Special:UserLogin\??(.*) /index.php?title=Special:UserLogin&$1 [R]

https://developer.mozilla.org/index.php doesn't currently do anything.

https://github.com/mozilla/kumascript/blob/master/macros/NotFound.ejs#L16

bookshelfdave commented 7 years ago

As noted in https://github.com/mozilla/kuma/pull/4215, this rewrite remains unimplemented:

RewriteCond %{SERVER_NAME} ^developer\.mozilla\.com$
RewriteRule (.*) http://developer.mozilla.org$1 [R=301,L]
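
For reference, the host rewrite above amounts to the following logic. This is a sketch in plain Python rather than the eventual Kuma code: `canonical_redirect` is a hypothetical name, and lower-casing the host and dropping a port are assumptions added here (the Apache condition matches `SERVER_NAME` as written).

```python
LEGACY_HOST = "developer.mozilla.com"
CANONICAL_ORIGIN = "http://developer.mozilla.org"  # scheme as in the Apache rule


def canonical_redirect(host, path):
    """Return (301, url) for requests to the legacy .com host, else None."""
    # Assumption: ignore any ":port" suffix and compare case-insensitively.
    hostname = host.split(":", 1)[0].lower()
    if hostname == LEGACY_HOST:
        # RewriteRule (.*) ... $1 keeps the requested path unchanged.
        return 301, CANONICAL_ORIGIN + path
    return None
```

A request for `/en-US/docs` on developer.mozilla.com would redirect to `http://developer.mozilla.org/en-US/docs`; requests on any other host pass through.
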

bookshelfdave commented 7 years ago

TODO:

jwhitlock commented 7 years ago

@metadave - there is also https://github.com/mozilla/kuma/blob/master/etc/apache/all-servers.conf

bookshelfdave commented 7 years ago

thanks, that list looks pretty simple. I'll get to that in the AM.

bookshelfdave commented 7 years ago

Tracked for MDN in Bugzilla

bookshelfdave commented 7 years ago

Do we need to also implement rewrites from https://github.com/mozilla/kuma/blob/master/configs/htaccess ?

jwhitlock commented 7 years ago

:sigh: Looks like it is active in production:

    DocumentRoot "/data/www/developer.mozilla.org/kuma/webroot"

    <Directory /data/www/developer.mozilla.org/kuma/webroot>
        Options +FollowSymLinks
        AllowOverride All
    </Directory>

I'll leave it up to you if you add more to this PR or finish this one and open a second.

bookshelfdave commented 7 years ago

Some of those rules are a bit more complicated, so I suppose I'll follow up in a later PR.

bookshelfdave commented 7 years ago

Closing this for now; follow-up rewrites will occur here