ualbertalib / jupiter

Jupiter is a University of Alberta Libraries-based initiative to create a sustainable and extensible digital asset management system. This is phase 2 (Digitization).
https://era.library.ualberta.ca/
MIT License

[Spike] Refactor Jupiter Routes/Controllers/Layout to Be Able To Serve Multiple Sites #1707

Closed mbarnett closed 2 years ago

mbarnett commented 4 years ago

Overview

Spike out some ideas for implementing #1684, to see how this approach might work and figure out any pros & cons.

I'm leaning towards serving both sites out of the same underlying web process here. Rather than trying to make the controllers super-generic and parameterizing the models they deal with via config, we would just have different controllers handle each site and move shared logic into services. This seems easier overall to test than the alternative: you can test controllers directly without worrying about how the subdomain interacts with routing, etc. With the config route we'd really need to run tests twice, once with each config set.
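A minimal sketch of that shape, written in plain Ruby so it runs without Rails. Every name in it (`SearchService`, `EraSearchController`, `PeelSearchController`, the sample records) is hypothetical and not taken from Jupiter; real controllers would inherit from `ApplicationController` and query real models.

```ruby
# Shared logic lives in a service that knows nothing about sites or subdomains.
class SearchService
  def initialize(records)
    @records = records
  end

  # Case-insensitive substring match on the title.
  def call(query)
    needle = query.downcase
    @records.select { |record| record[:title].downcase.include?(needle) }
  end
end

# Each site gets its own thin controller that only chooses which data to search.
class EraSearchController
  RECORDS = [{ title: "Thesis on Glaciers" }, { title: "Item About Maps" }].freeze

  def index(query)
    SearchService.new(RECORDS).call(query)
  end
end

class PeelSearchController
  RECORDS = [{ title: "Prairie Newspaper 1905" }, { title: "Map of Edmonton" }].freeze

  def index(query)
    SearchService.new(RECORDS).call(query)
  end
end
```

With this split, the service can be unit tested once, and each controller test only needs to check that the right data set is wired in.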

Basic tasks as I see them (but add or change as needed)

murny commented 4 years ago

What do we want?

We want to allow Jupiter to present different homepages (front-doors), differently themed pages, and have the ability to search from different data.

What are possible solutions to this problem?

Different applications: hopefully we can agree this is probably not ideal for applications that are very similar in nature. Discovery and NEOSDiscovery are probably the best example of the pain around this. Both of these applications probably share 90% or more of their codebase, with essentially just some different theming and minor content changes. As a result, we end up copying and pasting a lot of code around, and it's a challenge to keep both applications up to date with the latest changes.

Configuration could work. Configuration works really well for theming. But when we start potentially having different pages with different content (e.g. item/theses vs newspapers/maps), it's a bit harder to make this work through a shared configuration. It could work though. However, the biggest pain would probably be on the system admin side. Having us host and maintain multiple applications could be quite a challenge. As an example, if we needed to get a security update out simultaneously to all our applications, this could be quite the undertaking with our current setup.

As a result, I think subdomains could be the best solution. We can change the theme and content based on the subdomain a user arrives through. We can still use the same deployment and codebase, so there is no major change from that point of view. So this seems like the best option of the three for our current requirements.

Subdomains

Filtering by Host vs Subdomain

There are a couple of ways to handle subdomains in Rails. The first is filtering by host (request.host). The second is filtering by subdomain (request.subdomain). Subdomain is easier, as you can use the same code for all environments. With filtering by host you have to configure each environment (probably using secrets.yml or credentials.yml) with the hosts you want to allow (e.g. era.dev.ca, era.test.ca, era.staging.library.ualberta.ca, era.library.ualberta.ca). As a result, I'd lean towards handling this via subdomain, as we don't need to configure every environment (dev, test, uat, staging, production in our case). How to filter by subdomain? You would simply surround routes in a constraint like so:

constraints(subdomain: 'admin') do
  resources :users, only: [:index, :show] do
    member do
      patch :suspend
      patch :unsuspend
      patch :grant_admin
      patch :revoke_admin
      post :login_as_user
    end
  end
end

This will allow these routes to only work on traffic to admin.domain.com and not expose these routes to any other domain. As a result we can serve different content, theme, data, etc from this subdomain compared to other subdomains which is exactly what we want to achieve.
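If the hash form ever turns out to be too limited, Rails routing also accepts any object that responds to `matches?(request)` as a constraint. The sketch below is only an illustration under that assumption: `SubdomainConstraint` and `FakeRequest` are hypothetical names, and `FakeRequest` stands in for the real request object so the block runs without Rails.

```ruby
# Hypothetical constraint object. In config/routes.rb it could be used as:
#   constraints(SubdomainConstraint.new("admin")) do ... end
class SubdomainConstraint
  def initialize(*names)
    @names = names
  end

  # Rails calls this with the request; the wrapped routes only match when it returns true.
  def matches?(request)
    @names.include?(request.subdomain)
  end
end

# Stand-in for the real request, exposing only what the constraint reads.
FakeRequest = Struct.new(:subdomain)

admin_only = SubdomainConstraint.new("admin")
admin_only.matches?(FakeRequest.new("admin")) # true
admin_only.matches?(FakeRequest.new("era"))   # false
```

A constraint object like this is easy to unit test on its own, which fits the goal of keeping routing concerns out of controller tests.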

Development

Getting Subdomains to work in development

Development with subdomains is pretty easy. The big change is the URL you use to load your application. Instead of going to localhost:3000 you would now use a proper domain (potentially with a subdomain).

There are a few ways to use real domains in development, which I'll talk about here. The first is a nifty little domain called http://lvh.me. This is a free service that resolves itself, along with all of its subdomains, to localhost. There is nothing to install or configure; it just works. So if we wanted to test the above admin subdomain, instead of navigating to localhost:3000 we would now just go to http://admin.lvh.me:3000 and everything just works! Another option is adding custom domains to your /etc/hosts file. As an example, you can add the following entries to your /etc/hosts file:

127.0.0.1       demo.com
127.0.0.1       admin.demo.com
127.0.0.1       api.demo.com

So if we wanted to test the above admin subdomain, instead of navigating to localhost:3000 we would now just go to http://admin.demo.com:3000 and our /etc/hosts file will do the hard work and make this work for us. There are a few other options as well. A popular one is Pow (and a few alternatives like Prax and the Powder gem). But I think lvh.me or /etc/hosts is probably all we need.

Whitelisting hosts

Since Rails 6, you will also need to whitelist hosts for development. Rails 6 includes a new middleware named Host Authorization to help prevent DNS rebinding attacks.

By default this feature allows requests from 0.0.0.0, ::, and localhost. There are basically two ways to work around this. The first option is to whitelist the development hostname in config/environments/development.rb.

Rails.application.configure do
  # Whitelist one hostname
  config.hosts << "hostname"
  # Whitelist a test domain. Rails adds \A and \z around
  # your regular expressions.
  config.hosts << /application\.local/
end

Or you can just allow all requests through:

Rails.application.configure do
  config.hosts.clear
end

If using lvh.me we can just whitelist everything on this domain like so:

# whitelist our subdomains as valid hosts
config.hosts << /.*\.lvh\.me/
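To see what that pattern does and does not let through, here is a small plain-Ruby sketch. It assumes, as the comment in the earlier snippet says, that Rails anchors the regular expression with \A and \z; `host_allowed?` is a hypothetical helper written for this illustration, not a Rails API.

```ruby
# Mimics the anchoring described above: the pattern has to match the whole host.
def host_allowed?(pattern, host)
  /\A#{pattern}\z/.match?(host)
end

subdomains_only  = /.*\.lvh\.me/      # the pattern from the snippet above
with_bare_domain = /(.*\.)?lvh\.me/   # variant that also accepts lvh.me itself

host_allowed?(subdomains_only, "admin.lvh.me")          # true
host_allowed?(subdomains_only, "lvh.me")                # false, no leading label
host_allowed?(subdomains_only, "admin.lvh.me.evil.com") # false, \z rejects the suffix
host_allowed?(with_bare_domain, "lvh.me")               # true
```

So if we also want to browse the bare lvh.me:3000, the second pattern (or a separate "lvh.me" string entry) would be needed.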

Test

For integration/controller tests, any subdomain routes will result in 404 errors if the test request does not have a proper subdomain. To get around this, Rails provides a host! helper which can set the proper subdomain for all requests made within a test file.

# Configuring subdomain in Rails integration tests
setup do
  host! 'admin.example.com'
end

In system tests we can use the host! helper again, just like we did before. However, it's a bit different, as the host! helper in system tests internally sets Capybara.app_host, which expects a full URL this time, like so:

  setup do
    host! 'http://admin.example.me'
  end

Note: Testing is a can of worms, and I have had some success and lots of randomness that I can't really explain. It basically boils down to the trouble of jumping between subdomains. This is common, as for the majority of our tests we want to log in as a user (which means potentially hitting another subdomain) and then do our actual testing (which could be on an entirely different subdomain). I did get some success in integration/controller tests, especially if we make sure we are using URL helpers, but it's very finicky. Testing will require some major investigation and figuring out whether we want to pursue jumping between subdomains.

Overall I'm sure a solution is possible, and maybe when we start building this we can find one. However, for right now (and maybe as best practice regardless) I think the rule of thumb is to keep subdomains self-contained and isolated from each other. Each subdomain should have what it requires to be self-sufficient.

If we move all routes in Jupiter under an ERA subdomain then everything works. If we add a new subdomain like Peel, it will then need everything it requires available under its own subdomain. So if it needs authentication, then it needs access to its own (maybe authentication routes are exposed to all subdomains, or we just duplicate these routes and share as much code as possible with ERA).

Production

From a development point of view nothing major needs to happen on our end for production. There will probably be some work with our DNS/Apache/etc. configuration to allow traffic to be served from these subdomains.

Gotchas

Multiple subdomains

If you have multiple levels of subdomains, you need to do a bit more work. Take, for example, running the same code on our staging URL, which may be admin.staging.library.com.

If you look at what our subdomain is for the above staging URL, it's as follows:

request.subdomain #=> "admin.staging"

Which means our constraints(subdomain: 'admin') in our routes will no longer work (as we expect admin not admin.staging). How to get around this?

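The parsing of the request's subdomain is managed by the config.action_dispatch.tld_length option. It tells Rails how many trailing labels of the host make up the TLD; the label before those is the domain name, and everything further left is the subdomain. By default this length is 1, so for admin.staging.library.com Rails treats library.com as the domain and admin.staging as the subdomain. In this example we want staging.library.com to count as the domain, so we need to set config.action_dispatch.tld_length to 2, which leaves admin as the subdomain.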

# config/application.rb
config.action_dispatch.tld_length = Integer(ENV['TLD_LENGTH'] || 1)

We can set it using an environment variable (or maybe better, configure it from the secrets/credentials yml) so that we can use the same code in our staging environment as well as in the production environment. With this configuration in place, our routing setup will now work for admin.staging.library.com.
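The arithmetic is easy to get wrong, so here is a plain-Ruby sketch of the rule as described above. `subdomain_for` is a hypothetical helper written for this illustration (it is not the Rails API); it drops the domain name plus `tld_length` trailing labels and treats whatever is left as the subdomain.

```ruby
# Illustration only: the last `tld_length` labels are the TLD, the label before
# them is the domain name, and anything further left is the subdomain.
def subdomain_for(host, tld_length = 1)
  labels = host.split(".")
  labels[0..-(tld_length + 2)].join(".")
end

subdomain_for("admin.staging.library.com")    # "admin.staging"
subdomain_for("admin.staging.library.com", 2) # "admin"
subdomain_for("admin.lvh.me")                 # "admin"
subdomain_for("admin.lvh.me", 2)              # "" (nothing left over)
```

The last line shows why the value has to differ per environment: a setting of 2 that suits the staging host would leave no subdomain at all for admin.lvh.me in development.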

Authentication

By default, cookies are set by the browser on the request's domain. So if we log in to our application at era.library.com, then the session cookie is set for era.library.com, which means we will need to log in again when we go to admin.library.com. In other words, the user session and other cookies are not shared across subdomains by default, which is not ideal. To fix this, we can set the session cookie on the domain itself so all subdomains can access it. This is accomplished by passing the domain option to the session store settings:

Rails.application.config.session_store :cookie_store, key: "_jupiter_session", domain: :all

By setting domain: :all we tell Rails to set the session on the top-level domain (for example library.com) instead of the request host.

(You can also pass a list of domains to the domain option as an array to support multiple domains.)

You will also have to set the tld_length option to tell Rails how to parse the top-level domain of the host. So in our case, if we want library.ualberta.ca, we may need to set this to 3. For most typical applications you would set this value to 2, for a domain like demo.com:

Rails.application.config.session_store :cookie_store,
                                       key: "_jupiter_session",
                                       domain: :all,
                                       tld_length: 2

Note: This tld_length option is quite different from config.action_dispatch.tld_length and acts in a different way, which can lead to some confusion (there are issues in the Rails backlog to make this more straightforward). Just be aware that these are different settings and will probably not have the same value.
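A plain-Ruby sketch of the difference, following the behaviour described in this comment rather than quoting Rails internals. Both helpers are hypothetical names made up for the illustration: the session store option counts every label of the shared domain, while the routing option counts only the labels after the domain name.

```ruby
# Session store (domain: :all): the cookie is scoped to the last `tld_length` labels.
def cookie_domain_for(host, tld_length)
  host.split(".").last(tld_length).join(".")
end

# Routing (config.action_dispatch.tld_length): the domain is the TLD labels
# plus one more label for the domain name.
def routing_domain_for(host, tld_length)
  host.split(".").last(tld_length + 1).join(".")
end

cookie_domain_for("admin.demo.com", 2)           # "demo.com"
cookie_domain_for("era.library.ualberta.ca", 3)  # "library.ualberta.ca"
routing_domain_for("era.library.ualberta.ca", 2) # "library.ualberta.ca"
```

Under these assumptions, the same shared domain library.ualberta.ca needs 3 in the session store but 2 in the routing config, which is exactly the kind of mismatch the note warns about.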

URL Helpers with subdomains

URL helpers seem to be working with subdomains.

One caveat is that we can no longer use path helpers like admin_path across subdomains. We will instead have to use the full URL helpers like admin_url. If your routes file has subdomain constraints then these URL helpers will resolve correctly, so admin_url becomes admin.demo.com/admin no matter what domain/subdomain you are on.

Root Routes

Rails has a special root route which is basically the default route of the application. When we have all of our routes under any one of the subdomains, then there can be situations where we don’t have any root route defined at all. Certain gems might depend on the presence of a root route and we need to add checks and balances accordingly.

You also cannot have multiple root routes at the same level in a Rails application (both would be named root, and the application will fail to boot). So this puts us in a special place.

So for example this is not allowed as we have two root routes and will error out:

constraints(subdomain: 'admin') do
  root to: 'dashboard#index'

  resources :users, only: [:index, :show] do
    member do
      patch :suspend
      patch :unsuspend
      patch :grant_admin
      patch :revoke_admin
      post :login_as_user
    end
  end
end
root to: 'welcome#index'

How to get around this? There is a simple solution and that is using namespace helpers with an empty path option. If we wrap the routes within the subdomain constraint in a namespace, we are then allowed to have multiple root routes.

Plus we get an interesting bonus, which is the fact that everything inside this namespace is namespaced by default when looking up controllers and views. We do this quite a bit with our admin section already. For example, all admin controllers will be expected to live in a controllers/admin folder, and each controller will be namespaced accordingly (Admin::UsersController).

This also allows us to share configuration/layouts/themes very easily, as you can have all your controllers within the admin folder inherit from Admin::BaseController instead of ApplicationController. This allows us to set a layout in that controller, and the layout could inject its own admin CSS/JavaScript that is separate from the rest of the application.

Lastly, this gives us a super easy way to establish a hierarchy for views when Rails looks up the view path. If we wanted to have a custom navbar/footer or any other shared partials for the admin subdomain, we can take advantage of this. If everything in the Admin namespace inherits from Admin::BaseController, we can override the default layout partials by sticking our admin subdomain's versions of navbar/footer within a views/admin/base folder. When the application layout attempts to render the navbar, it will look through the view hierarchy and render views/admin/base/navbar instead of the default views/application/navbar, as views/admin/base/navbar comes first. This gives us lots of flexibility for providing a different theme for each subdomain and helps keep the code for each one separate.
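The lookup order can be sketched in plain Ruby. This is an illustration of the idea only, not the Rails implementation: `view_prefixes` and `resolve_partial` are hypothetical helpers, the controllers are empty stand-ins, and `downcase` is a simplification that only holds for single-word controller names.

```ruby
class ApplicationController; end

module Admin
  class BaseController < ApplicationController; end
  class UsersController < BaseController; end
end

# Walks the inheritance chain and turns each controller into a view folder name,
# most specific first.
def view_prefixes(controller_class)
  controller_class.ancestors
                  .grep(Class)
                  .take_while { |klass| klass != Object }
                  .map { |klass| klass.name.sub(/Controller\z/, "").gsub("::", "/").downcase }
end

# Returns the first existing partial along that chain, or nil if none exists.
def resolve_partial(controller_class, name, existing_templates)
  view_prefixes(controller_class)
    .map { |prefix| "#{prefix}/_#{name}" }
    .find { |path| existing_templates.include?(path) }
end

templates = ["application/_navbar", "admin/base/_navbar", "application/_footer"]

view_prefixes(Admin::UsersController)
# ["admin/users", "admin/base", "application"]
resolve_partial(Admin::UsersController, "navbar", templates) # "admin/base/_navbar"
resolve_partial(Admin::UsersController, "footer", templates) # "application/_footer"
```

The footer falls through to the application version because no admin override exists, so a subdomain only has to supply the partials it actually wants to change.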

Of course you have the option to opt out of this namespace. But given the benefits it could be an easy way to organize our code for each subdomain. So this might be the best route we will want to take when we do subdomains.

So by using a namespace, this now works:

constraints(subdomain: 'admin') do
  namespace :admin, path: '' do

    root to: 'dashboard#index'

    resources :users, only: [:index, :show] do
      member do
        patch :suspend
        patch :unsuspend
        patch :grant_admin
        patch :revoke_admin
        post :login_as_user
      end
    end
  end
end
root to: 'welcome#index'

By setting an empty path, we can continue to serve these routes on the home path (/) and allow the admin root route to take precedence over the generic root route when navigating to admin.library.com.

Note: The generic catch all root route should probably redirect to the “default” subdomain as a safe fallback? (in our case this is probably era).

Public folder

By default, static content in the public folder would remain exposed to all subdomains (favicon/images, error pages, robots.txt, etc.). This might be mostly okay? If we need to override these, we can add routes/controllers to intercept this traffic and serve it ourselves (for example, if we want Peel to have a separate robots.txt from ERA).
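One possible shape for the robots.txt case is a small Rack-style endpoint that picks the body from the first label of the host. This is a sketch under assumptions: `RobotsEndpoint`, the constants, and the robots rules themselves are all made up for illustration, and the static public/robots.txt would most likely have to be removed so the request reaches the router at all.

```ruby
# Hypothetical per-site robots.txt bodies.
ROBOTS_BY_SUBDOMAIN = {
  "era"  => "User-agent: *\nDisallow: /admin\n",
  "peel" => "User-agent: *\nDisallow: /search\n"
}.freeze
DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n"

# Rack-style endpoint: takes the env hash and returns [status, headers, body].
RobotsEndpoint = lambda do |env|
  host = env.fetch("HTTP_HOST", "").split(":").first.to_s # strip any port
  site = host.split(".").first                            # first label, e.g. "era"
  body = ROBOTS_BY_SUBDOMAIN.fetch(site, DEFAULT_ROBOTS)
  [200, { "content-type" => "text/plain" }, [body]]
end

status, _headers, body = RobotsEndpoint.call("HTTP_HOST" => "peel.library.ualberta.ca:3000")
status     # 200
body.first # "User-agent: *\nDisallow: /search\n"
```

Since Rails routes can point at any Rack-compatible object, something like this could be mounted with a single route for /robots.txt, or the same lookup could live in an ordinary controller action instead.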

Final Thoughts / Conclusion

Overall, as this document hopefully outlines, subdomains are for the most part pretty easy to develop with. They come with some trade-offs and added complexity, but most of these are resolvable. There are a couple of gotchas as noted above, but this largely depends on how we want to design our subdomains.

I assume we want to move every route into its own subdomain? So we probably want Peel or any new "frontend" routes in their own subdomains, and everything currently in Jupiter under an ERA subdomain. We could also split the current routes up further, such as moving the OAI/API/Admin routes into their own subdomains instead of keeping everything under the ERA subdomain. But this has some major gotchas with testing, as noted above (hopefully we can find a solution once we dive deeper into subdomains).

I think if we keep each "frontend" under a separate subdomain (like peel.library.ualberta.ca) and everything currently in ERA under its own separate subdomain (like era.library.ualberta.ca), then this should be pretty straightforward. In the end, hopefully this allows us to share as much code as we possibly can between "frontends" (as it's still just one codebase) and still be able to have our entire app hosted on a single application/process without too much pain/complexity.