katalyst / koi

Administration tools for Rails applications
https://github.com/katalyst/koi
MIT License

Alternate hosting v2 #341

Closed jsidoryn closed 1 year ago

jsidoryn commented 7 years ago

Summary

It may be worth exploring alternatives to shared hosting with EngineYard for small clients and single instance hosting at EngineYard for medium clients.

This is also the continuation of the RFC started by Peter: Alternative hosting (Issue #308, katalyst/koi).

Motivation

We have been with EY for 5+ years now and it is prudent to review whether this is the best setup. There have been problems with mismanagement in the past that have been cleaned up through hard work from Peter and Heath.

The foundational reason for going with EY is the belief that by going through a PaaS we spend less time on dev ops (which is not our expertise) and spend more time on building Rails projects (which is our expertise).

This only makes sense if going with a PaaS actually does lead to Katalyst making more money overall or significantly reducing risk. After discussing this with both Peter and Bill, it seems that working with EY still takes a significant time investment, that our dev ops workload may not be significantly reduced, and that there are other pros and cons to weigh.

First, some background to provide some context to the decision making process. This is primarily provided so we are coming from the same place in this discussion. Please ask or challenge me on any of this if it doesn’t make sense or you feel it is incorrect.

This first section discusses some things that I don’t believe should be considerations in making the decision to move away from shared hosting with EY. I’m not saying that these may not be good side effects, just that they shouldn’t be drivers in the decision making process.

Shared resources across sites

With multiple sites on a single instance, if one takes up more resources it affects the others. This doesn’t matter, as it hasn’t been a limiting problem in the past 10? years of doing this, and we can also get around it by better setting customers’ expectations at this price point. I don’t mean that it doesn’t happen, just that it hasn’t been enough of a problem from our customers’ perspective.

Absolute cost from hosting company

The cost of hosting is much wider than just the absolute cost that we are charged. There are many indirect costs that need to be taken into account to arrive at the true cost of a hosting service. The main ones are the risk of having our own setup and therefore being alone in solving our problems, the extra expertise needed, and the opportunity cost of spending time here rather than elsewhere.

Decreasing cost to clients

Our strength isn’t with what we provide for small clients and providing a great hosting service for them isn’t a priority.

Too many sites on an instance

What we have now has too many sites on a single server, but we could plan a much better execution. EY have indicated that this isn’t optimal but that other customers take this approach.

Below is a list of the areas that I believe should be seriously considered when reviewing the EY platform:

Locked to a single version of Ruby

For each instance, we only get to choose a single version of Ruby. This has the side effect of creating consistency but it seems to be a big hindrance. If a single client on a shared server wanted to pay to upgrade to a later version of Koi, then we couldn’t do this without updating them all or creating a new instance and moving it to that.

Quirks and ongoing issues

There seem to be a number of issues, such as deploying a new app, IPs not re-attaching themselves when rebooting an app, and ongoing Redis issues, that don’t seem to be adequately resolved. We have workarounds, but they were likely time consuming to work out in the first place and the platform should be better.

Opaque system

This one seems to be the crux of much of the frustration. On one hand, any PaaS has its own ways of doing things and part of our job is to learn it. This can be frustrating as we may already understand the way to do it if we had complete control and it’s redundant learning/knowledge as it’s only useful in this particular context.

That being said, the value in this is if it works well. From what I’ve seen their support has mostly been excellent. However, we are the first to investigate a problem anyway, and our ability to troubleshoot when problems do inevitably arise is limited, as there’s a level that we just can’t know anything about.

This is both frustrating and requires an over reliance on a 3rd party that may limit our ability to address an issue in a timely manner.

Better hosting offering for single instance clients

While a great offering for shared clients, which are generally lower value, is not important, being able to offer better hosting for single instance clients is of benefit. Clients such as Oliveri, Lifecare, Medehealth, Southern Cross Care, Minda, Guide Dogs, Beaus and Combat Sports are good clients for us, and we could offer them a much better hosting service if we had a lower cost EngineYard alternative such as Digital Ocean or Vultr.

Detailed design

Based on the above I think there are more than enough reasons to investigate alternative hosting options. I think there’s limited value in getting more data or further discussion and it’s best to test it out on a few sites and evaluate the benefits. Peter has also already gone through this process building SALC on a Digital Ocean box and the experience seemed to be pretty positive.

So I think we should start to pull sites off the shared environment. Maybe choose 3-5 different ones and choose a time frame to set up and migrate and a time to evaluate.

I would expect that a separate RFC should be created to better define a more detailed plan to move forward.

Drawbacks

Here are some of the main drawbacks of going down this path.

Every server may have a different setup

One unintended side effect of having a shared server is that it keeps many things reasonably consistent. With each server being separate, it’s much easier to create custom configurations. As we have more staff or staff change over time these problems can magnify.

This could be solved primarily through a process. I’m sure you already have ideas about how we can solve this and in addition to these, one may be to peg things against versions of Koi.

Increased dev ops knowledge

We’ll be on our own a lot more and will need to work things out ourselves. This includes setting up the hosting environment, managing deployment and maintaining. The bet here is that we likely know a reasonable amount of this already and what we don’t know we can figure out. There’s no one to help us if and when things go wrong, but on the other hand, we’ll have full knowledge of the system and will have more power to fix things ourselves.

There’s nothing really to say here except that it’s a thing, but we acknowledge that the benefits may be greater than the costs. We’ll just need to keep in mind that we document things better and so when new people start they can quickly be productive in this space.

Time spent researching, testing, integrating and documenting the new setup

This is largely unknown. The responsibility that was on EY would now be on us, and it will likely take a reasonable amount of time to implement successfully.

Multiple hosting options

As we will likely still have larger sites that require multiple servers hosted with EY (such as the Fringe) we’ll still need knowledge and understanding of the PaaS, however, we’ll also have a second system to manage and understand.

Alternatives

Use different PaaS

A different PaaS doesn’t really solve the problems, although the experience may be better. It’s worth noting that ThoughtBot host everything on Heroku and so push off all of their dev ops. Heroku is expensive and would likely cost more than EY.

External DevOps team

Another approach is to outsource the DevOps to an external team such as ReInteractive (their OpsCare service offers 24/7 Ruby on Rails operations). This is a strong approach; however, most such teams are set up for managing a single large site rather than many smaller ones, which makes ReInteractive very expensive. I started discussions with them about treating our sites as a group, and that is open for further discussion if we feel it’s needed.

We also previously hosted at Anchor, who offer some DevOps on the stack, and their service was also strong.

Use Amazon stack directly

Rather than going through EY we could set up a single instance using the Amazon stack directly. There’s also Google Cloud as an alternative to Amazon that I’ve heard decent things about.

Stay with EY and do better

This may include a better strategy for managing shared servers (fewer clients, clear upgrade tasks) and to renegotiate a better deal with them.

Hire to setup and then we run

We could hire a consultant to be responsible for managing this transition so we can keep the focus on our clients and Koi. Once set up we could be responsible for managing.

Unresolved questions

The main unresolved questions are the amount of time, expertise and risk associated with moving to single instance hosting. I believe the best way to answer these questions is to just do it.

Obversity commented 7 years ago

Maybe choose 3-5 different ones and choose a time frame to setup and migrate and a time to evaluate.

I am completely in favour of this. I think the goal when doing the first one should be to write a server setup script that will do most of the hard work of setting up the server for us, and write a Koi generator to generate Capistrano config that allows us to deploy to this setup very easily. We'd need to:

The goal should be to make deploying to something like Vultr as painless as possible for a new project. And given that we control everything in the stack, we should be able to make it even less painful than EngineYard with a bit of upfront effort.
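As a rough illustration of the generator idea, here is a minimal sketch that renders Capistrano 3-style settings from a template using Ruby's standard `ERB` library. The template contents, the `render_deploy_config` helper, the `/var/www` path, the `deploy` user and the example values are all assumptions made for illustration, not an existing Koi generator.

```ruby
require "erb"

# Hypothetical template. In a real project these settings would typically be
# split between config/deploy.rb and a stage file such as
# config/deploy/production.rb.
DEPLOY_TEMPLATE = <<~TEMPLATE
  set :application, "<%= app_name %>"
  set :repo_url, "<%= repo_url %>"
  set :deploy_to, "/var/www/<%= app_name %>"

  server "<%= host %>", user: "<%= user %>", roles: %w[app db web]
TEMPLATE

# Render the template for one project. Every value is supplied by the caller;
# the "deploy" default user is an assumption.
def render_deploy_config(app_name:, repo_url:, host:, user: "deploy")
  ERB.new(DEPLOY_TEMPLATE).result_with_hash(
    app_name: app_name, repo_url: repo_url, host: host, user: user
  )
end

# Example values only (documentation IP range, placeholder repository).
config = render_deploy_config(
  app_name: "example_site",
  repo_url: "git@example.com:example/example_site.git",
  host: "203.0.113.10"
)
puts config
```

Keeping the server layout in one template would mean every new box gets the same paths and roles, which also speaks to the consistency concern raised in the RFC.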

We should consider using Let's Encrypt with this setup. As I understand it, Peter has some experience setting this up, so that it's essentially a free SSL service for our customers. (We could charge them a fee for this, essentially offering a cheaper service than Comodo / other SSL providers.)

pvawser01 commented 7 years ago

@jsidoryn

Quirks and ongoing issues

The re-attaching of IPs is no longer an issue. There was a bug in EY that was rectified soon after we had an issue and I provided a poor-experience feedback report. IPs are now reattached as they should be.

Redis/Sidekiq issues

This could be resolved with tighter configuration of queue namespaces, so that multiple workers don't process the wrong queues.
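One possible reading of "tighter configuration" is to give each app its own prefixed queue names and check that no two apps on a shared Redis claim the same queue. The sketch below is plain Ruby with no Sidekiq dependency; the helper names, the app names and the `default`/`mailers` queue list are hypothetical, and this is not necessarily the configuration intended above.

```ruby
# Build queue names prefixed with the app's name, so that two apps sharing
# one Redis never use the same queue name.
def queue_names_for(app_name, queues = %w[default mailers])
  queues.map { |queue| "#{app_name}_#{queue}" }
end

# Return the queue names claimed by more than one app (ideally an empty list).
def overlapping_queues(queues_by_app)
  queues_by_app.values.flatten.tally.select { |_name, count| count > 1 }.keys
end

apps = %w[site_a site_b] # hypothetical app names
queues_by_app = apps.to_h { |app| [app, queue_names_for(app)] }

queues_by_app.each { |app, queues| puts "#{app}: #{queues.join(', ')}" }
puts "overlaps: #{overlapping_queues(queues_by_app).inspect}"
```

This only helps if each app enqueues to its own prefixed queues and each worker listens only to them, for example by starting Sidekiq with one `-q` option per queue name.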

There are issues with bundler under EY which does not seem to be something they are treating as a high priority. We have recently found that this issue can cause problems with KOI binstubs.

Using a VPS service like Vultr, we would need to determine the base setup required by a standard KOI application and create an image of that setup to speed up server creation. Ongoing updates and security patches would need to be addressed also.

Hiring an external contractor to set it up and then running it ourselves could be worse, as the knowledge of the what and why of the setup would be lost, or at least removed from the company. Support SLAs would need to be in place for that kind of thing.

I think the nature of our business requires us to have a certain level of devops skill in house, as when issues arise clients contact us and expect resolutions very quickly. I don't think it would be a good image to use the "upstream provider issue" line.

jsidoryn commented 7 years ago

Do either of you have any thoughts on using Docker as a way to manage the build images as well as a way to get around the gem and dependency issues we were looking to address with RVM / rbenv-gemsets?

No problem if you don't have an opinion and we can talk about it tomorrow but if it's something you have a view on I'd be interested.

heathamos commented 7 years ago

I am still getting my head around this but:

Digital Ocean Setup Script

  1. Haven't we just done a heap of work regarding a server setup script for Digital Ocean with SALC?
    • Peter invested days getting this working and I was of the understanding that this was the groundwork for repeating new box setups on Digital Ocean going forward.

From my perspective Reliability, Cost, and Effort are the three main considerations.

Reliability: The EY service seems to be very reliable. Yes, we have some issues, but as a whole the down time is negligible and we often solve any problems quickly.

Effort: It seems to me that there is a bit of effort to manage and maintain, but it seems acceptable considering the number of applications and servers involved.

Cost: Very high. We spend close to $100K a year on hosting, which is basically breaking even. This is better than losing money but it would be better to be making money.

Overall this is definitely not our focus nor do I want it to be. I feel that this will become a big time sink and quickly warrant 60 - 100% of someone's time.

I would like to run some numbers as part of this process, as the only reason we would venture down this path is to either break even with significant benefits from a KOI/project management perspective or, preferably, be profitable. I would suspect the equivalent cost of a FT employee, e.g. $50-$60K p.a., should be added as a cost to set up and maintain a fully functioning infrastructure.

I agree: let's run some numbers first on an agreed architecture/projects, then if that makes sense set a test up.
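A first pass at those numbers could be as simple as the sketch below. It uses the roughly $100K annual spend and the $50-$60K staff estimate from this comment. The instance count of 30 is a made-up placeholder, and the $20 monthly price is the Vultr plan figure mentioned elsewhere in this thread. It ignores revenue, backups, monitoring and migration effort, so treat it as a starting point rather than a costing.

```ruby
CURRENT_ANNUAL_SPEND = 100_000          # "close to $100K a year" (from this thread)
STAFF_COST_RANGE     = [50_000, 60_000] # FT-equivalent estimate (from this thread)

# Annual cost of self-managed hosting: per-instance VPS fees plus staff time.
def self_hosted_annual_cost(instances:, monthly_price:, staff_cost:)
  instances * monthly_price * 12 + staff_cost
end

# ASSUMPTIONS: 30 instances is a hypothetical count; $20/month is the Vultr
# plan price quoted elsewhere in the thread.
estimates = STAFF_COST_RANGE.map do |staff_cost|
  self_hosted_annual_cost(instances: 30, monthly_price: 20, staff_cost: staff_cost)
end

estimates.each do |cost|
  puts "self-hosted: $#{cost}, difference vs current spend: $#{CURRENT_ANNUAL_SPEND - cost}"
end
```

With these placeholder inputs the staff cost dominates the VPS fees, so the estimate is far more sensitive to how much of someone's time the infrastructure takes than to the per-instance price.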

pvawser01 commented 7 years ago

@jsidoryn Docker is something Matt and I looked at a while back and I think JBRt is using it. However, there is a lot to learn there. While the concept is a good one, I think it would require a good chunk of investment to get the knowledge to use Docker successfully. Using Docker requires all devs (front and back end) to understand it.

Obversity commented 7 years ago

Just for costing comparisons:

In my opinion, the Vultr $20 option would easily suit our needs, server specs wise, for single-instance apps like Guide Dogs / Minda / Oliveri etc.

Heroku is hard to calculate, but it looks absurdly expensive when you consider that things like Redis (required by Sidekiq) and Thinking Sphinx would be $$ per month. Standard 'dynos' are $25 a month each, and they have 512 MB to 1 GB of RAM each. And it looks like you pay additionally for Postgres, depending on how much data you have. I could see it getting very expensive very quickly.

Absolute cost from hosting company

I agree that it shouldn't be about absolute cost from the hosting company. But if we're going to be considering similar services (e.g. Vultr vs DO vs Rackspace) it's important!

Obversity commented 7 years ago

Regarding hiring an external contractor, I'd only be happy with that situation if they were to come in house for a couple of days while they were setting it up, and teach us as they go. That would be invaluable, for our dev ops knowledge, and for getting a solid server setup we could copy and move forwards with.

Having them do it completely on their own and then leave it with us to work with would be much worse than the current situation I'd think.

Obversity commented 7 years ago

Regarding Docker: It's something I'd be interested in, but yes, it'd be a big decision given that none of us are at all familiar with it.

jsidoryn commented 7 years ago

@Obversity @pvawser01 @heathamos thanks for all your input.

I've been doing a bit more research around this and while Docker is new to us and would take time to learn, it may still be worth evaluating as a development spike. We can chat about this more and evaluate it against some of the criteria that @heathamos mentioned.

Another alternative could be to look at something like Dokku or Flynn: http://dokku.viewdocs.io/dokku/ https://flynn.io

I had a good look through Dokku this morning and it looks pretty cool. From what I understand it's basically a Docker container that uses Heroku build packs through something called Herokuish. These seem to be well tested and might be a good midpoint between EngineYard and starting from scratch. There's also the Dokku CLI tool for deployments etc.

There are also a number of plugins for things like Postgres. I'm not sure how it works for setting up things outside of Dokku such as Redis. Might be a deal breaker...

Interested to hear whether @Obversity or @pvawser01 have seen this and have any thoughts.