kelinger / OmniStream

Deployment and management tools for an entire streaming platform that can reside on a server (local, remote, hosted, VPS) with media files stored on cloud services like Google Drive or Dropbox.
MIT License

A way to make OmniStream Highly Available? Multi nodes #27

Closed by wickedshrapnel 1 year ago

wickedshrapnel commented 1 year ago

So everything is running great on one machine. I have been wondering if there is a way to scale this, distribute the load, and make the applications highly available, for example by running just Plex on one node and all the other services on another. All services would be able to fail over to other nodes, but when everything is up and running the services would be spread across nodes, and the Cloudflare DNS IPs for those services would be updated depending on which node they are running on (the nodes are in separate datacenters). I've seen that Kubernetes works like this, but I am just starting to learn about Kubernetes, so I wouldn't know how to tie it in to make something like this work.

kelinger commented 1 year ago

Sorry for the lengthy response that's about to follow!

Good day, @wickedshrapnel and thank you for your comments. I have "distributed" my OmniStream amongst two servers on two different continents. To be clear, though, "distributed" is not the same as "highly available." What I mean is that I could do my downloading from server A and my viewing (streaming) from server B. This is simply handled by doing the OmniStream installation on two servers and selecting different components for each one (though there is some overlap, like Traefik, OmniMount, NetData, etc.).

Obviously, there's nothing highly available about this configuration, though it does mean that downloads are handled by a server that has the right bandwidth, storage, etc., while streaming is handled by a server in good proximity to where I'm likely to be streaming.

For a truly highly available system, there's a lot to consider and one of those considerations is that the containers themselves support some kind of clustering or failover properties. True that Docker itself can add some of that, but stay with me here 😄 .

In an early attempt at this, I had the "/configs" subdirectory on my cloud mount (Google, though the actual host isn't that relevant). This meant that, in theory, should server A fail, server B should be able to come up and essentially take over, though this wouldn't be instantaneous. The problem here is that apps like Plex have literally hundreds of thousands of files in their configs. True, they're mostly small files, but that just makes things worse since you really only get the bandwidth you expect when you're transferring large files. Even though my /configs/plex directory is only 20G in size, more than a day went by with only a subset of its files being synced up to Google.
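
To illustrate why file count can matter more than total size, here is a rough back-of-the-envelope sketch. The model (raw transfer time plus a fixed cost per file) and every number in it are assumptions chosen for illustration, not measurements from my servers or from Google.

```python
def estimated_sync_seconds(file_count, total_bytes, bandwidth_bytes_per_s, per_file_overhead_s):
    """Rough model: raw transfer time plus a fixed cost per file.

    per_file_overhead_s stands in for things like API round trips and
    rate limiting; it is a hypothetical parameter, not a measured value.
    """
    transfer = total_bytes / bandwidth_bytes_per_s
    overhead = file_count * per_file_overhead_s
    return transfer + overhead


# Hypothetical inputs: 20 GB spread over 300,000 files, a 100 MB/s link,
# and half a second of overhead per file.
GB = 1000 ** 3
MB = 1000 ** 2
seconds = estimated_sync_seconds(300_000, 20 * GB, 100 * MB, 0.5)
print(f"{seconds / 3600:.1f} hours")  # prints "41.7 hours"
```

Under these assumed numbers the raw transfer accounts for only 200 seconds; nearly all of the time is per-file overhead, which is consistent with a 20G directory taking more than a day.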

So, you may be saying "ok, that's the initial sync with a full library but once that's finished, keeping it in sync shouldn't be nearly as bad, right?" Well, unfortunately, Plex likes (or liked; not sure it still does this) to "touch" just about every single file when you start it up. So, small changes to attributes like the date/time of access were simply pounding the heck out of Google because OmniStream constantly had to re-upload what it thought were changes.

In fact, that whole issue also seemed to rule out some of the alternate ideas, such as server B replicating the configs from server A (with rclone, rsync, etc.). Sure, it's possible to tweak some of the settings, especially in rsync, to ignore some attributes when determining what has changed, but now we're entering a dangerous area where B doesn't really equal A, and who knows whether that will have some future consequence we weren't expecting?
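
As a small sketch of the difference between attribute-based and content-based change detection: the example below assumes the "touch" alters the modification time, which is what timestamp-based comparison typically looks at. The function names and the file name are hypothetical and are not part of OmniStream.

```python
import hashlib
import os
import tempfile


def mtime_changed(path, last_mtime_ns):
    """Timestamp-based check: reports a change if the mtime differs."""
    return os.stat(path).st_mtime_ns != last_mtime_ns


def content_digest(path):
    """Content-based check: SHA-256 of the file's bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "metadata.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"unchanged payload")
        old_mtime = os.stat(path).st_mtime_ns
        old_digest = content_digest(path)
        # Simulate a "touch": move the timestamps forward, leave the bytes alone.
        later = old_mtime + 60 * 10 ** 9
        os.utime(path, ns=(later, later))
        return mtime_changed(path, old_mtime), content_digest(path) != old_digest


print(demo())  # (True, False): the timestamp check sees a change, the hash does not
```

The trade-off is that a content check has to read every byte of every file on each pass, which is its own kind of pounding when there are hundreds of thousands of files. rsync's `--checksum` option works along these lines.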

Lastly, there's the database issue. Plex and its friends write to a database which is essentially an open file for the life of the container. There's a high likelihood that what is written up to the cloud (or directly to the other server) isn't a database in a consistent/healthy state as the app (Plex, etc.) was writing to it while it was being copied. Sure, you might have all the files copied when A fails but if B can't start up or has to use a backup database file from a previous day/week, then this doesn't really qualify as a successful failover.
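
For what it's worth, a consistent copy of a live database usually has to come from the database engine rather than from a file copy. The sketch below assumes the application stores its data in SQLite (which I believe is the case for Plex, but verify for your app) and uses Python's standard `sqlite3` backup API. The function name and paths are hypothetical, and I have not tested this against a running Plex container.

```python
import sqlite3


def snapshot_sqlite(src_path, dest_path):
    """Write a consistent snapshot of the SQLite database at src_path to dest_path.

    Uses SQLite's online backup mechanism instead of copying the file
    byte by byte, so the copy reflects a single consistent state even if
    another connection writes to the source during the backup.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

The snapshot file, rather than the live database, would then be what gets uploaded or replicated. This only addresses consistency of the copy; B would still be as stale as the most recent snapshot.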

Instead, for OmniStream, we beefed up the backup and restore process. Both @TechPerplexed and I have successfully erased our servers, started from scratch, and had a usable system within an hour. This, too, doesn't seem "highly available" (and let's face it, it doesn't pass the spouse/significant-other/roommate-friendly test) but it does err on the side of caution, especially with the differential backups.

One working alternative: this isn't a "highly available" solution but it does solve a lot of the problem. If you install OmniStream on two servers as I described, make sure that anything that exists on both is named differently on each server. For example, Traefik1 and Traefik2, Plex1 and Plex2, etc. (this is done through the menus or with the omni edit command). You would have two Plex servers set up and viewers could watch from either one. A tool like "watchstate" keeps the viewing and playback status in sync between the two servers. If server A fails and you're kicked off Plex on your Apple TV, Fire Stick, etc., you just switch to the other Plex server on B. You may have some fast forwarding to do, but movie night isn't ruined.

The downside here is that any new videos now have to be scanned by both Plex servers and at least one of them isn't doing this locally which means it is hitting the cloud APIs (in my case, Google) while it scans new videos for metadata, chapter markers, creating thumbnails, and so on. Normally, this will be just fine if you aren't the type that adds every single movie/show to your collection. But, if you add the entire 4K James Bond Collection in a day, you may find yourself hitting API or upload bans after passing a certain quota (if your download quota is 1 TB, for example, each server hitting over 500 GB during scanning within 24 hours may cause you to cross that threshold).
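
The quota arithmetic from that example can be sketched as below. The 1 TB quota and the per-server figures are illustrative only; the actual limits depend on your cloud provider and plan, and the function name is hypothetical.

```python
def combined_usage_exceeds_quota(per_server_gb, quota_gb):
    """Return (exceeded, total_gb) for traffic summed across all servers."""
    total = sum(per_server_gb)
    return total > quota_gb, total


# Two Plex servers each pulling a bit over 500 GB while scanning the same
# new videos within 24 hours, against an assumed 1 TB (1000 GB) quota.
print(combined_usage_exceeds_quota([520, 510], 1000))  # (True, 1030)
```

The point is simply that the quota applies to the account, so each additional server scanning from the cloud adds its full share of traffic to the same total.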

In a perfect world, Plex/Emby/Jellyfin would itself support multiple servers with the same name and a database sync. Thus, any changes to one could easily be replicated to the other, and at least one of them wouldn't need to "scan" until it became the "master."

wickedshrapnel commented 1 year ago

I appreciate the in-depth response. I understand the complexities now. I'm going to let the highly available idea go. I already attempted running on two different servers and ran into issues with the components having the same name. I didn't even think about renaming them all; I thought that might mess up the existing server somehow, for example by changing the name in the config on the rclone drive so that the other server would try to use that config as its own. Thanks for that tip. That should get me where I need to be. Excellent job on the scripts and all the config that makes OmniStream what it is. It really made setting everything up so easy, and it is all so well integrated, like having all the logs in one place. Keep up the good work. I look forward to updates to this project and any others you are working on. 👍🏻