[TESTING][GCP][GKE] Chart on GCP

taktakpeops commented 4 years ago

Following a chat on the issue https://github.com/jitsi/docker-jitsi-meet/issues/565 for Jitsi Meet in K8S using this chart, I am moving the discussion specifically related to GKE here.

@ChrisTomAlx - could you share your findings here?

I will also provide you some support.

ChrisTomAlx commented 4 years ago

@VengefulAncient I am also leaning towards the daemonset currently, mostly because I know it can be done. Only two problems I see. Firstly kubernetes frowns upon privileged pods. Secondly the image that I am using in the container currently seems to be created by Google themselves, so this method might only work on GKE if we can't find the same image on dockerhub or some other public docker repo.

ChrisTomAlx commented 4 years ago

I have compiled all my work over the past month here. Hope this helps people who stumble on this. Some points to note.

I used this chart for the provision volume claims that allows readwritemany mode for the claims.
The daemonset currently is for GCP only. But if you can find a docker image that does the same as what GCP's default startup-script image does, then it should work on other environments.
Make the appropriate changes that are mentioned before the code lines in each section.
Make sure to have ingress setup correctly with a domain name so you can reach the getrecording api correctly
Make sure you create a nodepool with kubernetes label type : jibri with each node having around 2 - 4 vcpu and 1 - 4 RAM. Enable auto-scaling on this nodepool and change the horizontal pod scaler jibri yaml file below to match the scaling you want. Each node will only hold 1 jibri pod. I tried multiple other ways, unfortunately this is the most consistent and infinitely scale-able approach I could find. You need to set the cpu utilization based on when you want the next node to startup. Node startup along with running of startup scripts could be a time consuming task that takes roughly 7 -10 minutes so it is a good idea to have a couple of extra jibri pods ready.

My FInished deployment files. Please replace the two docker images with your own

Please change the deployment images ( Jibri and getRecording )as you require. ```yaml # DAEMONSET FOR STARTUP SCRIPTS kind: DaemonSet apiVersion: apps/v1 metadata: name: daejibri spec: selector: matchLabels: name: daejibri # Label selector that determines which Pods belong to the DaemonSet template: metadata: labels: name: daejibri # Pod template's label selector spec: hostPID: true nodeSelector: type: jibri containers: - name: daejibri image: asia.gcr.io/google-containers/startup-script:v1 securityContext: privileged: true resources: limits: memory: 200Mi requests: cpu: 100m memory: 200Mi env: - name: STARTUP_SCRIPT value: | ! /bin/bash mkdir -p /yourcompany; if [ -z "$(lsmod | grep -om1 snd_aloop)" ]; then sudo apt update --yes && sudo apt-get install linux-modules-extra-gcp --yes && sudo echo "options snd-aloop enable=1 index=0" > /etc/modprobe.d/alsa-loopback.conf && sudo echo "snd-aloop" >> /etc/modules && sudo reboot fi --- # DEPLOYMENT FOR JIBRI MAIN apiVersion: apps/v1 kind: Deployment metadata: name: yourcompany-jitsi-meet-jibri spec: replicas: 1 selector: matchLabels: app: yourcompany-jitsi-meet-jibri template: metadata: labels: app: yourcompany-jitsi-meet-jibri spec: nodeSelector: type: jibri serviceAccountName: yourcompany-jitsi-meet securityContext: fsGroup: 999 volumes: - name: dev-snd hostPath: path: "/dev/snd" type: Directory - name: dev-shm hostPath: path: "/dev/shm" type: Directory - name: test-volume-claim persistentVolumeClaim: claimName: test-volume-claim containers: - name: yourcompany-jitsi-meet-jibri image: 11111<===replace with your jibri image or the latest on dockerhub===>11111 imagePullPolicy: Always resources: requests: memory: ".8Gi" cpu: "2.0" limits: memory: "1.0Gi" cpu: "2.1" volumeMounts: - mountPath: "/dev/snd" name: dev-snd - mountPath: "/dev/shm" name: dev-shm - mountPath: "/data/recordings" name: test-volume-claim securityContext: privileged: true capabilities: add: - SYS_ADMIN lifecycle: postStart: exec: command: - "sh" - "-c" - > mkdir -p "/config/recordings"; mkdir -p "/data/recordings"; echo "mv -f /config/recordings/* /data/recordings" > /config/finalize.sh; envFrom: - configMapRef: name: yourcompany-jitsi-meet-jicofo - configMapRef: name: yourcompany-prosody-common - configMapRef: name: yourcompany-jitsi-meet-web env: - name: DISPLAY value: ':0' - name: JIBRI_FINALIZE_RECORDING_SCRIPT_PATH value: /config/finalize.sh - name: JIBRI_STRIP_DOMAIN_JID value: muc - name: JIBRI_LOGS_DIR value: /config/logs - name: JIBRI_RECORDING_DIR value: /config/recordings --- # DEPLOYMENT FOR JIBRI GET-RECORDING API apiVersion: apps/v1 kind: Deployment metadata: name: yourcompany-jitsi-meet-getrecording spec: replicas: 3 selector: matchLabels: app: yourcompany-jitsi-meet-getrecording template: metadata: labels: app: yourcompany-jitsi-meet-getrecording spec: volumes: - name: test-volume-claim persistentVolumeClaim: claimName: test-volume-claim containers: - name: yourcompany-jitsi-meet-getrecording image: 11111<===Your get recording docker image===>11111 imagePullPolicy: Always volumeMounts: - mountPath: "/data/recordings" name: test-volume-claim lifecycle: postStart: exec: command: - "sh" - "-c" - > mkdir -p "/data/recordings"; --- kind: PersistentVolumeClaim apiVersion: v1 metadata: name: test-volume-claim spec: storageClassName: "nfs" accessModes: - ReadWriteMany resources: requests: storage: 10Gi --- # SERVICE TO GET RECORDING apiVersion: v1 kind: Service metadata: name: yourcompany-meet-get-recording labels: app: yourcompany-jitsi-meet-getrecording spec: type: NodePort ports: - port: 80 targetPort: 80 protocol: TCP name: http selector: app: yourcompany-jitsi-meet-getrecording --- # HORIZONTAL POD SCALER FOR JIBRI POD apiVersion: autoscaling/v2beta1 kind: HorizontalPodAutoscaler metadata: name: yourcompany-jitsi-meet-jibri-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: yourcompany-jitsi-meet-jibri minReplicas: 3 maxReplicas: 6 metrics: - type: Resource resource: name: cpu targetAverageUtilization: 3 --- # HORIZONTAL POD SCALER FOR JVB POD # apiVersion: autoscaling/v2beta1 # kind: HorizontalPodAutoscaler # metadata: # name: yourcompany-jitsi-meet-jibri-hpa # spec: # scaleTargetRef: # apiVersion: apps/v1 # kind: Deployment # name: yourcompany-jitsi-meet-jibri # minReplicas: 3 # maxReplicas: 6 # metrics: # - type: Resource # resource: # name: cpu # targetAverageUtilization: 3 ```

My get recording source code - NodeJS

Its not perfect but it mostly works. Make sure to change yourJitsiMeetFullDomainName. Use [this ](https://nodejs.org/de/docs/guides/nodejs-docker-webapp/)to dockerize it ```js const http = require('http'); const fs = require('fs'); const os = require('os'); var glob = require("glob"); const path = require("path"); // var contentDisposition = require('content-disposition') const server = http.createServer((function (req, res) { if (req.url != '/favicon.ico') { try { let url = req.url; let filePattern = url.split("/")[url.split("/").length - 1].trim(); filePattern = filePattern.toLowerCase(); // let recordingPath= "D:" + path.sep + "Work" + path.sep + "yourcompany" + path.sep + "Experiments" + path.sep + "jitsi-helm" + path.sep + "data" + path.sep + "recordings"; let recordingPath = path.sep + "data" + path.sep + "recordings"; let pathSplitter = os.type() == "Windows_NT" ? '/' : path.sep; let files = glob.sync(recordingPath + "/**/*.mp4", {}); console.log(files); let fileNames=[]; let validFilePaths=[]; // Get only valid file paths and file names for the corresponding conference for (let index = 0; index < files.length; index++) { let fullFilePath = files[index]; let fileName = fullFilePath.split(pathSplitter)[fullFilePath.split(pathSplitter).length - 1]; if(fileName.includes(filePattern) && filePattern !== "") { fileNames.push(fileName); validFilePaths.push(fullFilePath); } console.log("fileName",fileName); } console.log("fileNames"); console.log(fileNames); console.log("validFilePaths"); console.log(validFilePaths); // Show no conference found message if not found else get the array of files and serve the latest if(fileNames.length == 0) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Sorry no such conference found! Please recheck the conference name and open yourJitsiMeetFullDomainName/api/getrecording/ in your browser'); } else { // Get the array of dates let dates = []; for (let index = 0; index < fileNames.length; index++) { let fullFileName = fileNames[index]; let fileHalves = fullFileName.split("_"); let timeAndExtHalf = fileHalves[fileHalves.length - 1]; let timeHalf = timeAndExtHalf.split(".")[0]; let extHalf = timeAndExtHalf.split(".")[1]; let timeSplit = timeHalf.split("-"); dates.push(new Date(timeSplit[0]+'/'+timeSplit[1]+'/'+timeSplit[2]+' '+timeSplit[3]+':'+timeSplit[4]+':'+timeSplit[5])); console.log(timeHalf); console.log(extHalf); } console.log(dates); let max = dates.reduce(function (a, b) { return a > b ? a : b; }); let maxDateString = max.toISOString().replace(/T/, '-').replace(/\..+/, '').replace(/:/g, '-'); console.log(maxDateString); let finalfileName = fileNames.reduce(function (a, b) { return a.includes(maxDateString) ? a : b; }); let finalFilePath = validFilePaths.reduce(function (a, b) { return a.includes(finalfileName) ? a : b; }); // Download the file var stream = fs.createReadStream(finalFilePath); res.writeHead(200, { 'Content-disposition': 'attachment; filename='+finalfileName+'.mp4', 'Content-Type' : 'video/mp4' }); stream.pipe(res); // also you can set content-type } } catch (error) { console.log(error); } } })); server.listen(80, () => { console.log('server started'); }); ```

taktakpeops commented 4 years ago

Hi @ChrisTomAlx - sorry for the late reply I was a but busy on another project ! Thank you for your feedback, will look carefully into it today !

As it seems that the chart works, let's start preparing for pushing that to the central Helm repo? :D

VengefulAncient commented 4 years ago

Incidentally, same situation as @taktakpeops for me, we're just getting back to Jitsi after some other stuff that took priority away from it. I have a few questions for both of you that I hope you would be able to help me with:

@ChrisTomAlx :

1) Why are you using a deployment with an HPA instead of a daemonset if you only plan to run one jibri pod per node? The daemonset will also allow you to simply init container to bootstrap your nodes, which is especially handy if you dedicate a node pool only to Jibri pods - nodes containing other components will not need to be rebooted. 2) I assume the PVC is only used for recordings? Our company isn't interested in them, only livestreams, so I'd prefer to skip that part if possible. (BTW, 👍 on the nfs-server-provisioner, we use it for persistent NGINX cache shared between pods in RWX mode and it mostly works great.) 3) We're configuring five ALSA loopback interfaces for each node (echo "options snd-aloop enable=1,1,1,1,1 index=0,1,2,3,4" > /etc/modprobe.d/alsa-loopback.conf). But each Jibri pod can only do one recording/livestream at a time, correct? Does that mean we'd have to run five Jibri pods per node to actually make use of these extra interfaces? And if so, why do you prefer to keep only one Jibri pod per node - is that because of stability issues you mentioned?

@taktakpeops :

1) Did I understand correctly that this chart doesn't do actually infinite scaling and only sets minimum and maximum amount of replicas? 2) Are we actually clear on what needs to be scaled? I've been digging through a bunch of other Jitsi/Jibri threads and it seems like we only really need to scale Jibri and JVB, not prosody etc. I could be wrong though.

@ everyone:

Each JVB pod needs its own UDP port. This brings up two problems.

1) Do we have an idea of how to scale them infinitely (or at least for a few hundred replicas) without hardcoding the amount into values? We'd somehow need to store, read and write the list of already claimed ports. I don't have enough Kubernetes experience to know whether this is easily possible. 2) Opening these ports requires a firewall rule - is it possible to create one as a Kubernetes object? So far, I've been doing that manually on GKE using gcloud compute firewall-rules for just one or two hardcoded JVB ports, and while it's theoretically possible to do it from a daemonset init script by installing Google Cloud SDK (Google actually suggests doing just that in their example), it's not very maintainable and generally clumsy.

As always, thank you both for continuing to look into this, your efforts are highly appreciated.

ChrisTomAlx commented 4 years ago

@taktakpeops Sure.. Although I am not entirely certain helm is still accepting charts into their repo.

@VengefulAncient Here are my views on these questions :-

@ChrisTomAlx :

Why are you using a deployment with an HPA instead of a daemonset if you only plan to run one jibri pod per node? The daemonset will also allow you to simply init container to bootstrap your nodes, which is especially handy if you dedicate a node pool only to Jibri pods - nodes containing other components will not need to be rebooted.

I assume the PVC is only used for recordings? Our company isn't interested in them, only livestreams, so I'd prefer to skip that part if possible. (BTW, 👍 on the nfs-server-provisioner, we use it for persistent NGINX cache shared between pods in RWX mode and it mostly works great.)

We're configuring five ALSA loopback interfaces for each node (echo "options snd-aloop enable=1,1,1,1,1 index=0,1,2,3,4" > /etc/modprobe.d/alsa-loopback.conf). But each Jibri pod can only do one recording/livestream at a time, correct? Does that mean we'd have to run five Jibri pods per node to actually make use of these extra interfaces? And if so, why do you prefer to keep only one Jibri pod per node - is that because of stability issues you mentioned?

Daemonsets can't be scaled as far as I could tell. HPA's allow you to scale the deployment as per your wish based on the cpu and memory utilization of pods
PVC is used only for recording.. Although I haven't tested the live stream interaction. If you see some problems there let me know, I can look into it.
This is a bit more complicated. So initially I had 5 jibri deployments (and 5 jibri HPA) instead of one, with podantiaffinity.. That worked alright but recording were crashing after 5 minute intervals. It could be a cpu issue, but I could not find the cut off required. Also kubernetes docs mentions not to use podAntiAffinity in clusters that have several hundred nodes. So I decided this would be the only infintely scalebale option. You can alter the HPA based on your need. Provide a min and a max and set the cpu utilization at which you want to start up a new pod.

@ everyone: Each JVB pod needs its own UDP port. This brings up two problems.

Do we have an idea of how to scale them infinitely (or at least for a few hundred replicas) without hardcoding the amount into values? We'd somehow need to store, read and write the list of already claimed ports. I don't have enough Kubernetes experience to know whether this is easily possible.

Opening these ports requires a firewall rule - is it possible to create one as a Kubernetes object? So far, I've been doing that manually on GKE using gcloud compute firewall-rules for just one or two hardcoded JVB ports, and while it's theoretically possible to do it from a daemonset init script by installing Google Cloud SDK (Google actually suggests doing just that in their example), it's not very maintainable and generally clumsy.

HPA's should allow you to do that. You can alter the HPA anytime during the life of the deployment. Although, if you do a manual edit. You might also have to do a manual delete when you want to delete the chart (not sure though). Also autoscaling of nodepools makes sure that you are not spending money unnecessarily by having too many unused nodes present.
You can create a firewall rule in GCP and then when creating a nodepool in GKE it will ask you for a network label. That should work, but I haven't tested it. I just edited an existing firewall rule that applies to all the compute engine VM's and made it allow the correct UDP port. This was purely for testing.. I would not encourage you to do this on prod. I would suggest going the network label way, it should work imho. Give this nodepool a kubernetes label that matches JVB's nodeselector and GIve it a network label and all your JVB pods will sit here with welcoming arms to any UDP requests.

Hope this helps!!

taktakpeops commented 4 years ago

@ChrisTomAlx : you are right. It’s not about pushing to their repo but making the chart available by following these guidelines: https://github.com/helm/hub/blob/master/Repositories.md

Regarding the HPA, it applies only to the web element and the JVB element which can be scaled. Jibri + Jicofo could be moved from a statefulset to a daemonset, would be more logical I think.

The PVC such as the broadcaster are optional, if you don’t want them you can disable it in the value file used to deploy your chart.

For the recording part, not sure Alsa is the best solution as we discussed earlier with @ChrisTomAlx - more investigations are on going on my side.

Regarding the HPA for JVB, I want to get it to work in sync with Octo.

About Jibri, an HPA doesn’t make much sense I think as you would require also vertical scaling (more sound devices).

VengefulAncient commented 4 years ago

@ChrisTomAlx

Daemonsets automatically schedule one pod per each node that matches labels/tolerations for it. If you are only scheduling one Jibri pod per node anyway and then starting more nodes based on CPU usage, it should be more or less the same.
Livestreaming without recording does work without PVCs, that's why I was wondering :) I got used to the idea of recordings being saved on the nodes and not persistent volumes, but your idea is better in case we decide to record, since I assume that nodes can be killed at any time (I am using pre-emptible nodes on staging to cut costs).
Sorry, I'm getting confused. Do you mean that you had 5 different Jibri deployments with their own replicasets, each depositing one pod per node, resulting in 5 pods per node? That's an interesting idea, I haven't thought of that. I'm not interested in doing that though, since we're only interesting in having one participant per conference and livestreaming to YouTube, and for that purpose, 2 vCPUs and 2 GB RAM do the job - just make sure to set disableThirdPartyRequests: true in your Jitsi web component config (see this issue) to avoid Jibri eating all RAM. So for my purposes, I can just have one Jibri pod on each of such nodes. Trying to have more will overload the CPUs. (BTW I'm currently using custom N2D nodes on GKE to make sure I get AMD EPYC and not some random older Intel - somehow, with N1 nodes with the same amount of RAM and vCPUs the stream was doing much poorer)

@taktakpeops

Do we actually need to scale Jicofo though? I'm not trying to argue against it BTW, I'm just really confused by all the components, I'm not sure what Jicofo actually does.
ALSA might not be the best solution, but I already have a script bootstrapping Ubuntu nodes for it, and it all works. If that part can be thrown out, that would be great, but I prefer it over having to modify any of the Jitsi component images - it's frankly a huge pain because of the poor documentation, confusing errors and insane amount of environment variables. Completely unmaintainable, IMO. Not sure whether a Helm chart that makes you modify an image before it works would be too popular, either. (Though the init script also currently requires Ubuntu nodes which aren't standard on either GKE or AWS, so that might be a moot point)
Could you please link me somewhere I can read about Octo? I swear I must be really stupid, but somehow I was not able to find any documentation on what it is, besides passing mentions by Jitsi maintainers. Is it a Kubernetes thing? A Jitsi thing?
Why would we need more sound devices though? You need pretty powerful nodes to handle multiple Jibri pods. Makes more sense to just run one Jibri per node (with weaker nodes), using one audio device. The costs on any major cloud provider are per CPU/RAM, so having one node with let's say 8 vCPUs and 16 GB RAM that can handle 4 Jibri pods is the same as having 4 nodes with 2 vCPUs and 4 GB RAM each - sure, there's some Kubernetes overhead you get rid of by moving to fewer larger nodes (less control plane pods), but it's not worth the hassle IMO. Unless I misunderstand something?

@ everyone

Of course I want HPAs and autoscaling, we definitely don't want to spend more money on idle nodes :) But my point was that every JVB needs a unique UDP port. Which then needs to be opened in the firewall, defined in the JVB service, and in the JVB_PORT variable for each JVB pod. I don't see how HPAs would allow me to achieve that. Unless I'm again misunderstanding something? (Seems to be a running trend with Jitsi)
Do you mean opening all UDP ports just for the node pool that holds JVB pods? I guess that could work, though I'd probably go with a certain range instead.

Also, a new question came up: in my deployment, Jibri sometimes becomes ready before Prosody, so it (predictably) can't authenticate and just fails (which is where a k8s-native application would just kill the pod and restart, but alas, no such thing with Jibri). Normally I'd solve that by attaching an init container that would poll Prosody service, but since none of these components have proper healthcheck endpoints, this isn't going to work - Prosody only returns XMPP, not HTTP status codes I could use for Kubernetes health checks. Have either of you run into this issue? If so, how do you handle it? The official Jitsi docker-compose.yaml simply has depends_on directives for each component, which give us an understanding of which components should be ready before others, but Kubernetes sadly still does not support container ordering.

TIA and sorry for another wall of text 😢

ChrisTomAlx commented 4 years ago

Daemonsets automatically schedule one pod per each node that matches labels/tolerations for it. If you are only scheduling one Jibri pod per node anyway and then starting more nodes based on CPU usage, it should be more or less the same.

But how do you scale it? You need an HPA right to check and scale based on cpu utilisation? and HPA's don't work on daemonsets as far as I know. Or atleast they are not designed for that purpose.

Sorry, I'm getting confused. Do you mean that you had 5 different Jibri deployments with their own replicasets, each depositing one pod per node, resulting in 5 pods per node?

Exactly but I had to use podAntiAffinity and that isn't infinitely scaleable so I went with the one pod per node concept. Also I was experiencing random crashes which could be solved by adding more cpu maybe. But I stopped going down that rabbit hole after the podAntiAffinity issue.

Of course I want HPAs and autoscaling, we definitely don't want to spend more money on idle nodes :) But my point was that every JVB needs a unique UDP port. Which then needs to be opened in the firewall, defined in the JVB service, and in the JVB_PORT variable for each JVB pod. I don't see how HPAs would allow me to achieve that. Unless I'm again misunderstanding something? (Seems to be a running trend with Jitsi)

Do you mean opening all UDP ports just for the node pool that holds JVB pods? I guess that could work, though I'd probably go with a certain range instead.

So what I was planning is to have a HPA for JVB. Again with one JVB per Node. firewall rule will only open one particular UDP port in all the nodes of the JVB nodepool. This is again vertical scaling so I would not suggest this setup for on prem deployments.

Also, a new question came up: in my deployment, Jibri sometimes becomes ready before Prosody

I had that issue as well.. For now I am deploying jibri only once prosody is up.. But there has to be a better way. If prosody restarts for whatever reason all jibri pods will go down with no restart. So that is an issue.. If you do find a workaround do post it here.

marcoadasilvaa commented 4 years ago

+1

taktakpeops commented 4 years ago

Hello @ChrisTomAlx,

Sorry (again) for the late reply.

But how do you scale it? You need an HPA right to check and scale based on cpu utilisation? and HPA's don't work on daemonsets as far as I know. Or atleast they are not designed for that purpose.

In this case, you aren't scaling horizontally but vertically. So if you need a new daemonset, you would spawn a new node.

However, now that I am running Jitsi at scale in a EC2 infrastructure, I realized that for JVB, Octo can work in K8s using the BRIDGE_SELECTION_STRATEGY set to IntraRegionBridgeSelectionStrategy. In this case, your JVB can be managed by a deployment and therefore, you can apply an HPA. You just need to ensure that while you are having traffic on one instance, K8S doesn't kill it.

Exactly but I had to use podAntiAffinity and that isn't infinitely scaleable so I went with the one pod per node concept. Also I was experiencing random crashes which could be solved by adding more cpu maybe. But I stopped going down that rabbit hole after the podAntiAffinity issue.

If we are auto scaling the nodes of the clusters for JVB, Jibri can benefit from the same logic (so 2 daemonset in this case).

So what I was planning is to have a HPA for JVB. Again with one JVB per Node. firewall rule will only open one particular UDP port in all the nodes of the JVB nodepool. This is again vertical scaling so I would not suggest this setup for on prem deployments.

Can do, but need custom image for Jicofo + JVB (sip properties) - look at first answer.

I had that issue as well.. For now I am deploying jibri only once prosody is up.. But there has to be a better way. If prosody restarts for whatever reason all jibri pods will go down with no restart. So that is an issue.. If you do find a workaround do post it here.

I still think that one pod containing Jicofo + Prosody + Jibri is the way to go. Basically some kind of main pod that you would scale vertically as explained for the daemonset.

I will be available tomorrow, if you are interested, we can plan a call online to answer most questions !

taktakpeops commented 4 years ago

@VengefulAncient : will reply to your questions today. As suggested in my previous answer, we can have a call to go through all questions once for all :D

taktakpeops / jitsi-meet-helm

[TESTING][GCP][GKE] Chart on GCP #1