jitsi-contrib / jitsi-helm

A helm chart to deploy Jitsi to Kubernetes
MIT License

JVB pods can't communicate with each other in some OCTO setups #107

Open Hero9909 opened 6 months ago

Hero9909 commented 6 months ago

As a workaround for being limited to a single JVB instance, which restricts this chart, I was playing with the idea of a setup consisting of external load balancers, NodePort services, and one JVB instance each. Using node selectors, we could ensure that the instances do not overlap on a node, if that is a problem.

Something like this:

jvb:
  enabled: true
  instances:
    - name: videobridge1
      NodePort: 30001
      publicIP: xx.xx.xx.xx
    - name: videobridge2
      NodePort: 30002
      publicIP: yy.yy.yy.yy
    - name: videobridge3
      NodePort: 30003
      publicIP: zz.zz.zz.zz

Do you think this would work?
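
To make the proposal more concrete, here is a minimal sketch of how such an `instances` list could be expanded into one NodePort Service per bridge. The `instances` key is only the idea from this comment, not an existing chart option; the `jvb-instance` selector label is hypothetical; and the sketch assumes each JVB listens on the same UDP port it is reached on, so that the advertised port matches the NodePort.

```python
# Hypothetical sketch: expand the proposed (not implemented) `jvb.instances`
# list into one Kubernetes NodePort Service manifest per JVB instance.

def nodeport_services(instances):
    """Return one Service manifest (as a plain dict) per instance."""
    services = []
    for inst in instances:
        # Assumption: the JVB listens on the same UDP port that is
        # exposed on the node, so clients are told a reachable port.
        port = inst["NodePort"]
        services.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": inst["name"]},
            "spec": {
                "type": "NodePort",
                # Hypothetical label; a real chart would use its own labels.
                "selector": {"jvb-instance": inst["name"]},
                "ports": [{
                    "protocol": "UDP",
                    "port": port,
                    "targetPort": port,
                    "nodePort": port,
                }],
            },
        })
    return services


instances = [
    {"name": "videobridge1", "NodePort": 30001, "publicIP": "xx.xx.xx.xx"},
    {"name": "videobridge2", "NodePort": 30002, "publicIP": "yy.yy.yy.yy"},
    {"name": "videobridge3", "NodePort": 30003, "publicIP": "zz.zz.zz.zz"},
]

services = nodeport_services(instances)
for svc in services:
    udp = svc["spec"]["ports"][0]
    print(svc["metadata"]["name"], udp["nodePort"], udp["protocol"])
```

The `publicIP` of each instance is not part of the Service itself; in this idea it would be the address that the matching bridge announces to clients.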

spijet commented 6 months ago

Hello @Hero9909!

Can you please elaborate? I'm currently running a testbed with 4 JVB instances and it seems to work fine. Please refer to the screenshot and a piece of my values.yaml below.

jvb:
  replicaCount: 4
  UDPPort: 32768
  useHostPort: true
  publicIPs:
    # A list of public IP addresses of 
    # all nodes in the cluster:
    - 1.2.3.4
    - 5.6.7.8
    - 4.3.2.1
    - 8.7.6.5
  stunServers: meet-jit-si-turnrelay.jitsi.net:443,stun1.l.google.com:19302,stun2.l.google.com:19302,stun3.l.google.com:19302,stun4.l.google.com:19302
octo:
  enabled: true

Make sure that you set .Values.octo.enabled to true, as this seems to be the main prerequisite to running multiple JVB instances.
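
As a rough illustration of that prerequisite, the check below flags a values file that asks for several JVB replicas without enabling OCTO. It is only a sketch: it works on an already parsed values.yaml (a plain dict), `check_values` is a made-up helper name, and the default of one replica is an assumption about the chart.

```python
def check_values(values):
    """Return a list of warnings for a parsed values.yaml (a plain dict)."""
    warnings = []
    jvb = values.get("jvb", {})
    octo = values.get("octo", {})

    # Assumption: the chart runs a single JVB when replicaCount is not set.
    replicas = jvb.get("replicaCount", 1)

    if replicas > 1 and not octo.get("enabled", False):
        warnings.append(
            "jvb.replicaCount is %d but octo.enabled is not true" % replicas
        )
    return warnings


good = {"jvb": {"replicaCount": 4, "useHostPort": True}, "octo": {"enabled": True}}
bad = {"jvb": {"replicaCount": 4, "useHostPort": True}}

print(check_values(good))
print(check_values(bad))
```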

Hero9909 commented 6 months ago

Are you sure that all instances are working? When I look at the logs, I can only verify one working instance. The others do not seem to be used at all. What kind of traffic setup do you use? Are you running on bare metal or in the cloud?

spijet commented 6 months ago

I'm running RKE2 on bare metal, with JVB's UDP ports exposed via useHostPort: true. The Jitsi Meet desktop app shows a different server address every time I join a room with 3 or more participants, so I assumed that Octo is working properly. I also see noticeable traffic between all JVB pods, and it is roughly the same (in terms of volume/bandwidth) for every pod. I'll do a more elaborate test tomorrow and get back with the results.

spijet commented 5 months ago

Hello again @Hero9909!

I've been tinkering with my test installation with OCTO enabled and saw a couple of weird problems, e.g. one of the 4 JVB pods would refuse to exchange information with the other 3, and the users connected to it would end up isolated from the rest of the room. Also, your initial idea got me thinking about a way to add support for separate OCTO regions, which would require creating multiple Deployments of JVB and Web with a different OCTO region specified for each.

If you don't mind, I'd like to use this issue to track the JVB miscommunication problem and use #111 to track the separate OCTO regions support.

Hero9909 commented 5 months ago

Sure, go ahead. I've got 8 hours from work to investigate these issues. If you have something to test, I'll try my best to get it running in our environment.

spijet commented 5 months ago

Right now I'm trying to investigate that problem with a rogue JVB pod. For me one of the pods seems to ignore the other three, while @Bananenbrot1 says in #112 that all works well for him.

Will let you know if I find anything.

Bananenbrot1 commented 5 months ago

@spijet, I was conducting some tests today. I set up 5 JVB (Jitsi Videobridge) pods across 5 different nodes and initiated a meeting with four participants. Everything functioned smoothly for about 15-20 minutes, and every user was connected to a different server. Then my video suddenly froze, and I had to rejoin the session. I'm going to look into what might have caused this and will keep you updated on my findings.

spijet commented 5 months ago

@Bananenbrot1, please do! JVB logs can be a pain to read, though. :)

In my case one of the participants would consistently get directed to one of the JVBs (always the same, the one that's closest to them) and would always end up "separated" from the rest of the room (so everyone could see and hear everyone except them and vice versa). The issue didn't go away even after I restarted that JVB.

spijet commented 5 months ago

Hello @Bananenbrot1 and @Hero9909!

I just pushed a bunch of updates, including some fixes for JVB. Please test the new main snapshot and let me know if anything is better (or worse!) than it was before. I recommend testing the new snapshot with this config (if it's applicable for your setup):

jvb:
  ## Set JVB instance count (replace X with the number you want):
  replicaCount: X
  ## Expose JVB interface port to the outside world
  #  only on nodes that actually have it:
  useHostPort: true
  ## Make every JVB pod announce its Node's external
  #  IP address and nothing more:
  useNodeIP: true

octo:
  ## Enable OCTO support for both JVB and Jicofo:
  enabled: true

This setup makes every JVB announce one and only one IP address: the IP of the node that the pod is currently running on. In contrast, when we use .Values.jvb.publicIPs, every JVB pod announces every IP address from the list (which may or may not work as expected, depending on JVB and user behaviour). I'm currently testing my installation with the same config and it seems to work well; I have seen a stuck/isolated user only once so far.
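
The difference between the two modes can be modelled in a few lines. This is a toy illustration of the behaviour described above, not chart or JVB code; the pod and node names are made up, and `announced_addresses` is a hypothetical helper.

```python
def announced_addresses(pod_nodes, node_ips, public_ips=None, use_node_ip=False):
    """Map each JVB pod to the list of addresses it would announce.

    pod_nodes:   pod name -> name of the node it is scheduled on
    node_ips:    node name -> external IP of that node
    public_ips:  the shared list, as in .Values.jvb.publicIPs
    use_node_ip: mimics .Values.jvb.useNodeIP
    """
    announced = {}
    for pod, node in pod_nodes.items():
        if use_node_ip:
            # One address only: the IP of the node this pod runs on.
            announced[pod] = [node_ips[node]]
        else:
            # Every pod announces the whole shared list.
            announced[pod] = list(public_ips or [])
    return announced


node_ips = {"node-a": "1.2.3.4", "node-b": "5.6.7.8"}
pod_nodes = {"jvb-0": "node-a", "jvb-1": "node-b"}

with_list = announced_addresses(pod_nodes, node_ips, public_ips=["1.2.3.4", "5.6.7.8"])
with_node_ip = announced_addresses(pod_nodes, node_ips, use_node_ip=True)

print(with_list)
print(with_node_ip)
```

With the shared list, a client talking to jvb-0 is also offered the address of node-b, where jvb-0 is not running; with useNodeIP every offered address belongs to the node that actually hosts the pod.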

Bananenbrot1 commented 4 months ago

Hi @spijet, I will give the current main a shot next week!

spijet commented 4 months ago

Hi @Bananenbrot1! Can't wait for the news, thank you so much! :)

Hero9909 commented 3 months ago

So, here is how far we got. A few things changed in our company, and therefore Jitsi dropped into the background, but I was able to test the latest changes. In my company the nodes are inside a private network where the IPs cannot be published, so we use load balancers that support passing UDP traffic. Sadly, this does not work as expected. Maybe our setup will advance in the coming month to get around this limitation, maybe not.

As this sounds to me like "our" special setup is one of the issues here, I would like to see the results from Bananenbrot1.

spijet commented 1 month ago

Hello @Hero9909 and @Bananenbrot1!

I'm going to do another round of testing the 1.4.0 chart with the newest images and (if all goes well) release the chart. Any feedback from both of you would be greatly appreciated. :)

Hero9909 commented 1 month ago

Deployment on our 3 clusters worked nearly out of the box, just as expected. Testing is in progress.

Hero9909 commented 1 week ago

I'm getting back with not-so-great news, but since I think you're waiting for feedback here, I think it's necessary to give you a bit of an update. I'm currently sick and no longer able to work. It's still unclear when I will be able to return, and until then I cannot provide any feedback. I'm really sorry about that.