litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Force subscriber disconnection when already connected #4775

Closed ledbruno closed 4 months ago

ledbruno commented 4 months ago

Proposed changes

In a scenario with multiple infrastructures connecting to a "main" chaoscenter/portal, websocket disconnections can occur. In those cases the subscriber exits with code -1 and the pod is restarted, but it fails to re-establish the connection because the server responds with ALREADY_CONNECTED.

We are proposing the following changes:

Subscriber

1) Should disconnect with a panic, so that the deferred close of the ws connection can run.
2) Should exit when it receives an ALREADY_CONNECTED message from the server, since it should retry.
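
To illustrate why a panic matters here: in Go, deferred calls run while a panic unwinds the stack, whereas `os.Exit` terminates the process without running them. The following is a minimal sketch of the subscriber-side idea, not the code in this PR. The `conn` type, `handleMessage`, `runSubscriber`, and `runAndRecover` are hypothetical names introduced only for illustration, and `runAndRecover` exists only so the behaviour can be observed without killing the process.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in for the subscriber's websocket connection.
type conn struct{ closed bool }

// Close marks the connection as closed.
func (c *conn) Close() { c.closed = true }

// errAlreadyConnected is an illustrative error for the server's reply.
var errAlreadyConnected = errors.New("ALREADY_CONNECTED")

// handleMessage returns an error for messages that should stop the subscriber.
func handleMessage(msg string) error {
	if msg == "ALREADY_CONNECTED" {
		return errAlreadyConnected
	}
	return nil
}

// runSubscriber processes messages and panics on a fatal one.
// Because Close is deferred, it still runs while the panic unwinds.
// Calling os.Exit here instead would skip the deferred Close.
func runSubscriber(c *conn, msgs []string) {
	defer c.Close()
	for _, m := range msgs {
		if err := handleMessage(m); err != nil {
			panic(err)
		}
	}
}

// runAndRecover runs the subscriber and converts a panic into an error,
// so the effect of the deferred Close can be checked in the same process.
func runAndRecover(c *conn, msgs []string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("subscriber stopped: %v", r)
		}
	}()
	runSubscriber(c, msgs)
	return nil
}
```

In a real subscriber the panic would normally not be recovered: the process would crash after the deferred close, and the pod restart would trigger the retry.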

Portal/chaosCenter

1) Should close the previous channel and return the ALREADY_CONNECTED error. That will trigger ctc.Done() and clean up the in-memory map of connected infras, so a restarted subscriber will be able to connect again.

Types of changes

What types of changes does your code introduce to Litmus? Put an x in the boxes that apply

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

Dependency

Special notes for your reviewer: