
Supporting High Availability on AMF #31

Open ajay-kashyap opened 2 years ago

ajay-kashyap commented 2 years ago

Proposal: Supporting High Availability on AMF

Elevator Pitch

High availability is an essential element of a core network, as it keeps services running seamlessly when failures occur. In this proposal, we plan to achieve it with a load balancer.

When the active node fails, the load balancer will detect the failure and switch the context to the standby node, which transitions to active and starts handling requests.

The current HA proposal covers the AMF node only.

Total ask

Support for the HA feature in the Magma architecture will be delivered in 3 milestones.

Contact Information

Ajay Kashyap (ajay.kashyap@wavelabs.ai)

Project Details

Prerequisite: Taking the stateless feature into account, we assume that all data structures, counters and configurations are stored in Redis DB.

Current Architecture

The current architecture does not have HA support.

[Figure: Current Architecture]

Proposed Architecture

In the current proposal, the plan is to introduce HA functionality for AMF using Redis Sentinel along with an open source load balancer.

[Figure: Proposed Architecture]

Sequence of operations

 * The load balancer sends requests to the active node while continuously monitoring its heartbeat.
 * All data on the active node is synced continuously with the standby using Redis Sentinel.
 * When a heartbeat failure occurs, the LB signals the standby to transition from standby to active.
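The sequence above can be sketched as a small state machine. This is only an illustration under stated assumptions: the names `Node`, `Role` and `FailoverMonitor` and the threshold of three missed heartbeats are hypothetical and are not part of the proposal or of any load balancer's API.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    FAILED = "failed"


@dataclass
class Node:
    name: str
    role: Role


@dataclass
class FailoverMonitor:
    """Counts missed heartbeats from the active node and promotes the standby."""
    active: Node
    standby: Node
    max_missed: int = 3  # assumed threshold, not specified in the proposal
    missed: int = 0

    def on_heartbeat(self, ack_received: bool) -> Node:
        """Record one heartbeat interval; return the node that should get traffic."""
        if ack_received:
            self.missed = 0
            return self.active
        self.missed += 1
        # Fail over only if a healthy standby is still available.
        if self.missed >= self.max_missed and self.standby.role is Role.STANDBY:
            self._fail_over()
        return self.active

    def _fail_over(self) -> None:
        # The old active is marked failed and the standby becomes active.
        self.active.role = Role.FAILED
        self.standby.role = Role.ACTIVE
        self.active, self.standby = self.standby, self.active
        self.missed = 0


if __name__ == "__main__":
    monitor = FailoverMonitor(Node("amf-1", Role.ACTIVE), Node("amf-2", Role.STANDBY))
    for ack in (True, False, False, False):
        target = monitor.on_heartbeat(ack)
    print(target.name)  # amf-2 after three consecutive missed heartbeats
```

Requiring several consecutive misses before failing over is a design choice that trades detection time against false failovers caused by a single lost heartbeat.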

Proposed approach

* An open source load balancer that monitors the heartbeat of the Magma AMF needs to be identified.

* If the heartbeat is not received, the load balancer needs to send a signal to the standby to transition from standby to active state.

* Redis DB is used to store the session and policy details. It also stores the in-memory data structures of AMF, and this information is confined to a node.

* The approach would be to use Redis Sentinel for replicating the data present in active to standby (master-slave).

* When the active node goes down, all its resources have to be cleared and the process needs to be shut down gracefully.

* The load balancer will assign a floating IP to the current active node.

Feature Roadmap

The feature will be delivered in 3 milestones. Each milestone will have the following 5 process gates.

Milestone 1

1) Identify an HA load balancer for the Magma product.

2) Understand and test the load balancer (LB) configuration for the active and standby nodes.

3) Modify the existing AMF to respond to the heartbeat messages sent from the LB.

4) Integration testing of AMF and LB for heartbeat and heartbeat ack.
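As a rough illustration of the heartbeat and heartbeat ack exchange in Milestone 1, the sketch below answers a heartbeat with an ack carrying the same sequence number. The wire format (`HB <seq>` and `HB_ACK <seq>`), the UDP transport, the port and the names `build_ack`, `HeartbeatHandler` and `serve` are all assumptions made for the example; the proposal does not define them.

```python
import socketserver
from typing import Optional

# Hypothetical wire format: "HB <seq>" is answered with "HB_ACK <seq>".
HEARTBEAT_PREFIX = b"HB "
ACK_PREFIX = b"HB_ACK "


def build_ack(message: bytes) -> Optional[bytes]:
    """Return the ack for a heartbeat message, or None if it is malformed."""
    if not message.startswith(HEARTBEAT_PREFIX):
        return None
    seq = message[len(HEARTBEAT_PREFIX):].strip()
    if not seq.isdigit():
        return None
    return ACK_PREFIX + seq


class HeartbeatHandler(socketserver.BaseRequestHandler):
    """UDP handler: replies to valid heartbeats and ignores everything else."""

    def handle(self) -> None:
        data, sock = self.request  # for UDP servers, request is (data, socket)
        ack = build_ack(data)
        if ack is not None:
            sock.sendto(ack, self.client_address)


def serve(host: str = "127.0.0.1", port: int = 9999) -> None:
    """Run the responder until interrupted (address and port are placeholders)."""
    with socketserver.UDPServer((host, port), HeartbeatHandler) as server:
        server.serve_forever()
```

Keeping the ack logic in a pure function such as `build_ack` lets the integration test check the heartbeat contract without opening a socket.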

Milestone 2

1) Identify the Redis Sentinel configuration for active-to-standby replication.

2) Create a multi-node setup.

3) Test Redis DB replication on the standby.
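For the configuration work in this milestone, the sketch below renders a minimal Sentinel configuration and the matching replica setting. In Redis, replication itself is enabled on the standby with `replicaof`, while Sentinel monitors the master and promotes a replica on failure. The directives are standard Redis ones, but the master name `amf-redis`, the IP address, the quorum and the timeout values are placeholder assumptions, not values from the proposal.

```python
def render_sentinel_conf(
    master_name: str,
    master_ip: str,
    master_port: int = 6379,
    quorum: int = 2,
    down_after_ms: int = 5000,
    failover_timeout_ms: int = 60000,
) -> str:
    """Return sentinel.conf text that monitors one Redis master."""
    lines = [
        "port 26379",
        # quorum = number of Sentinels that must agree the master is down
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


def render_replica_conf(master_ip: str, master_port: int = 6379) -> str:
    """Return the redis.conf line that makes the standby replicate the master."""
    return f"replicaof {master_ip} {master_port}\n"


if __name__ == "__main__":
    print(render_sentinel_conf("amf-redis", "10.0.0.1"))
    print(render_replica_conf("10.0.0.1"))
```

A quorum of 2 assumes at least three Sentinel processes are deployed, so that a single Sentinel failure does not block failover.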

Milestone 3

1) Active-to-standby transition testing.

2) After the active-to-standby transition, resources from standby need to be gracefully released, and this needs to be tested.

3) After the active-to-standby transition, all new calls are diverted to the new active node, the LB assigns it a floating IP, and current calls remain intact. This needs to be tested.

4) Transitions from active to standby and vice versa need to be rigorously tested.

Test plan


| Milestone | Deliverable Summary |
| --- | --- |
| MS1 | Identify load balancer; LB to send heartbeat to SCTPd & AMF; modify SCTPd & AMF to respond to heartbeat; integration testing |
| MS2 | Configure Redis Sentinel for master-slave; multi-node setup creation; Redis data replication on standby and its testing |
| MS3 | Active-standby transition testing; LB floating IP assignment testing; testing that all calls stay intact and new calls are directed to the current active node |

References

https://redis.io/topics/sentinel

https://www.haproxy.org/