hmcts / roadmap-platform-operations


Enable STATEFUL Network flows from HMCTS Azure to MoJ WAN (145) #723

Open hmcts-platform-operations opened 7 months ago

hmcts-platform-operations commented 7 months ago

DTSPO-17470

Summary

Currently we have a pair of Palo Alto VM-Series firewalls running behind a load balancer to give us an "active-active" setup. Unfortunately, this only works with stateless network flows. For stateful flows, i.e. anything involving NATs, it does not work, because the NAT XLATES table (essentially a translation table) is not shared/synced between the two firewalls. This means that if your return/subsequent traffic ends up going to the Palo that didn't originally issue the NAT, it will be dropped. This causes the source and destination to constantly retry, so traffic takes >30 seconds in the best case or times out.
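To illustrate the failure mode described above, here is a minimal sketch of two firewalls that each keep their own NAT translation table. It is a toy model, not the Palo Alto implementation: the `Flow` and `Firewall` names, the IP addresses, and the assumption that both firewalls translate to the same NAT address are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """A one-directional flow, identified only by source and destination IP."""
    src: str
    dst: str


@dataclass
class Firewall:
    """Toy firewall with a local, unsynced NAT translation table (assumption)."""
    name: str
    nat_table: dict = field(default_factory=dict)  # translated flow -> original flow

    def outbound(self, flow: Flow, nat_ip: str) -> Flow:
        # Source-NAT the flow and remember the translation on THIS firewall only.
        translated = Flow(nat_ip, flow.dst)
        self.nat_table[translated] = flow
        return translated

    def inbound(self, reply: Flow):
        # A reply arrives as (remote -> nat_ip). Look up the matching translation.
        original = self.nat_table.get(Flow(reply.dst, reply.src))
        if original is None:
            return None  # no state for this flow: the packet is dropped
        return Flow(reply.src, original.src)  # reverse the NAT, deliver to the client


if __name__ == "__main__":
    palo_a, palo_b = Firewall("palo-a"), Firewall("palo-b")

    # The load balancer happens to send the outbound packet via palo-a.
    palo_a.outbound(Flow("10.0.0.5", "192.0.2.10"), nat_ip="10.99.0.1")
    reply = Flow("192.0.2.10", "10.99.0.1")

    # The reply is delivered only if it lands on the firewall that holds the state.
    for fw in (palo_a, palo_b):
        result = fw.inbound(reply)
        print(fw.name, "delivers" if result else "drops", result)
```

In this model the reply is only delivered when it reaches the firewall that created the translation. Sharing `nat_table` between the two instances, or steering all traffic for a flow through one firewall, would avoid the drop.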

The current workaround is to create specific routes that send traffic via a single Palo (rather than the load balancer). This obviously introduces a single point of failure and will cause confusion in future during BAU activities such as maintenance and failover of the Palos.

The intention is to (see additional information for other options that were considered):

 

Rough implementation plan in order:

Investigation should be done in non-prod to determine what the downtime would be. If we can set up the NATs in CloudGateway before removing them from the Palos, we may be able to do this with zero downtime.

Known impacted teams/services:

danielwilsonkainos commented 2 months ago

The scope of this may change, but for now there is a lot of pushback against implementing the changes, so this has been withdrawn.

Richard R: "had an email about 10 mins ago asking us to pause. Pushback is pretty strong, so I think it’s 90%+ probable that we’ll need to cancel it. Happy for you guys to pull the next thing on the roadmap instead"