openfoodfacts / openfoodfacts-infrastructure

Where we collaboratively plan and maintain the infrastructure of Open Food Facts

Move staging dockers (200) to ovh1 #217

Closed alexgarel closed 1 year ago

alexgarel commented 1 year ago

We have latency problems from ovh2 to ovh3, which makes using NFS mounts for off-net impractical. I thought we could move the dataset clones to ovh2 (#216), but it's not feasible: disks are 1T too small and the products dataset is 1.5 T (I had misread the size).

But while ovh2 is 10 ms away from ovh3, ovh1 is only 0.12 ms away (roughly 80 times closer), so if we move the 200 VM to ovh1, we would be able to use NFS volumes from ovh3.
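The latency comparison above can be reproduced with a small sketch like the one below. It assumes the summary line format printed by iputils `ping` (`rtt min/avg/max/mdev = ...`); the function names `avg_rtt` and `rtt_ratio` are hypothetical helpers, not something used in the original setup.

```shell
#!/bin/sh
# Extract the average round-trip time (ms) from ping's summary line on stdin.
# Assumed input format (iputils ping):
#   rtt min/avg/max/mdev = 0.101/0.120/0.150/0.015 ms
avg_rtt() {
    awk -F'=' '/min\/avg\/max/ { split($2, a, "/"); print a[2] }'
}

# Print how many times larger the first latency is than the second (rounded).
rtt_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

# Example (needs network access; "ovh3" as a resolvable hostname is an assumption):
#   ping -c 10 -q ovh3 | avg_rtt
```

With the figures from this issue, `rtt_ratio 10 0.12` prints `83`.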

Task: move some services from ovh1 to ovh2 and move the 200 VM to ovh1.

alexgarel commented 1 year ago

A proposal to make the switch while keeping disk space well balanced: We move from ovh2 -> ovh1

We could move ovh2 -> ovh1 (for a total of 234G!)

alexgarel commented 1 year ago

I synchronized the storage of 200 to ovh1 before the migration, but now the disk is full! I underestimated the size: as it's block storage, I should have considered the max size (322G) instead of the real size (294G).
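A pre-migration check along these lines might have caught the problem. This is only a sketch: `check_target_space` is a hypothetical helper, and the free-space values in the usage note are made up for illustration.

```shell
#!/bin/sh
# Sizes are whole gigabytes. A block volume can grow to its provisioned (max)
# size, so compare that, not the currently used size, with the free space
# available on the target.
check_target_space() {
    provisioned_g=$1   # max size of the volume, e.g. 322
    free_g=$2          # free space on the target pool
    if [ "$free_g" -ge "$provisioned_g" ]; then
        echo "ok"
    else
        echo "insufficient: short by $((provisioned_g - free_g))G"
    fi
}
```

For example, with the 322G max size and a hypothetical 300G free on the target, `check_target_space 322 300` prints `insufficient: short by 22G`, even though the real size (294G) would have appeared to fit.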

Also, I see a difference between what the Proxmox console shows on ovh1 (Usage 98.23% (940.04 GB of 956.97 G)) and the output of the zpool list command:

 rpool   920G   886G  33.8G        -         -    76%    96%  1.00x    ONLINE  -
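To put the two reports on the same scale, both can be expressed as percentages. The sketch below assumes the unlabeled columns in the line above follow the default `zpool list` order (NAME, SIZE, ALLOC, FREE, ...), so 920G is the pool size and 886G the allocated space; `usage_pct` is a hypothetical helper.

```shell
#!/bin/sh
# Print used/total as a percentage with two decimals.
usage_pct() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * used / total }'
}

# Usage:
#   usage_pct 940.04 956.97   # figures from the Proxmox console
#   usage_pct 886 920         # ALLOC and SIZE from zpool list
```

The first call prints `98.23`, matching the console, and the second prints `96.30`, matching the 96% capacity column, so the two tools disagree on both the used and the total figures, not only on rounding.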

I will move more containers:

alexgarel commented 1 year ago

Monitoring was in bad shape, and ZFS usage on ovh1 was too high.

So I moved monitoring (203) to ovh2, although I'm a bit sad that monitoring is now on the same machine as prod dockers (200).

alexgarel commented 1 year ago

It has been stable, closing.

alexgarel commented 1 year ago

Something was missing!!!

I had to change the default route for the VMs that I moved (staging and monitoring):

  1. Edited /etc/network/interfaces to change the gateway to the new host's internal address (10.0.0.x)
  2. In a screen (to prevent losing the ssh connection between commands!): ip route del default via 10.0.0.2 dev ens18 onlink ; ip route add default via 10.0.0.1 dev ens18 onlink (or the other way around)
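
As an alternative to the del/add pair in step 2, `ip route replace` swaps the default route in a single command, which should avoid the moment where the VM has no default route at all. This is a sketch, not what was actually run here: `switch_default_gw` is a hypothetical wrapper, it only prints the command unless `APPLY=1` is set, and the `ens18` default is taken from the commands above.

```shell
#!/bin/sh
# Print (default) or apply (APPLY=1, needs root) a one-step default route change.
# $1: new gateway address, e.g. 10.0.0.1
# $2: interface (defaults to ens18, as in the commands above)
switch_default_gw() {
    new_gw=$1
    dev=${2:-ens18}
    cmd="ip route replace default via $new_gw dev $dev onlink"
    if [ "${APPLY:-0}" = "1" ]; then
        $cmd          # word splitting is intended here
    else
        echo "$cmd"   # dry run: show what would be executed
    fi
}
```

Running `switch_default_gw 10.0.0.1` prints `ip route replace default via 10.0.0.1 dev ens18 onlink`. The change is not persistent, so the edit to /etc/network/interfaces from step 1 is still needed.
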
alexgarel commented 1 year ago

Now starting the backend container on staging is fast again :tada: (from 1m20 in good scenarios down to 0m12)!