rstudio / helm

Helm Resources for RStudio Products

PPM: when replica is set > 1, initial package request is very slow #534

Open lachlansimpson opened 1 month ago

lachlansimpson commented 1 month ago

From internal ZD ticket 107580:

Our PPM is installed with Helm (Helm version 0.5.29) on a GCP GKE cluster.

We found that package installation is slow when a user downloads a package for the first time.

Each request we make to PPM takes more than 2 minutes to get a response, so we even had to increase the IDE download timeout.

For example, we tested installing Terraform in Python. It has about 30 package dependencies, so the first download took 45 minutes.

On the first attempt, even though the IDE times out, PPM keeps working in the background, so the second download completes in seconds.
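
A rough way to quantify the first-versus-second request difference is to time consecutive requests for the same package URL. The sketch below is only an illustration: `fetch` and `time_requests` are hypothetical helper names, and the URL you pass in must be a package URL actually served by your own PPM instance.

```python
import time
import urllib.request


def fetch(url, timeout=600):
    """Download the URL and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def time_requests(url, attempts=2, fetcher=fetch):
    """Return the elapsed seconds for each consecutive request to url."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        fetcher(url)  # first call hits a cold cache, later calls a warm one
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `time_requests(url)` with replicas set to 1 and then to 3 should show whether only the first entry in the returned list grows when more replicas are running.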

We found that the slow requests happen when we set the replica value to 3 on GKE. If we change it to 1, the slow requests do not happen.

So we suspect this is the NFS timeout issue described in the documentation: https://docs.posit.co/rspm/admin/ha/#nfs

But after we set the client mount option lookupcache to pos, the pods started to crash with some error logs. Screenshots are attached.
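
To confirm which lookupcache value is actually in effect inside a pod, one option is to read the mount table. The sketch below assumes the usual Linux `/proc/mounts` layout (device, mount point, filesystem type, options) and that the NFS client lists `lookupcache=` among the options when it is set; `nfs_lookupcache` is a hypothetical helper name.

```python
def nfs_lookupcache(mounts_text):
    """Map each NFS mount point to its lookupcache value (None if not listed)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        value = None
        for opt in options:
            if opt.startswith("lookupcache="):
                value = opt.split("=", 1)[1]
        result[mount_point] = value
    return result
```

Inside a running pod this could be called as `nfs_lookupcache(open("/proc/mounts").read())`. A value of `None` means the option does not appear in the mount table for that mount.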

We want to fix the slow requests and keep the replica count at 3 for HA. Please help us find the cause and resolve the issue. Thanks.