lsst-epo / rubin-obs-api

The backend to the Rubin Obs. operational site

Troubleshoot what's taking up ephemeral storage and causing containers to enter a bad state #283

Closed: ericdrosas87 closed this issue 3 weeks ago

ericdrosas87 commented 1 month ago

The regression that caused the API container's ephemeral storage to drop from 4Gi to 2Gi revealed that much more space is being used than expected. I need to perform a root cause analysis to ensure that this problem does not scale as traffic grows.

ericdrosas87 commented 3 weeks ago

My suspicion is that the on-the-fly transformations triggered by GQL queries are eating up storage space: each transformation is saved to disk before being sent along to GCS. I have set up an alert that fires once a container reaches 1Gi of ephemeral storage. The only way to determine for certain what's going on is to run the following commands once the alert fires:
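While waiting for the alert, a quick manual spot-check can compare a directory's size against the same 1Gi threshold. This is a minimal sketch, not the alert's actual configuration; the function name storage_over_threshold and the ./storage default are assumptions for illustration. Note that du only sees files under the path it is given, not everything Kubernetes counts toward ephemeral storage (container logs, for example).

```shell
# Hypothetical helper: succeed (exit 0) if the given directory uses at least
# the given number of KiB on disk. 1Gi = 1048576 KiB.
storage_over_threshold() {
  dir=${1:-./storage}        # assumed default location
  limit_kib=${2:-1048576}    # default: 1Gi, matching the alert threshold
  # -s prints a single total for $dir; -k reports it in KiB as a plain integer.
  # du separates size and path with a tab, which is cut's default delimiter.
  used_kib=$(du -sk "$dir" | cut -f1)
  [ "$used_kib" -ge "$limit_kib" ]
}
```

For example, storage_over_threshold ./storage && echo "over 1Gi" prints a message only once the folder has reached the limit.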

Disk usage (du):

du -h --max-depth=1 

Disk free space (df):

df
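Because du prints entries in directory order rather than by size, piping its output through sort makes the biggest consumers easier to spot. A minimal sketch, assuming GNU du (the --max-depth flag used above); the function name largest_subdirs is hypothetical:

```shell
# Hypothetical helper: list the immediate subdirectories of a directory,
# largest first, as "<KiB><TAB><path>" lines (at most N of them).
largest_subdirs() {
  dir=${1:-.}
  n=${2:-10}
  # -k reports sizes in KiB as plain integers so they sort numerically;
  # --max-depth=1 (GNU du) limits output to one level below $dir.
  # awk drops the line whose path is $dir itself (the overall total).
  du -k --max-depth=1 "$dir" 2>/dev/null |
    awk -F '\t' -v top="$dir" '$2 != top' |
    sort -rn |
    head -n "$n"
}
```

Running largest_subdirs ./storage 5 would list the five biggest subfolders. KiB (-k) is used instead of -h because human-readable sizes such as 900M and 1.2G do not order correctly under sort -n.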

The best place to run these commands is the ./storage folder in the container; from there, I can cd into the largest subfolder and repeat, drilling down level by level until I find the culprit.
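The drill-down procedure can also be scripted. A minimal sketch, assuming GNU du and assuming that following the single largest subfolder at each level is enough to find the culprit (usage spread across many medium-sized folders would need the full listing instead); drill_down is a hypothetical name:

```shell
# Hypothetical helper: starting at a directory, repeatedly step into the
# largest immediate subdirectory, printing "<KiB><TAB><path>" for each level.
# Stops at a directory that has no subdirectories. Requires GNU du.
drill_down() {
  dir=${1:-./storage}   # assumed starting point
  while :; do
    # Total size of the current directory in KiB.
    du -sk "$dir" 2>/dev/null
    # Largest immediate subdirectory; the $dir total line is filtered out.
    next=$(du -k --max-depth=1 "$dir" 2>/dev/null |
      awk -F '\t' -v top="$dir" '$2 != top' |
      sort -rn | head -n 1 | cut -f2)
    # No subdirectories left: this is the deepest large folder.
    [ -n "$next" ] || break
    dir=$next
  done
}
```

The last line printed is the deepest folder on the heaviest path, which is where to start looking at individual files.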