pulibrary / pdc_discovery

Princeton Data Commons discovery portal for Research Data

pdc-discovery-staging1 should be happy #636

Closed: acozine closed this issue 1 week ago

acozine commented 2 weeks ago

What maintenance needs to be done?

Fix nginx on pdc-discovery-staging1.

Level of urgency

Why is this maintenance needed?

CheckMK reported today that pdc-discovery-staging1 had a full disk. A quick look showed about 9 GB of files in the /tmp directory. A reboot cleared those files, but the Rails health page check still failed afterwards. It looks like nginx has lost some of its configuration.
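To see what is filling a disk like this before rebooting, a quick check of the filesystem and the largest entries under /tmp usually narrows it down. A minimal sketch, assuming GNU coreutils as on a stock Ubuntu host; the function name report_tmp_usage is hypothetical and not part of pdc_discovery:

```shell
# Hypothetical helper: show free space on the filesystem holding a directory,
# then the largest entries one level below it (the directory total is listed first).
# Usage: report_tmp_usage [dir] [count]   (defaults: /tmp and 10)
report_tmp_usage() {
  local dir="${1:-/tmp}"
  local count="${2:-10}"
  # Filesystem-level view: is the disk that holds $dir actually full?
  df -h "$dir"
  # -x stays on one filesystem, -d 1 limits depth to direct children;
  # permission errors on other users' files are discarded.
  du -xh -d 1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

On the host this would be run as sudo bash -c "$(declare -f report_tmp_usage); report_tmp_usage /tmp 15" or simply pasted into a root shell.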

Implementation notes, if any

Here's the output of sudo service nginx status on the two staging machines:

On the unhappy server:

pulsys@pdc-discovery-staging1:~$ sudo service nginx status
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset:>
     Active: active (running) since Thu 2024-06-27 21:49:17 UTC; 39min ago
       Docs: man:nginx(8)
    Process: 12890 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_proc>
    Process: 12891 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (>
   Main PID: 12914 (nginx)
      Tasks: 22 (limit: 9345)
     Memory: 110.9M
        CPU: 2min 5.247s
     CGroup: /system.slice/nginx.service
             ├─12892 "Passenger watchdog" "" "" "" "" "" "" "" "" "" "" "" "" ">
             ├─12896 "Passenger core" "" "" "" "" "" "" "" "" "" "" "" "" "" "">
             ├─12914 "nginx: master process /usr/sbin/nginx -g daemon on; maste>
             ├─12915 "nginx: worker process" "" "" "" "" "" "" "" "" "" "" "" ">
             └─12916 "nginx: worker process" "" "" "" "" "" "" "" "" "" "" "" ">

Jun 27 21:49:17 pdc-discovery-staging1 systemd[1]: Starting A high performance >
Jun 27 21:49:17 pdc-discovery-staging1 systemd[1]: Started A high performance w>

On the happy server:

pulsys@pdc-discovery-staging2:~$ sudo service nginx status
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-06-25 19:38:15 UTC; 2 days ago
       Docs: man:nginx(8)
    Process: 176585 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 176586 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
   Main PID: 176606 (nginx)
      Tasks: 36 (limit: 19007)
     Memory: 494.5M
        CPU: 2h 6min 59.225s
     CGroup: /system.slice/nginx.service
             ├─ 176587 "Passenger watchdog" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">
             ├─ 176591 "Passenger core" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "">
             ├─ 176606 "nginx: master process /usr/sbin/nginx -g daemon on; master_process on;"
             ├─ 176607 "nginx: worker process" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 176608 "nginx: worker process" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─1001850 "Passenger RubyApp: /opt/pdc_discovery/current (staging)"
             └─1181044 "Passenger RubyApp: /opt/pdc_discovery/current (staging)"

Jun 25 19:38:15 pdc-discovery-staging2 systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 25 19:38:15 pdc-discovery-staging2 systemd[1]: Started A high performance web server and a reverse proxy server.

Restarting the nginx service did not help: nginx came back up fine, but Passenger still did not start the application (staging1 shows no "Passenger RubyApp" processes, unlike staging2).
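The visible difference between the two status listings is the "Passenger RubyApp" lines: staging2 has two, staging1 has none. A minimal sketch for checking that count directly, assuming the status output is piped in untruncated (systemctl's real --full and --no-pager options avoid the ">" truncation seen above); the function name count_ruby_apps is hypothetical:

```shell
# Hypothetical helper: count Passenger application processes in
# `systemctl status nginx` output read on stdin.
# A node serving the Rails app should report 1 or more; staging1 above would report 0.
count_ruby_apps() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the "0", drop the failure.
  grep -c 'Passenger RubyApp' || true
}
```

Usage: sudo systemctl status nginx --no-pager --full | count_ruby_apps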

carolyncole commented 1 week ago

I went ahead and rebooted the machine, since Passenger was complaining about disk space. That should clean up /tmp, which will give a bit of space back.
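A reboot clears /tmp wholesale. To see what has been accumulating there without rebooting, one option is to list files that have not been modified in a while and review them by hand. A minimal sketch, assuming a POSIX find; the function name list_stale_files and the 7-day default are illustrative choices, not anything used on these servers:

```shell
# Hypothetical helper: list regular files under a directory whose modification
# time is more than N days old (find's -mtime +N whole-day rounding applies).
# It only prints paths; some files may still be in use by running services,
# so review the list before deleting anything.
list_stale_files() {
  local dir="${1:-/tmp}"
  local days="${2:-7}"
  find "$dir" -xdev -type f -mtime +"$days" -print
}
```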

carolyncole commented 1 week ago

To fully resolve the Passenger problem, I had to run:

gem update strscan
sudo service nginx restart
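To confirm that gem update strscan actually installed a newer version, gem list strscan shows the installed versions, normally newest first, with a bundled default gem marked "default:" (for example "strscan (3.1.0, default: 3.0.1)"). A minimal sketch that extracts the newest version from that output, assuming this line format; the function name newest_gem_version is hypothetical, and it should be run against the same Ruby that Passenger uses:

```shell
# Hypothetical helper: print the first (newest) version listed for a gem in
# `gem list` output read on stdin, stripping a leading "default: " marker.
# Lines for other gems (e.g. "strscan-ext (...)") are ignored.
newest_gem_version() {
  sed -nE "s/^${1} \\((default: )?([0-9][0-9.]*).*/\\2/p"
}
```

Usage: gem list strscan | newest_gem_version strscan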
acozine commented 1 week ago

Thanks @carolyncole! Closing as complete. If anything else goes wrong, we can "fix it with a new one", see #640.