mit-jp / mit-climate-data-viz

Plotting climate data for the MIT Joint Program on the Science and Policy of Global Change
https://cypressf.shinyapps.io/eppa-dashboard/

svante4 container state is broken #285

Closed cypressf closed 1 year ago

cypressf commented 2 years ago

It looks like podman considers the backend container stopped, but the backend process is still alive.

podman ps -a
CONTAINER ID  IMAGE                                      COMMAND     CREATED      STATUS                     PORTS                                           NAMES
ca533d149e3d  localhost/podman-pause:4.0.2-1652749236                3 weeks ago  Up 3 weeks ago             0.0.0.0:8000->8000/tcp, 0.0.0.0:8002->4000/tcp  17d243513267-infra
66985c7dfb56  docker.io/library/postgres:latest          postgres    3 weeks ago  Up 3 weeks ago             0.0.0.0:8000->8000/tcp, 0.0.0.0:8002->4000/tcp  crm_db
e535e2296017  localhost/climate_risk_map_backend:latest  bash        3 weeks ago  Exited (1) 11 minutes ago  0.0.0.0:8000->8000/tcp, 0.0.0.0:8002->4000/tcp  crm_backend
curl 0.0.0.0:8000/state
[{"id":1,"name":"Alabama"},{"id":2,"name":"Alaska"},{"id":4,"name":"Arizona"},{"id":5,"name":"Arkansas"},{"id":6,"name":"California"},{"id":8,"name":"Colorado"},{"id":9,"name":"Connecticut"},{"id":10,"name":"Delaware"},{"id":11,"name":"D.C."},{"id":12,"name":"Florida"},{"id":13,"name":"Georgia"},{"id":15,"name":"Hawaii"},{"id":16,"name":"Idaho"},{"id":17,"name":"Illinois"},{"id":18,"name":"Indiana"},{"id":19,"name":"Iowa"},{"id":20,"name":"Kansas"},{"id":21,"name":"Kentucky"},{"id":22,"name":"Louisiana"},{"id":23,"name":"Maine"},{"id":24,"name":"Maryland"},{"id":25,"name":"Massachusetts"},{"id":26,"name":"Michigan"},{"id":27,"name":"Minnesota"},{"id":28,"name":"Mississippi"},{"id":29,"name":"Missouri"},{"id":30,"name":"Montana"},{"id":31,"name":"Nebraska"},{"id":32,"name":"Nevada"},{"id":33,"name":"New Hampshire"},{"id":34,"name":"New Jersey"},{"id":35,"name":"New Mexico"},{"id":36,"name":"New York"},{"id":37,"name":"North Carolina"},{"id":38,"name":"North Dakota"},{"id":39,"name":"Ohio"},{"id":40,"name":"Oklahoma"},{"id":41,"name":"Oregon"},{"id":42,"name":"Pennsylvania"},{"id":44,"name":"Rhode Island"},{"id":45,"name":"South Carolina"},{"id":46,"name":"South Dakota"},{"id":47,"name":"Tennessee"},{"id":48,"name":"Texas"},{"id":49,"name":"Utah"},{"id":50,"name":"Vermont"},{"id":51,"name":"Virginia"},{"id":53,"name":"Washington"},{"id":54,"name":"West Virginia"},{"id":55,"name":"Wisconsin"},{"id":56,"name":"Wyoming"},{"id":60,"name":"American Samoa"},{"id":66,"name":"Guam"},{"id":69,"name":"Northern Mariana Islands"},{"id":72,"name":"Puerto Rico"},{"id":74,"name":"U.S. Minor Outlying Islands"},{"id":78,"name":"U.S. Virgin Islands"}]

Despite the crm_backend container being in state 'exited', it is still responding to HTTP requests, and the website continues to work. This is strange.
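One way to make this mismatch explicit is to compare the state podman records for the container with what the published port actually returns. The following is only a sketch: `classify_backend` is a hypothetical helper name, not part of the deploy tooling, and the container name and URL in the usage comment are taken from the output above.

```shell
#!/bin/sh
# classify_backend STATE HTTP_CODE
# Hypothetical helper: compares the state podman reports for a container
# with the HTTP status code returned by the service on its published port.
classify_backend() {
  state=$1
  http_code=$2
  # Treat any 2xx or 3xx response as "something is serving requests".
  case "$http_code" in
    2??|3??) serving=yes ;;
    *)       serving=no ;;
  esac
  if [ "$state" != "running" ] && [ "$serving" = yes ]; then
    echo "inconsistent"   # podman says not running, but the port still answers
  elif [ "$state" = "running" ] && [ "$serving" = no ]; then
    echo "unhealthy"      # podman says running, but the port does not answer
  else
    echo "consistent"
  fi
}

# Example invocation on the host (requires podman and curl; not run here):
#   state=$(podman inspect --format '{{.State.Status}}' crm_backend)
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://0.0.0.0:8000/state)
#   classify_backend "$state" "$code"
```

For the situation shown above (state `exited`, a successful response from `/state`), the helper would print `inconsistent`.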

cypressf commented 2 years ago

The logs of the 'exited' crm_backend container are still being written to. There is also an error that appears to date from the latest deploy, yet the process is still running.

podman logs crm_backend
[2022-07-15T17:20:25Z INFO  actix_web::middleware::logger] 10.0.2.100 "GET /data/65?source=11&start_date=2020-01-01&end_date=2020-12-31 HTTP/1.1" 200 35054 "https://est.mit.edu/5" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0.024228
Error: Os { code: 98, kind: AddrInUse, message: "Address already in use" }
[2022-07-15T17:33:26Z INFO  actix_web::middleware::logger] 10.0.2.100 "GET /map-visualization?include_drafts=false HTTP/1.1" 200 155400 "https://est.mit.edu/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15" 0.765390
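The `AddrInUse` line suggests that a newly started backend process could not bind its port because something was still listening on it, which would be consistent with the old process surviving the deploy. That is an inference from the log, not something confirmed here. As a rough sketch, the hypothetical helper `bind_errors` below pulls such bind failures out of a log stream together with the line before each one, so the time of the failed start is easier to narrow down.

```shell
#!/bin/sh
# bind_errors: read log text on stdin and print every "AddrInUse" error
# together with the line immediately before it (for a nearby timestamp).
# Relies on grep's -B option, which GNU and BSD grep both provide.
bind_errors() {
  grep -B 1 'kind: AddrInUse'
}

# Example invocation on the host (requires podman; not run here):
#   podman logs crm_backend 2>&1 | bind_errors
# To see which process currently holds the port, list listening TCP sockets
# with their owning processes (only visible for processes you may inspect):
#   ss -ltnp
```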

cypressf commented 1 year ago

Now I'm seeing a different error:

https://github.com/cypressf/climate-risk-map/actions/runs/3481126712/jobs/5822039110#step:7:28

======CMD======
podman stop crm_backend && ln -snf ~/builds/15245eb592c99b7b3274aec30d551dd22a3d4f3c ~/climate-risk-map && podman run --rm -v ~/climate-risk-map/backend:/opt/climate-risk-map/backend:Z --tz=America/New_York --env-file=$HOME/.env --pod=crm_pod sqlx migrate run && podman start crm_backend
======END======
err: time="2022-11-16T11:32:09-05:00" level=error msg="container \"e535e22960178fbdd57d1d2e934397ef2aa6b977ebe86af8d475e375eb997204\" does not exist"
err: Error: timed out waiting for file /tmp/podman-run-1004/libpod/tmp/exits/e535e22960178fbdd57d1d2e934397ef2aa6b977ebe86af8d475e375eb997204: internal libpod error
2022/11/16 16:32:14 Process exited with status 125
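Because the deploy command chains every step with `&&`, the log does not name the step that failed. As a sketch only, the steps could be wrapped so that each one announces itself and reports its own exit status. `run_step` is a hypothetical helper name, not something the existing workflow defines, and the commands in the usage comment are copied from the deploy line above.

```shell
#!/bin/sh
# run_step NAME COMMAND [ARGS...]
# Hypothetical wrapper: prints the step name, runs the command, and reports
# the exit status on stderr if the command fails. Returns the same status.
run_step() {
  step_name=$1
  shift
  echo "==> $step_name"
  "$@"
  step_status=$?
  if [ "$step_status" -ne 0 ]; then
    echo "step '$step_name' failed with status $step_status" >&2
  fi
  return "$step_status"
}

# Example (commands taken from the deploy line above; not run here):
#   run_step "stop backend"  podman stop crm_backend &&
#   run_step "start backend" podman start crm_backend
```

With this wrapper the `&&` chaining still stops at the first failure, but the output shows which step that was.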

cypressf commented 1 year ago

@mjbludwig could you take a look at svante4? The state after the error above is:

podman pod ps
POD ID        NAME        STATUS      CREATED       INFRA ID      # OF CONTAINERS
17d243513267  crm_pod     Degraded    4 months ago  ca533d149e3d  3
podman ps -a
CONTAINER ID  IMAGE                                      COMMAND     CREATED       STATUS                     PORTS                                           NAMES
ca533d149e3d  localhost/podman-pause:4.0.2-1652749236                4 months ago  Up 2 hours ago             0.0.0.0:8000->8000/tcp, 0.0.0.0:8002->4000/tcp  17d243513267-infra
66985c7dfb56  docker.io/library/postgres:latest          postgres    4 months ago  Up 2 hours ago             0.0.0.0:8000->8000/tcp, 0.0.0.0:8002->4000/tcp  crm_db
e535e2296017  localhost/climate_risk_map_backend:latest  bash        4 months ago  Exited (-1) 6 minutes ago  0.0.0.0:8000->8000/tcp, 0.0.0.0:8002->4000/tcp  crm_backend

But again, even though the crm_backend container is in state "exited", the backend service continues to run.

cypressf commented 1 year ago

Tried deploying again today. It's still not deploying the service correctly. It looks like the new frontend got copied over, but the backend is still running the old version.

cypressf commented 1 year ago

It looks like the error output was either empty or was parsed incorrectly on GitHub: https://github.com/cypressf/climate-risk-map/actions/runs/3640778471/jobs/6146126525#step:7:26

cypressf commented 1 year ago

@mjbludwig, when we talked on the phone you mentioned there might be a configuration you need to change to ensure the container can run long-term. Did you investigate that today?