wooju-memdori / stuvel-api

🔥 STUVEL server: a real-time video study service 🚀

Docker container keeps stopping #20

Closed Hee-jin506 closed 3 years ago

Hee-jin506 commented 3 years ago

Current behavior (bug)

The Docker container running the API server keeps stopping. Judging from the logs, the problem does not appear to be in the server running inside the container.

2021-08-31T05:29:06.904967597Z accesstoken 인증 실패
2021-08-31T05:29:06.905895502Z GET / 403 0.722 ms - 16
2021-08-31T05:29:20.912732201Z accesstoken 인증 실패
2021-08-31T05:29:20.912759779Z GET / 403 0.797 ms - 16
2021-08-31T05:29:36.933332379Z accesstoken 인증 실패
2021-08-31T05:29:36.934145878Z GET / 403 0.694 ms - 16
2021-08-31T05:29:50.927230996Z accesstoken 인증 실패
2021-08-31T05:29:50.927259437Z GET / 403 0.690 ms - 16
2021-08-31T05:30:06.953259229Z accesstoken 인증 실패
2021-08-31T05:30:06.954132154Z GET / 403 0.705 ms - 16
2021-08-31T05:30:20.957277432Z accesstoken 인증 실패
2021-08-31T05:30:20.957355556Z GET / 403 0.700 ms - 16
2021-08-31T05:30:36.997476130Z accesstoken 인증 실패
2021-08-31T05:30:37.000534688Z GET / 403 13.595 ms - 16

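<- This is the last log line before the container stopped. (The log message "accesstoken 인증 실패" means "access token authentication failed".)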

Expected behavior (correct)

Unless it is stopped manually, the container should never stop on its own.

Hee-jin506 commented 3 years ago
root@ip-172-31-14-215:/var/log# last reboot
reboot   system boot  5.8.0-1041-aws   Fri Aug  6 15:39   still running
reboot   system boot  5.8.0-1041-aws   Fri Aug  6 15:27   still running
reboot   system boot  5.4.0-1045-aws   Fri Aug  6 03:46 - 15:25  (11:38)
reboot   system boot  5.4.0-1045-aws   Thu Aug  5 15:38 - 15:25  (23:46)
root@ip-172-31-14-215:/var/log# last -f btmp
root     pts/0                         Mon Aug 16 16:34    gone - no logout
heejin   ssh:notty    110.12.31.215    Fri Aug  6 15:54    gone - no logout
heejin   ssh:notty    110.12.31.215    Fri Aug  6 15:40 - 15:54  (00:14)
heejin   ssh:notty    110.12.31.215    Fri Aug  6 15:29 - 15:40  (00:10)
root     pts/0                         Fri Aug  6 15:28 - 16:34 (10+01:06)
heejin   ssh:notty    110.12.31.215    Fri Aug  6 15:24 - 15:29  (00:04)
root     pts/3                         Fri Aug  6 14:52    gone - no logout
root     pts/3                         Fri Aug  6 14:52 - 14:52  (00:00)
heejin   ssh:notty    110.12.31.215    Fri Aug  6 14:51 - 15:24  (00:32)
heejin   ssh:notty    110.12.31.215    Fri Aug  6 14:51 - 14:51  (00:00)
root     pts/0                         Fri Aug  6 11:54 - 15:28  (03:33)
root     pts/1                         Thu Aug  5 18:20    gone - no logout

The boot history only shows old entries, so... hmm... a reboot does not seem to be the cause.

Hee-jin506 commented 3 years ago
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: user@1000.service: Succeeded.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Stopped User Manager for UID 1000.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Stopping User Runtime Directory /run/user/1000...
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: run-user-1000.mount: Succeeded.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: user-runtime-dir@1000.service: Succeeded.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Stopped User Runtime Directory /run/user/1000.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Removed slice User Slice of UID 1000.
Aug 31 05:30:12 ip-172-31-14-215 snapd[69882]: storehelpers.go:551: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core18", "lxd", "snapd"
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Starting Daily apt download activities...
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Starting Message of the Day...
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
Aug 31 05:30:45 ip-172-31-14-215 docker.dockerd[109886]: time="2021-08-31T05:30:45.533251878Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Aug 31 05:30:45 ip-172-31-14-215 docker.dockerd[109886]: time="2021-08-31T05:30:45.595444673Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby

"Starting Daily apt download activities..." is followed shortly by "systemd[1]: Stopping Service for snap application docker.dockerd..." => docker seems to be stopped in the course of the daily apt upgrade.

https://hacktiming.tistory.com/46
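
One way to cross-check this reading of the syslog is to list what was logged just before each dockerd stop. The following is only a minimal sketch, assuming syslog lines in the format shown above; the function name events_before_stop and the window size are hypothetical and chosen for this illustration.

```python
import re

# Message systemd logs when it stops the snap-packaged dockerd (taken from the syslog above).
STOP_MARKER = "Stopping Service for snap application docker.dockerd"

# Assumed syslog layout: "Aug 31 05:30:43 host process[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$"
)


def events_before_stop(lines, window=5):
    """Return (stop_timestamp, preceding_entries) for every dockerd stop.

    Each preceding entry is a (timestamp, process, message) tuple.
    Lines that do not match the assumed layout are skipped.
    """
    parsed = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            parsed.append((match.group(1), match.group(3), match.group(4)))

    results = []
    for index, (timestamp, _process, message) in enumerate(parsed):
        if STOP_MARKER in message:
            results.append((timestamp, parsed[max(0, index - window):index]))
    return results


sample = [
    "Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Starting Daily apt download activities...",
    "Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...",
    "Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.",
    "Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...",
]

for stop_time, before in events_before_stop(sample, window=3):
    print("dockerd stop at", stop_time)
    for timestamp, process, message in before:
        print("   ", timestamp, process, message)
```

Running the same function over a whole day of /var/log/syslog shows whether the same units precede every stop, which helps tell a real trigger apart from something that merely ran at the same minute.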

I stopped the daily apt upgrade service as follows:

sudo systemctl list-timers

sudo systemctl stop apt-daily-upgrade.timer
sudo systemctl disable apt-daily-upgrade.timer
sudo systemctl daemon-reload

=>

NEXT                        LEFT        LAST                        PASSED       UNIT                         ACTIVATES
Tue 2021-08-31 08:41:54 UTC 29min left  Tue 2021-08-31 00:34:14 UTC 7h ago       fwupd-refresh.timer          fwupd-refresh.service
Tue 2021-08-31 14:28:58 UTC 6h left     Tue 2021-08-31 05:30:42 UTC 2h 41min ago motd-news.timer              motd-news.service
Tue 2021-08-31 16:00:22 UTC 7h left     Mon 2021-08-30 16:00:22 UTC 16h ago      systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Tue 2021-08-31 18:52:43 UTC 10h left    Tue 2021-08-31 06:01:22 UTC 2h 10min ago apt-daily.timer              apt-daily.service
Wed 2021-09-01 00:00:00 UTC 15h left    Tue 2021-08-31 00:00:09 UTC 8h ago       logrotate.timer              logrotate.service
Wed 2021-09-01 00:00:00 UTC 15h left    Tue 2021-08-31 00:00:09 UTC 8h ago       man-db.timer                 man-db.service
Sun 2021-09-05 03:10:42 UTC 4 days left Sun 2021-08-29 03:11:19 UTC 2 days ago   e2scrub_all.timer            e2scrub_all.service
Mon 2021-09-06 00:00:00 UTC 5 days left Mon 2021-08-30 00:00:09 UTC 1 day 8h ago fstrim.timer                 fstrim.service
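
To confirm from output like the table above that a timer is really gone, the UNIT column can be pulled out and checked. This is a rough sketch, assuming the whitespace-separated layout printed by systemctl list-timers above; the helper name timer_units is hypothetical.

```python
def timer_units(list_timers_output):
    """Return the UNIT column from `systemctl list-timers` text output."""
    units = []
    for line in list_timers_output.splitlines():
        fields = line.split()
        # UNIT is the second-to-last column; the header and footer lines
        # are skipped because their second-to-last field is not a *.timer name.
        if len(fields) >= 2 and fields[-2].endswith(".timer"):
            units.append(fields[-2])
    return units


output = """\
NEXT                        LEFT        LAST                        PASSED       UNIT                         ACTIVATES
Tue 2021-08-31 14:28:58 UTC 6h left     Tue 2021-08-31 05:30:42 UTC 2h 41min ago motd-news.timer              motd-news.service
Tue 2021-08-31 18:52:43 UTC 10h left    Tue 2021-08-31 06:01:22 UTC 2h 10min ago apt-daily.timer              apt-daily.service
"""

units = timer_units(output)
print("apt-daily-upgrade.timer" in units)
print("apt-daily.timer" in units)
```

In the listing above, apt-daily-upgrade.timer no longer appears, while apt-daily.timer is still scheduled.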

Hee-jin506 commented 3 years ago

It stopped again.

Logs from the Docker container in question:

2021-08-31T11:50:37.835614171Z GET / 403 0.687 ms - 16
2021-08-31T11:50:55.155274411Z accesstoken 인증 실패
2021-08-31T11:50:55.155305591Z GET / 403 0.696 ms - 16
2021-08-31T11:51:07.863293164Z accesstoken 인증 실패
2021-08-31T11:51:07.888272460Z GET / 403 0.772 ms - 16 <- last log line

syslog at that time:

Aug 31 11:30:12 ip-172-31-14-215 snapd[69882]: storehelpers.go:551: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core18", "lxd", "snapd"
Aug 31 11:51:12 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
Aug 31 11:51:15 ip-172-31-14-215 docker.dockerd[112320]: time="2021-08-31T11:51:15.065990719Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Aug 31 11:51:15 ip-172-31-14-215 docker.dockerd[112320]: time="2021-08-31T11:51:15.076688976Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Aug 31 11:51:25 ip-172-31-14-215 docker.dockerd[114158]: time="2021-08-31T11:51:25.864031315Z" level=error msg="connecting to shim" error="dial unix \x00/containerd-shim/07efef3c242f7e995176a5acf2c87afb4819ed64f2123514ae897050be07c173.sock: connect: connection refused" id=025674a06923a318e811db70e12e7106513652508e0911104499159f7cf0c17f namespace=moby

hayeon17kim commented 3 years ago

It stopped again.

syslog from September 4:

Sep  4 01:55:11 ip-172-31-14-215 snapd[69882]: storehelpers.go:551: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core18", "lxd", "snapd"
Sep  4 01:55:41 ip-172-31-14-215 systemd[1]: Reloading.
Sep  4 01:55:42 ip-172-31-14-215 systemd[1]: Reloading.
Sep  4 01:55:42 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Sep  4 01:55:42 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Sep  4 01:55:42 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
Sep  4 11:00:40 ip-172-31-14-215 systemd[1]: Reloading.
Sep  4 11:00:41 ip-172-31-14-215 systemd[1]: Reloading.
Sep  4 11:00:41 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Sep  4 11:00:41 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Sep  4 11:00:41 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...

hayeon17kim commented 3 years ago

https://okky.kr/article/1056535

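@Hee-jin506 posted a question about this on OKKY. Looking first at hadyso's answer: because only one container is running, it is hard to tell whether the Docker process itself is dying or only one specific container is dying, so I started a second container for testing. I plan to confirm which of the two cases applies and then deal with it.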
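
The reasoning behind the second container can be written down as a small decision rule. This is only an illustrative sketch: the function name classify_outage and the container names are hypothetical, and the state strings are assumed to be read off by hand from docker ps -a output.

```python
def classify_outage(states):
    """Guess the scope of an outage from a {container_name: state} mapping.

    With a single container, "the daemon died" and "this container died"
    look identical, which is why a second test container is needed.
    """
    stopped = sorted(name for name, state in states.items() if state != "running")
    if not stopped:
        return "all running"
    if len(states) < 2:
        return "ambiguous: only one container to observe"
    if len(stopped) == len(states):
        return "daemon-level: every container stopped"
    return "container-level: only " + ", ".join(stopped) + " stopped"


# Hypothetical observations; the container names are made up for this example.
print(classify_outage({"stuvel-api": "exited"}))
print(classify_outage({"stuvel-api": "exited", "test-container": "exited"}))
print(classify_outage({"stuvel-api": "exited", "test-container": "running"}))
```

Every container stopping at once only suggests a daemon-level cause; it should still be checked against the syslog timestamps.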

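System memory usage hovers around 50%, so it does not look like a memory problem.

As Mambo suggested, I reinstalled Docker the way the official documentation recommends.

I will keep watching to see whether the server goes down again and whether the test container goes down with it.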

hayeon17kim commented 3 years ago

After removing the snap-installed Docker and reinstalling it by following the official documentation, the problem was resolved. https://docs.docker.com/engine/install/ubuntu/