Closed Hee-jin506 closed 3 years ago
root@ip-172-31-14-215:/var/log# last reboot
reboot system boot 5.8.0-1041-aws Fri Aug 6 15:39 still running
reboot system boot 5.8.0-1041-aws Fri Aug 6 15:27 still running
reboot system boot 5.4.0-1045-aws Fri Aug 6 03:46 - 15:25 (11:38)
reboot system boot 5.4.0-1045-aws Thu Aug 5 15:38 - 15:25 (23:46)
root@ip-172-31-14-215:/var/log# last -f btmp
root pts/0 Mon Aug 16 16:34 gone - no logout
heejin ssh:notty 110.12.31.215 Fri Aug 6 15:54 gone - no logout
heejin ssh:notty 110.12.31.215 Fri Aug 6 15:40 - 15:54 (00:14)
heejin ssh:notty 110.12.31.215 Fri Aug 6 15:29 - 15:40 (00:10)
root pts/0 Fri Aug 6 15:28 - 16:34 (10+01:06)
heejin ssh:notty 110.12.31.215 Fri Aug 6 15:24 - 15:29 (00:04)
root pts/3 Fri Aug 6 14:52 gone - no logout
root pts/3 Fri Aug 6 14:52 - 14:52 (00:00)
heejin ssh:notty 110.12.31.215 Fri Aug 6 14:51 - 15:24 (00:32)
heejin ssh:notty 110.12.31.215 Fri Aug 6 14:51 - 14:51 (00:00)
root pts/0 Fri Aug 6 11:54 - 15:28 (03:33)
root pts/1 Thu Aug 5 18:20 gone - no logout
The boot history only shows old entries, so... hmm... a reboot does not seem to be the cause.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: user@1000.service: Succeeded.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Stopped User Manager for UID 1000.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Stopping User Runtime Directory /run/user/1000...
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: run-user-1000.mount: Succeeded.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: user-runtime-dir@1000.service: Succeeded.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Stopped User Runtime Directory /run/user/1000.
Aug 31 05:20:38 ip-172-31-14-215 systemd[1]: Removed slice User Slice of UID 1000.
Aug 31 05:30:12 ip-172-31-14-215 snapd[69882]: storehelpers.go:551: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core18", "lxd", "snapd"
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Starting Daily apt download activities...
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Starting Message of the Day...
Aug 31 05:30:42 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Aug 31 05:30:43 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
Aug 31 05:30:45 ip-172-31-14-215 docker.dockerd[109886]: time="2021-08-31T05:30:45.533251878Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Aug 31 05:30:45 ip-172-31-14-215 docker.dockerd[109886]: time="2021-08-31T05:30:45.595444673Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Starting Daily apt download activities, systemd[1]: Stopping Service for snap application docker.dockerd...
=> docker seems to get stopped while the daily apt upgrade runs.
I stopped the daily apt upgrade service as follows:
sudo systemctl list-timers
sudo systemctl stop apt-daily-upgrade.timer
sudo systemctl disable apt-daily-upgrade.timer
sudo systemctl daemon-reload
=>
NEXT                         LEFT         LAST                         PASSED        UNIT                          ACTIVATES
Tue 2021-08-31 08:41:54 UTC  29min left   Tue 2021-08-31 00:34:14 UTC  7h ago        fwupd-refresh.timer           fwupd-refresh.service
Tue 2021-08-31 14:28:58 UTC  6h left      Tue 2021-08-31 05:30:42 UTC  2h 41min ago  motd-news.timer               motd-news.service
Tue 2021-08-31 16:00:22 UTC  7h left      Mon 2021-08-30 16:00:22 UTC  16h ago       systemd-tmpfiles-clean.timer  systemd-tmpfiles-clean.service
Tue 2021-08-31 18:52:43 UTC  10h left     Tue 2021-08-31 06:01:22 UTC  2h 10min ago  apt-daily.timer               apt-daily.service
Wed 2021-09-01 00:00:00 UTC  15h left     Tue 2021-08-31 00:00:09 UTC  8h ago        logrotate.timer               logrotate.service
Wed 2021-09-01 00:00:00 UTC  15h left     Tue 2021-08-31 00:00:09 UTC  8h ago        man-db.timer                  man-db.service
Sun 2021-09-05 03:10:42 UTC  4 days left  Sun 2021-08-29 03:11:19 UTC  2 days ago    e2scrub_all.timer             e2scrub_all.service
Mon 2021-09-06 00:00:00 UTC  5 days left  Mon 2021-08-30 00:00:09 UTC  1 day 8h ago  fstrim.timer                  fstrim.service
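To double-check the timer list above, one option is a small helper that tests whether a given unit still appears in `systemctl list-timers` output. This is only a sketch and was not part of the original troubleshooting: `timer_listed` is a hypothetical name, and it matches whole whitespace-separated fields only. Worth noting: as far as I know, the syslog message "Daily apt download activities" is the description of apt-daily.service, whose timer is still listed above, while the timer that was disabled is apt-daily-upgrade.timer.

```shell
# Hypothetical helper: succeeds (exit 0) when the unit name given as $1
# appears as a whole field in `systemctl list-timers` output read from stdin.
timer_listed() {
  awk -v unit="$1" '
    { for (i = 1; i <= NF; i++) if ($i == unit) found = 1 }
    END { exit !found }
  '
}

# Example usage on the live system (commented out so the block has no side effects):
# systemctl list-timers | timer_listed apt-daily-upgrade.timer && echo "still scheduled"
# systemctl list-timers | timer_listed apt-daily.timer && echo "apt-daily is still scheduled"
```

Reading from stdin keeps the helper usable on saved output as well as on the live command.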
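It stopped again.
Logs from the docker container (the message "accesstoken 인증 실패" means "access token authentication failed"):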
2021-08-31T11:50:37.835614171Z GET / 403 0.687 ms - 16
2021-08-31T11:50:55.155274411Z accesstoken 인증 실패
2021-08-31T11:50:55.155305591Z GET / 403 0.696 ms - 16
2021-08-31T11:51:07.863293164Z accesstoken 인증 실패
2021-08-31T11:51:07.888272460Z GET / 403 0.772 ms - 16 <- last log
syslog around that time:
Aug 31 11:30:12 ip-172-31-14-215 snapd[69882]: storehelpers.go:551: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core18", "lxd", "snapd"
Aug 31 11:51:12 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Reloading.
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Aug 31 11:51:13 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
Aug 31 11:51:15 ip-172-31-14-215 docker.dockerd[112320]: time="2021-08-31T11:51:15.065990719Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Aug 31 11:51:15 ip-172-31-14-215 docker.dockerd[112320]: time="2021-08-31T11:51:15.076688976Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
Aug 31 11:51:25 ip-172-31-14-215 docker.dockerd[114158]: time="2021-08-31T11:51:25.864031315Z" level=error msg="connecting to shim" error="dial unix \x00/containerd-shim/07efef3c242f7e995176a5acf2c87afb4819ed64f2123514ae897050be07c173.sock: connect: connection refused" id=025674a06923a318e811db70e12e7106513652508e0911104499159f7cf0c17f namespace=moby
It stopped again.
syslog from September 4:
Sep 4 01:55:11 ip-172-31-14-215 snapd[69882]: storehelpers.go:551: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core18", "lxd", "snapd"
Sep 4 01:55:41 ip-172-31-14-215 systemd[1]: Reloading.
Sep 4 01:55:42 ip-172-31-14-215 systemd[1]: Reloading.
Sep 4 01:55:42 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Sep 4 01:55:42 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Sep 4 01:55:42 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
Sep 4 11:00:40 ip-172-31-14-215 systemd[1]: Reloading.
Sep 4 11:00:41 ip-172-31-14-215 systemd[1]: Reloading.
Sep 4 11:00:41 ip-172-31-14-215 systemd[1]: Mounting Mount unit for docker, revision 1125...
Sep 4 11:00:41 ip-172-31-14-215 systemd[1]: Mounted Mount unit for docker, revision 1125.
Sep 4 11:00:41 ip-172-31-14-215 systemd[1]: Stopping Service for snap application docker.dockerd...
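Since the same "Stopping Service for snap application docker.dockerd..." line shows up before every outage, it may help to pull just those timestamps out of syslog and line them up against the container's last log lines. A minimal sketch, assuming the traditional syslog layout shown above (month, day, time as the first three fields); `docker_stop_events` is a hypothetical helper name.

```shell
# Hypothetical helper: reads syslog text on stdin and prints the timestamp
# (month day time) of every line where systemd stops the snap docker daemon.
docker_stop_events() {
  grep -F 'Stopping Service for snap application docker.dockerd' |
    awk '{ print $1, $2, $3 }'
}

# Example usage on the server (commented out; reading syslog may need root):
# docker_stop_events < /var/log/syslog
# `snap changes` might also show whether snapd touched the docker snap at
# those times; that is an assumption, not something confirmed in this thread.
```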
https://okky.kr/article/1056535
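@Hee-jin506 posted a question on OKKY. Looking at hadyso's answer first: because there is only one container, it is hard to tell whether the docker process itself is dying or only that specific container, so I started a second container for testing. I will decide how to handle it once I know which of the two cases this is.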
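The two cases could be told apart roughly like this once the outage happens again. This is a sketch under assumptions: `classify_outage` is a hypothetical helper, the unit name snap.docker.dockerd.service follows the usual naming of snap services and was not verified on this server, and NAME stands for whichever container is being checked.

```shell
# Hypothetical helper: takes the daemon state and the container state as
# plain strings and prints which of the two cases applies.
#   $1: daemon state, e.g. from `systemctl is-active snap.docker.dockerd.service`
#   $2: container state, e.g. from `docker inspect --format '{{.State.Status}}' NAME`
classify_outage() {
  outage_daemon_state=$1
  outage_container_state=$2
  if [ "$outage_daemon_state" != "active" ]; then
    echo "daemon down"        # the docker process itself is not running
  elif [ "$outage_container_state" != "running" ]; then
    echo "container down"     # the daemon is up, only this container stopped
  else
    echo "ok"
  fi
}

# Example usage (commented out; needs docker and systemd on the host):
# classify_outage "$(systemctl is-active snap.docker.dockerd.service)" \
#                 "$(docker inspect --format '{{.State.Status}}' NAME)"
```

If both the API container and the test container stop at the same moment, that also points toward the daemon rather than a single container.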
System memory usage hovers around 50%, so it does not look like a memory problem.
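For reference, a memory figure like this can be computed from /proc/meminfo. The helper below is only an illustration (`mem_used_percent` is a hypothetical name), and it treats "used" as MemTotal minus MemAvailable, which may differ from how the number above was measured.

```shell
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# the used-memory percentage, truncated to an integer.
mem_used_percent() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }
  '
}

# Example usage on the server:
# mem_used_percent < /proc/meminfo
```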
As Mambo suggested, I reinstalled docker the way the official documentation recommends.
I will keep watching to see whether the server goes down again and whether the test container goes down with it.
After removing the snap-installed docker and installing it following the official guide, the problem was resolved. https://docs.docker.com/engine/install/ubuntu/
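A quick way to see which docker a machine is running is to look at where the `docker` binary resolves. A rough sketch, assuming the snap package exposes its binary under /snap/bin as is usual on Ubuntu; `docker_install_source` is a hypothetical helper, and the install steps themselves are left to the guide linked above.

```shell
# Hypothetical helper: classifies a docker binary path as coming from snap or not.
#   $1: path of the docker binary, e.g. the output of `command -v docker`
docker_install_source() {
  case "$1" in
    "")      echo "not installed" ;;
    /snap/*) echo "snap" ;;
    *)       echo "other" ;;
  esac
}

# Example usage (commented out so the block changes nothing on the host):
# docker_install_source "$(command -v docker)"
# If it prints "snap", the snap package can be removed before following the guide:
# sudo snap remove docker
```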
Current behavior (bug)
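The docker container running the API server keeps stopping. Judging from the logs, the problem does not seem to be the server running inside the container.
<- this is the last log before it stopped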
Expected behavior (correct)
Unless it is stopped manually, the container should not stop on its own.