Open Halama opened 7 years ago
That instance had also been running for 5 weeks, which is longer than usual.
cc @ujovlado doesn't cAdvisor already send the docker pool size info?
Yeah, the chown containers aren't being removed, that's a bug! I'll fix it right away: b8cbd2d302268119c763a7a845449ab0d66ec993 I'm not keen on returning the job to the queue, because this can happen as late as the end of its run, and there are various retries configured inside (which is rather clumsy, but it saves us time compared to restarting the job).
As for the lingering builder images: this one, for example, is from a successfully finished job: https://papertrailapp.com/groups/23635/events?q=586e37379a282
Well, the builder images aren't removed because they're supposedly cached. About 3 things come to mind:
Yeah, something like that looks good, so I guess we'll do some kind of garbage collection.
It happened again today.
Besides the cleanup, it would definitely be good to also send that size to CloudWatch so that it can raise an alarm, or even react automatically, e.g. by shutting down the instance.
I also see that containers are left behind there:
5a58abc43189 d5a7307f1659 "/bin/sh -c '(for i i" 5 weeks ago Exited (1) 5 weeks ago sick_colden
f0fb9514a8be alpine "sh -c 'chown 501 /da" 5 weeks ago Dead high_mahavira
e16247a765dc alpine "sh -c 'chown 501 /da" 5 weeks ago Dead hungry_noyce
8624c6ba50d8 docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead angry_ptolemy
194f31016192 alpine "sh -c 'chown 501 /da" 5 weeks ago Dead pensive_ramanujan
2778d6aa78aa docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead boring_northcutt
7d74ec0f90cc docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead jovial_ride
ad03e152960c alpine "sh -c 'chown 501 /da" 5 weeks ago Dead drunk_yonath
e3b2128126ec alpine "sh -c 'chown 501 /da" 5 weeks ago Dead sleepy_austin
d359d66d415e alpine "sh -c 'chown 501 /da" 5 weeks ago Dead condescending_feynman
68d9d1d6b5e1 docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead admiring_ride
92111d299662 docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead awesome_mirzakhani
dc6b17ed9c01 alpine "sh -c 'chown 501 /da" 5 weeks ago Dead clever_engelbart
d9427f41e41b alpine "sh -c 'chown 501 /da" 5 weeks ago Dead naughty_cori
5427cc8dc3aa docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead hungry_pare
16595fef1290 alpine "sh -c 'chown 501 /da" 5 weeks ago Dead condescending_goodall
5a1b208a9a40 docker:1.11-dind "dockerd-entrypoint.s" 5 weeks ago Dead
After 13 days of running:
docker info
Data Space Used: 250.1 GB
Data Space Total: 510 GB
Data Space Available: 260 GB
[root@kbc-us-east-1-syrup-docker-i-09f6451b148246490 ec2-user]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 136K 16G 1% /dev
tmpfs 16G 0 16G 0% /dev/shm
/dev/xvda1 50G 5.7G 44G 12% /
/dev/md0 500G 17G 484G 4% /tmp
[root@kbc-us-east-1-syrup-docker-i-09f6451b148246490 docker]# du -h --max-depth=1 .
356M ./image
136K ./network
62M ./devicemapper
52M ./tmp
4.0K ./trust
2.5G ./volumes
3.5M ./containers
3.0G .
This takes care of the volumes: docker volume rm $(docker volume ls -qf dangling=true)
[root@kbc-us-east-1-syrup-docker-i-09f6451b148246490 docker]# du -h --max-depth=1 .
356M ./image
136K ./network
62M ./devicemapper
52M ./tmp
4.0K ./trust
208K ./volumes
3.4M ./containers
472M .
We have the root disks in NR, so for now I'll at least set up an alarm on that: https://infrastructure.newrelic.com/accounts/218779/storage?filters=%7B%22and%22%3A%5B%7B%22is%22%3A%7B%22ec2Tag_KeboolaRole%22%3A%22syrup-worker-docker%22%7D%7D%5D%7D&scope=Docker%20Runner . We'd probably have to send the docker pool size to CloudWatch ourselves.
After running:
docker rm $(docker ps -a -q)
docker rmi $(docker images -q -f dangling=true)
Data Space Used: 193 GB
Data Space Total: 510 GB
Data Space Available: 317 GB
Metadata Space Used: 138.4 MB
Metadata Space Total: 5.365 GB
Alarm set on the root disks: https://alerts.newrelic.com/accounts/218779/policies/1
Here is a sample script that could send docker pool usage info to CloudWatch: https://aws.amazon.com/blogs/compute/optimizing-disk-usage-on-amazon-ecs/
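A rough sketch of the parsing half of such a script, assuming the `Data Space Used` / `Data Space Total` lines look like the `docker info` output pasted in this thread. The function name `pool_usage_percent` and the namespace and metric name in the usage comment are made up for illustration; this is not the script from the linked blog post.

```shell
#!/bin/sh
# pool_usage_percent: read `docker info` text on stdin and print the
# devicemapper data pool usage as a percentage (one decimal place).
pool_usage_percent() {
  awk '
    # Convert a "<number> <unit>" pair to bytes (decimal units, as printed by docker info).
    function to_bytes(value, unit) {
      if (unit == "TB") return value * 1e12
      if (unit == "GB") return value * 1e9
      if (unit == "MB") return value * 1e6
      if (unit == "kB") return value * 1e3
      return value
    }
    # The anchored patterns skip the "Metadata Space ..." lines.
    /^ *Data Space Used:/  { used  = to_bytes($4, $5) }
    /^ *Data Space Total:/ { total = to_bytes($4, $5) }
    END {
      if (total <= 0) exit 1   # pool figures not found in the input
      printf "%.1f\n", used * 100 / total
    }
  '
}

# Hypothetical usage (namespace and metric name are placeholders):
#   aws cloudwatch put-metric-data --namespace "Custom/Docker" \
#     --metric-name ThinPoolDataUsage --unit Percent \
#     --value "$(sudo docker info | pool_usage_percent)"
```

Run from cron every few minutes, something like this would be one way to get a CloudWatch alarm on pool usage; reacting automatically would be a separate step.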
To be safe, I'm rolling the docker runner servers. Those images can't simply be deleted.
Just a summary:
docker volume rm $(docker volume ls -qf dangling=true)
--rm
or, better yet, also periodically call docker rm $(docker ps -a -q)
Yeah, I'm about to dive into the GC. I assume it should be an API call; authorization via a storage or manage token?
I'd say rather some CLI command that we'll run from cron?
All containers are started with --rm (I verified), except for the main image, which is removed afterwards (because of the inspect). The command is here: https://github.com/keboola/docker-bundle/pull/186
It's deployed and I'm going to run the command with the default settings.
State before running:
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ sudo docker info
Containers: 980
Running: 3
Paused: 0
Stopped: 977
Images: 2867
Server Version: 1.11.2
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 186.8 GB
Data Space Total: 510 GB
Data Space Available: 323.2 GB
Metadata Space Used: 120.5 MB
Metadata Space Total: 5.365 GB
Metadata Space Available: 5.244 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.93-RHEL7 (2015-01-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 4.4.23-31.54.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.09
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.42 GiB
Name: kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66
ID: 3XGE:OIZV:2F4Y:WP7J:TR4H:ECNU:ZHZO:HRG6:EFJ6:PT7A:6X57:GFLQ
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 136K 16G 1% /dev
tmpfs 16G 0 16G 0% /dev/shm
/dev/xvda1 50G 4.2G 45G 9% /
/dev/md0 500G 2.3G 498G 1% /tmp
After running:
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 116K 16G 1% /dev
tmpfs 16G 0 16G 0% /dev/shm
/dev/xvda1 50G 4.2G 45G 9% /
/dev/md0 500G 2.3G 498G 1% /tmp
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ docker info
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ sudo docker info
Containers: 958
Running: 3
Paused: 0
Stopped: 955
Images: 1460
Server Version: 1.11.2
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 172 GB
Data Space Total: 510 GB
Data Space Available: 338.1 GB
Metadata Space Used: 99.46 MB
Metadata Space Total: 5.365 GB
Metadata Space Available: 5.265 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.93-RHEL7 (2015-01-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 4.4.23-31.54.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.09
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.42 GiB
Name: kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66
ID: 3XGE:OIZV:2F4Y:WP7J:TR4H:ECNU:ZHZO:HRG6:EFJ6:PT7A:6X57:GFLQ
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
For those containers we'll also need to delete the ones with status dead:
"State": {
"Status": "dead",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": true,
"Pid": 0,
"ExitCode": 0,
"Error": "",
"StartedAt": "2017-08-16T11:02:41.091400229Z",
"FinishedAt": "2017-08-16T11:02:41.688876693Z"
},
I made the mistake of trying to open that log in the browser :)
It does seem pretty strange to me that there are 1000 unfinished containers there.
That's what I posted earlier, they're all dead.
They're the docker login ones.
Ah, dead. That should be added then; only status exited is there.
Yep, for the containers it's enough to add that and it should be OK. But I'm not so sure about the images.
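To make the "exited plus dead" selection concrete, here is a small sketch. It assumes the input comes from `docker ps --all --format '{{.ID}} {{.Status}}'`, where the status text starts with `Exited` or `Dead` as in the listing earlier in this thread; `removable_ids` is a made-up name, not the actual GC command.

```shell
#!/bin/sh
# removable_ids: read "<container id> <status text>" lines on stdin and
# print the IDs of containers whose status starts with Exited or Dead.
removable_ids() {
  awk '$2 == "Exited" || $2 == "Dead" { print $1 }'
}

# Hypothetical usage:
#   sudo docker ps --all --format '{{.ID}} {{.Status}}' | removable_ids
```

Passing both `--filter status=exited` and `--filter status=dead` to `docker ps` should give the same selection without any text parsing, assuming the installed version accepts `dead` as a status value.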
And one sudo is missing in the dangling volumes cleanup:
sudo docker volume rm $(docker volume ls --quiet --filter='dangling=true')
It's enough to change it to sudo docker volume rm $(sudo docker volume ls --quiet --filter='dangling=true')
and then it works.
I was just looking at that; it wasn't clear to me yet why this failed: Clearing dangling failed The command "sudo docker volume rm $(docker volume ls --quiet --filter='dangling=true')" failed.
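One plausible reading (an assumption, not verified on the instance): the inner `docker volume ls` runs without sudo and cannot reach the daemon, like the `Cannot connect to the Docker daemon` message above, so the substitution is empty and `docker volume rm` gets no arguments. The same empty-list case would also come up whenever nothing is dangling. A small guard for that, with `run_if_any` being a made-up helper name:

```shell
#!/bin/sh
# run_if_any CMD...: read IDs from stdin and run CMD with them appended as
# arguments; do nothing and succeed when the list is empty.
run_if_any() {
  ids="$(cat)"
  if [ -z "$ids" ]; then
    return 0
  fi
  # Word splitting on $ids is intentional: one argument per ID.
  "$@" $ids
}

# Hypothetical usage:
#   sudo docker volume ls --quiet --filter dangling=true | run_if_any sudo docker volume rm
```

Note that a failing `docker volume ls` on the left side of the pipe also yields an empty list, so this guard would hide a missing sudo rather than report it.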
So the only unclear thing left is probably the image removal. I understand these messages to mean that it listed the images at the start, and they were deleted as dependents before the iteration got to them?
Error: No such image or container: b2cc31613260
Error occurred when processing image 749f476a2f6d: The command "sudo docker inspect 749f476a2f6d" failed.
Exit Code: 1(General error)
But as I look at it, it did seem to clean up the builder images; the oldest one is 18 hours old.
I'm trying it on kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66
That's how it should be; it deleted about half of them.
Then there's also this: conflict: unable to delete 2cff4b2ac0ce (cannot be forced) - image has dependent child images
It does seem to me that there are needlessly many of those inspect errors, but it's hard to say like this.
The dependent one is possible and it's fine; it can be an old image that something is built on which can't be deleted yet.
This is it:
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ sudo docker images --all --filter=label=com.keboola.docker.runner.origin=builder | grep 2cff4b2ac0ce
<none> <none> 2cff4b2ac0ce 29 hours ago 1.119 GB
[deploy@kbc-us-east-1-syrup-docker-i-05c0f45866b1bcf66 current]$ sudo docker rmi --force 2cff4b2ac0ce
Error response from daemon: conflict: unable to delete 2cff4b2ac0ce (cannot be forced) - image has dependent child images
And does it actually have any child images? I'm wondering how I missed that sudo.
I'm looking into how to find out. This isn't working for me so far: https://gist.github.com/Siva-Charan/db7bd84ad2ca2b0779d87a75e6bb4176
Probably an old docker:
sudo docker images --filter since=2cff4b2ac0ce -q
Error response from daemon: Invalid filter 'since'
It works for me locally.
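An alternative that should not depend on the `since` filter, sketched under the assumption that `docker inspect` on this version exposes `.Id` and `.Parent` for images (the parent is typically only set for locally built images, which the builder images are). `child_images` is a made-up helper name.

```shell
#!/bin/sh
# child_images SHORT_ID: read "<image id> <parent id>" lines on stdin and
# print the IDs of images whose parent ID begins with SHORT_ID.
child_images() {
  awk -v want="$1" '
    {
      parent = $2
      sub(/^sha256:/, "", parent)   # drop the digest prefix if present
      if (parent != "" && index(parent, want) == 1) print $1
    }
  '
}

# Hypothetical usage:
#   sudo docker inspect --format '{{.Id}} {{.Parent}}' $(sudo docker images --all --quiet) \
#     | child_images 2cff4b2ac0ce
```

An empty result would suggest the conflict comes from somewhere else; a non-empty one lists the images that would have to go first.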
What if I tried to prepare the new docker? https://github.com/keboola/docker-bundle/issues/125
It now offers 17.03.1ce
Well, that would be nice, but I guess there are child images if docker says so.
I'll fix the dangling and dead.
Yeah, they're probably just there. The dangling and dead fix is great. It could then also somehow handle those No such image or container: c9acee87a950
errors when deleting images. Maybe not log them?
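One way the "don't log them" idea could look, as a sketch only: drop the already-gone messages from the collected error output and keep everything else. It assumes the message text stays as quoted in this thread; `noteworthy_errors` is a made-up name and not part of the actual command.

```shell
#!/bin/sh
# noteworthy_errors: read error lines on stdin and print only those that are
# not about an image that has already disappeared.
noteworthy_errors() {
  # grep exits non-zero when nothing is left to print; that is not a failure here.
  grep -v 'No such image' || true
}

# Hypothetical usage:
#   sudo docker rmi "$image_id" 2>&1 >/dev/null | noteworthy_errors
```

The dependent child images conflict would still be reported this way, which seems right since it is the case worth a look.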
Otherwise I'll try to prepare the new docker while we're digging into this; I wanted to do it anyway.
There's a lot of junk in there:
docker inspect 456dafcf6271 [ { "Id": "456dafcf62718224da9434d66cc0cc2516e2b138d84f443547ae4cc2f71da466", "Created": "2017-01-09T06:30:21.607698769Z", "Path": "sh", "Args": [ "-c", "chown 501 /data -R" ], "State": { "Status": "exited", "Running": false, "Paused": false, "Restarting": false, "OOMKilled": false, "Dead": false, "Pid": 0, "ExitCode": 0, "Error": "", "StartedAt": "2017-01-09T06:30:22.364690768Z", "FinishedAt": "2017-01-09T06:30:22.40937923Z" }, "Image": "sha256:baa5d63471ead618ff91ddfacf1e2c81bf0612bfeb1daf00eb0843a41fbfade3", "ResolvConfPath": "/var/lib/docker/containers/456dafcf62718224da9434d66cc0cc2516e2b138d84f443547ae4cc2f71da466/resolv.conf", "HostnamePath": "/var/lib/docker/containers/456dafcf62718224da9434d66cc0cc2516e2b138d84f443547ae4cc2f71da466/hostname", "HostsPath": "/var/lib/docker/containers/456dafcf62718224da9434d66cc0cc2516e2b138d84f443547ae4cc2f71da466/hosts", "LogPath": "/var/lib/docker/containers/456dafcf62718224da9434d66cc0cc2516e2b138d84f443547ae4cc2f71da466/456dafcf62718224da9434d66cc0cc2516e2b138d84f443547ae4cc2f71da466-json.log", "Name": "/pensive_liskov0", "RestartCount": 0, "Driver": "devicemapper", "MountLabel": "", "ProcessLabel": "", "AppArmorProfile": "", "ExecIDs": null, "HostConfig": { "Binds": [ "/tmp/docker/run-5873298d2ee976.88917256/data:/data" ], "ContainerIDFile": "", "LogConfig": { "Type": "json-file", "Config": null }, "NetworkMode": "default", "PortBindings": {}, "RestartPolicy": { "Name": "no", "MaximumRetryCount": 0 }, "AutoRemove": false, "VolumeDriver": "", "VolumesFrom": null, "CapAdd": null, "CapDrop": null, "Dns": [], "DnsOptions": [], "DnsSearch": [], "ExtraHosts": null, "GroupAdd": null, "IpcMode": "", "Cgroup": "", "Links": null, "OomScoreAdj": 0, "PidMode": "", "Privileged": false, "PublishAllPorts": false, "ReadonlyRootfs": false, "SecurityOpt": null, "StorageOpt": null, "UTSMode": "", "UsernsMode": "", "ShmSize": 67108864, "ConsoleSize": [ 0, 0 ], "Isolation": "", "CpuShares": 0, "Memory": 0, 
"CgroupParent": "", "BlkioWeight": 0, "BlkioWeightDevice": null, "BlkioDeviceReadBps": null, "BlkioDeviceWriteBps": null, "BlkioDeviceReadIOps": null, "BlkioDeviceWriteIOps": null, "CpuPeriod": 0, "CpuQuota": 0, "CpusetCpus": "", "CpusetMems": "", "Devices": [], "DiskQuota": 0, "KernelMemory": 0, "MemoryReservation": 0, "MemorySwap": 0, "MemorySwappiness": -1, "OomKillDisable": false, "PidsLimit": 0, "Ulimits": null, "CpuCount": 0, "CpuPercent": 0, "BlkioIOps": 0, "BlkioBps": 0, "SandboxSize": 0 }, "GraphDriver": { "Name": "devicemapper", "Data": { "DeviceId": "131296", "DeviceName": "docker-202:1-661201-ed1f2db9621d65e9bf4124ea0b9ce259b1551647ca1eb6191bd62261ca517229", "DeviceSize": "10737418240" } }, "Mounts": [ { "Source": "/tmp/docker/run-5873298d2ee976.88917256/data", "Destination": "/data", "Mode": "", "RW": true, "Propagation": "rprivate" } ], "Config": { "Hostname": "456dafcf6271", "Domainname": "", "User": "", "AttachStdin": false, "AttachStdout": true, "AttachStderr": true, "Tty": false, "OpenStdin": false, "StdinOnce": false, "Env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" ], "Cmd": [ "sh", "-c", "chown 501 /data -R" ], "Image": "alpine", "Volumes": null, "WorkingDir": "", "Entrypoint": null, "OnBuild": null, "Labels": {} }, "NetworkSettings": { "Bridge": "", "SandboxID": "d8e6306871c672c6e63614a71cb54a530367180de6232a5b82455b8ad0e508e1", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": null, "SandboxKey": "/var/run/docker/netns/d8e6306871c6", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "EndpointID": "", "Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "MacAddress": "", "Networks": { "bridge": { "IPAMConfig": null, "Links": null, "Aliases": null, "NetworkID": "aa8c54910e6b684bbef92052ffdb96c525b533079c44187572473a5c689b139e", "EndpointID": "", "Gateway": "", "IPAddress": "", "IPPrefixLen": 0, 
"IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "" } } } } ]
Fix: