saltstack-formulas / docker-formula

Install and set up Docker
http://docs.saltstack.com/en/latest/topics/development/conventions/formulas.html
Other
136 stars 330 forks source link

fix: raspbian pi4 dockerd failing with segmentation fault, issue #286 #287

Closed auphofBSF closed 3 years ago

auphofBSF commented 3 years ago

PR progress checklist (to be filled in by reviewers)


What type of PR is this?

Primary type

Secondary type

Does this PR introduce a BREAKING CHANGE?

Unsure.

Related issues and/or pull requests

Describe the changes you're proposing

This PR installs docker and docker-compose successfully and functionally tested on raspberry pi4 buster This PR however should be regarded as WIP as I have limited understanding of the full functionality of this docker-formula and am still a very NOOB to salt and salt formulas> I am very willing to learn and the process of getting it functional to this state has been rewarding. I am anticipating the saltstack formula masters will be kind enough to show what next and guide to fixing remaining issues in style and context of the universal docker-formula.

The following 4 are my understanding of open issues that I am stuck with

Server: Docker Engine - Community Engine: Version: 20.10.6 API version: 1.41 (minimum version 1.12) Go version: go1.13.15 Git commit: 8728dd2 Built: Fri Apr 9 22:44:17 2021 OS/Arch: linux/arm Experimental: false containerd: Version: 1.4.4 GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e runc: Version: 1.0.0-rc93 GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec docker-init: Version: 0.19.0 GitCommit: de40ad0

- [ ] `[docker.service startup issues on state.apply]`  Frequently fails on ID: docker-software-service-running-docker.
on salt master the following failure is displayed

      ID: docker-software-service-running-docker
Function: service.running
    Name: docker
  Result: False
 Comment: Job for docker.service failed because the control process exited with error code.
          See "systemctl status docker.service" and "journalctl -xe" for details.
 Started: 05:24:31.958207
Duration: 357.258 ms
 Changes:


on target machine `systemctl status docker.service` shows

docker.service - Docker Application Container Engine Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled) Active: failed (Result: exit-code) since Mon 2021-04-26 05:24:41 BST; 7min ago Docs: https://docs.docker.com Process: 11344 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE) Main PID: 11344 (code=exited, status=1/FAILURE)

Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart. Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Scheduled restart job, restart counter is at 4. Apr 26 05:24:41 raspberrypi systemd[1]: Stopped Docker Application Container Engine. Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Start request repeated too quickly. Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Failed with result 'exit-code'. Apr 26 05:24:41 raspberrypi systemd[1]: Failed to start Docker Application Container Engine.

 Solved by ether redo `salt-ssh .... state.apply docker` or issuing a `systemctl start docker.service`

- [ ] `[state.apply docker.clean not removing docker]`   The clean stops and removes docker.service. It uninstalls docker-compose but docker still remains installed. ie `docker` still show the CLI commands but `sudo docker ps ` fails correctly. doing an `apt list --installed | grep  docker` lists the following:

docker-ce-cli/now 5:20.10.6~3-0~raspbian-buster armhf [installed,local] docker-ce-rootless-extras/now 5:20.10.6~3-0~raspbian-buster armhf [installed,local] golang-docker-credential-helpers/stable,now 0.6.1-2 armhf [installed,automatic] python3-docker/stable,now 3.4.1-4 all [installed] python3-dockerpty/stable,now 0.4.1-1 all [installed,auto-removable] python3-dockerpycreds/stable,now 0.3.0-1 all [installed,automatic]



- 

### Pillar / config required to test the proposed changes
<!-- Provide links to the SLS files and/or relevant configs (be sure to remove sensitive info). -->
Non Required

### Debug log showing how the proposed changes work
<!-- Include a debug log showing how these changes work, e.g. using `salt-minion -l debug`. -->
<!-- Alternatively, linking to Kitchen debug logs is useful, e.g. via. Travis CI. -->
<!-- Most useful is providing a passing InSpec test, which can be used to verify any proposed changes. -->

### Documentation checklist
<!-- Please tick each box that is relevant (after creating the PR). -->

- [ ] Updated the `README` (e.g. `Available states`).
- [ ] Updated `pillar.example`.

### Testing checklist
<!-- Please tick each box that is relevant (after creating the PR). -->

- [ ] Included in Kitchen (i.e. under `state_top`).
- [ ] Covered by new/existing tests (e.g. InSpec, Serverspec, etc.).
- [ ] Updated the relevant test pillar.

### Additional context
<!-- Add any other context about the proposed changes here. -->
noelmcloughlin commented 3 years ago

Hi @auphofBSF could you update your commit messages to use conventional commit: https://www.conventionalcommits.org/en/v1.0.0/#summary

auphofBSF commented 3 years ago

Failing Lint check on ./docker/osarchmap.yaml, because of jinja,

I note the comment in https://github.com/saltstack-formulas/docker-formula/pull/288#pullrequestreview-663353765

In fact, this is actually a reminder that we really want to move to our new v5 map.jinja, which emphasizes moving Jinja out of our YAML files.

I will have to understand and modify this PR accordingly

myii commented 3 years ago

Failing Lint check on ./docker/osarchmap.yaml, because of jinja,

@auphofBSF We can get around this problem in this PR by adding docker/osarchmap.yaml above these two lines:

https://github.com/saltstack-formulas/docker-formula/blob/4a9579f75b680a2f63facd4e18798453950dd3f5/.yamllint#L23-L24

I note the comment in #288 (review)

In fact, this is actually a reminder that we really want to move to our new v5 map.jinja, which emphasizes moving Jinja out of our YAML files.

I will have to understand and modify this PR accordingly

Please don't worry about that! It is a detailed job that is too much to ask from someone who hasn't got a lot of familiarity with it. Let's get the CI for this PR working and then this can be merged, if all appears to be OK.

noelmcloughlin commented 3 years ago

Nice, I was unsure about #286 but PR makes sense.

auphofBSF commented 3 years ago

thank you @myii and @noelmcloughlin , that appears to work, I was delayed in pushing this as I wanted to do a final test locally with the the merge including v2.0.7.

Thankfully docker appears to install, on salt v3002.6

There are a two issues !!!! and I am not sure which need to be handled here.

1) experiencing template missing errors with v3003.2, (exact message eludes me at the moment) need to do more research, I recall there being issues with salt-ssh and v3003.1 and just this morning saw activity on a possible relevant issue https://github.com/saltstack/salt/issues/59942#issuecomment-905888356

2) with Salt v3002.6 on fresh Rasbian (Buster) / fully atp-get updated and upgraded Docker installs but fails to start and the following error logged is

Failing to start dockerd: failed to create NAT chain DOCKER reason as documented here with fix This never occurred back when I initiated the PR

Reason

The docker installer uses iptables for nat. Unfortunately Debian uses nftables. You can convert the entries over to nftables or just setup Debian to use the legacy iptables.

Fix

on target

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo shutdown -r 0  # Do a restart, Docker.d should then function

or with Salt in an SLS

iptables:
  alternatives.set:
    - path:  /usr/sbin/iptables-legacy
ip6tables:
  alternatives.set:
    - path:  /usr/sbin/ip6tables-legacy

What to do for this PR

I hesitate to implement the fix for 2) in this docker_formula because I don't know which upgrade or raspbian version introduced it. For now I would recommend a note in documentation to the effect that

if docker.d does not start for reason failed to create NAT chain then perform these changes to use legacy iptables.

with regard to issue 1) re v3003.2 maybe a similar note in docs that this only current works on v3002.6

noelmcloughlin commented 3 years ago

Thanks for the detailed update. Could you update the README please and we can merge once CI passes? thanks

noelmcloughlin commented 3 years ago

Commitlint does not like Uppercase characters so probably these cause problems with CI.

auphofBSF commented 3 years ago

Apologies for the multitude of commits , trying to pass linting, restructured text syntax and rewording commit messages, All things I had no idea of how to do, but hopefully that should be able to be all squashed in merge of PR

noelmcloughlin commented 3 years ago

Thanks @auphofBSF for the PR, LGTM. merged

saltstack-formulas-travis commented 3 years ago

:tada: This PR is included in version 2.1.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: