hitachienergy / epiphany

Cloud and on-premises automation for Kubernetes centered industrial grade solutions.
Apache License 2.0
138 stars 107 forks source link

[BUG] epicli may fail on re-run after adding component #3234

Closed to-bar closed 2 years ago

to-bar commented 2 years ago

Describe the bug epicli failed with error:

TASK [download : Download file postgres_exporter-0.10.0.linux-amd64.tar.gz] ****
15:26:34 INFO cli.src.ansible.AnsibleCommand - fatal: [tobar-aws-ub-small-postgresql-vm-0]: FAILED! => {"attempts": 3, "changed": false, "dest": "/tmp", "elapsed": 0, "gid": 0, "group": "root", "mode": "01777", "msg": "Request failed", "owner": "root", "response": "HTTP Error 404: Not Found", "size": 4096, "state": "directory", "status_code": 404, "uid": 0, "url": "http://10.1.11.33/epirepo/files/postgres_exporter-0.10.0.linux-amd64.tar.gz"}

on re-run after changing count for postgresql from 0 to 1 in epiphany-cluster doc.

This is because the first run was optimized to not download postgresql related requirements but on the second run the Download Epiphany requirements was skipped due to existence of /var/tmp/epi-download-requirements/download-requirements-done.flag file.

How to reproduce Steps to reproduce the behavior:

  1. execute epicli apply with postgresql.count: 0
  2. wait until repository role is done, then stop epicli run
  3. set postgresql.count: 1 and run epicli apply again

Expected behavior No failure.

Environment

epicli version: 2.0.1dev

Additional context Related to #3188


DoD checklist

przemyslavic commented 2 years ago

If the first run is successful, repository teardown is executed, which removes the entire /var/tmp/epi-download-requirements directory where the flag is located and then there is no issue. The problem occurs on epicli re-run if the previous run failed/was cancelled after setting up the repository.