woodpecker-ci / autoscaler

Scale your woodpecker agents automatically to the moon and back based on the current load.
Apache License 2.0

Return errors during cloud-init #167

Closed pat-s closed 3 weeks ago

pat-s commented 3 weeks ago

Return errors that occur during cloud-init and prevent Docker from being installed and the agent from coming up. For example, I just hit this one:

```
2024-07-06 19:37:31,783 - util.py[DEBUG]: Attempting to load yaml from string of length 1215 with allowed root types (<class 'dict'>,)
2024-07-06 19:37:31,785 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 41 column 1: "while scanning a simple key
  in "<unicode string>", line 41, column 1:
    WOODPECKER_TOKEN=eyJhbGciOiJIUzI ...
    ^
```

This happened after setting:

```yaml
env:
  - name: WOODPECKER_AGENT_ENV
    value: |
      WOODPECKER_SERVER=XYZ
      WOODPECKER_TOKEN=XYZ
```
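One plausible cause, assuming the autoscaler renders `WOODPECKER_AGENT_ENV` into a YAML cloud-init template by plain text substitution (an assumption; the template is not shown in this thread): only the first line of a multi-line value inherits the template's indentation, so the next line (`WOODPECKER_TOKEN=...`) lands at column 1, which is exactly where the parser gave up. The sketch below uses a hypothetical `TEMPLATE` and made-up helper names to show the failure mode and a fix that indents every line of the value.

```python
import re

# Hypothetical cloud-init template: the real autoscaler template is not shown in
# this thread. The env value is meant to live inside the `content: |` block scalar.
TEMPLATE = """#cloud-config
write_files:
  - path: /etc/woodpecker/agent.env
    content: |
      {env}
"""

def render_naive(env: str) -> str:
    # Plain substitution: only the first line of `env` gets the 6-space indent.
    return TEMPLATE.replace("{env}", env)

def render_indented(env: str, indent: int = 6) -> str:
    # Indent every line so the whole value stays inside the block scalar.
    pad = " " * indent
    return TEMPLATE.replace("{env}", ("\n" + pad).join(env.strip("\n").splitlines()))

def stray_top_level_lines(doc: str) -> list[int]:
    # 1-based numbers of column-1 lines that are neither comments nor `key:` entries.
    # PyYAML typically rejects such lines with "while scanning a simple key".
    key = re.compile(r"[\w-]+:(\s|$)")
    return [
        n for n, line in enumerate(doc.splitlines(), start=1)
        if line and not line[0].isspace() and not line.startswith("#") and not key.match(line)
    ]
```

Rendering the value from the issue (`"WOODPECKER_SERVER=XYZ\nWOODPECKER_TOKEN=XYZ\n"`) with `render_naive` leaves `WOODPECKER_TOKEN=XYZ` at column 1 (reported as line 6 by `stray_top_level_lines`), mirroring the "line 41, column 1" in the log; `render_indented` keeps it inside the block scalar.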
xoxys commented 3 weeks ago

How should that work? In the worst case, not a single Woodpecker component is running on the machine if cloud-init failed. What should be returned, and where?

anbraten commented 3 weeks ago

Similar to #153.

I think the general issue is that finding out why an agent isn't working currently requires manual interaction with the cloud provider or SSH-ing into the agent, so some kind of feedback system would be awesome.

(How this could be implemented easily, but also in a robust and secure way, I have no idea at the moment 🤷🏾‍♂️)
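As a possible starting point for such feedback, cloud-init records a summary of its run on the machine, usually in `/run/cloud-init/result.json` with the shape `{"v1": {"datasource": ..., "errors": [...]}}`. The sketch below assumes that path and shape (verify them on your image) and that the file only appears once cloud-init has finished. Note that the YAML problem in this issue was logged at WARNING level and may not show up in `errors`, so this catches only part of the story. How the result would reach the autoscaler remains the open question from this thread; the function names here are hypothetical.

```python
import json
from typing import Optional

# Usual location on cloud-init installs (assumption: verify it on your image).
RESULT_PATH = "/run/cloud-init/result.json"

def cloud_init_errors(raw: str) -> list[str]:
    """Extract the `errors` list from the contents of result.json."""
    data = json.loads(raw)
    return [str(e) for e in data.get("v1", {}).get("errors", [])]

def read_cloud_init_errors(path: str = RESULT_PATH) -> Optional[list[str]]:
    """Return cloud-init's errors, or None if result.json does not exist
    (assumed to mean cloud-init has not finished yet)."""
    try:
        with open(path, encoding="utf-8") as fh:
            return cloud_init_errors(fh.read())
    except FileNotFoundError:
        return None
```

An agent-side hook could call `read_cloud_init_errors()` after boot and forward a non-empty list somewhere the autoscaler can read it; the transport is deliberately left out because any choice here would be hypothetical.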

xoxys commented 3 weeks ago

Admins should be able to log in to a faulty agent and just read the syslog. I don't see the difference between logging in to the cloud provider's web frontend and SSH-ing into the faulty machine.

pat-s commented 3 weeks ago

This was just an idea; I don't know whether it is feasible with the backend implementation. It's definitely not easy in general.

Sure, SSH-ing is always possible, but then you need to know "the right" places to look and find them in the first place, instead of having the errors show up in the autoscaler log.

Let's close this again, as the priority is too low and it is unclear whether this is even technically possible.