neoave / mrack

Multicloud use-case based multihost async provisioner for CIs and testing during development
Apache License 2.0
11 stars 14 forks source link

feat: configurable post provisioning check + minor fixes #144

Closed Tiboris closed 2 years ago

Tiboris commented 2 years ago

feat: Make post provisioning ssh check configurable

Using post_provisioning_check with ssh default section in the provisioning config to configure check based on values in the dictionary:

    post_provisioning_check:
        ssh:
            # Default configurations for every host
            enabled: True # True | False
            disabled_providers: ["podman"] # Per provider override to `enabled: True`
            enabled_providers: [] # Would be relevant if 'enabled' is 'False'
            # port: 22
            # timeout: 10 # minutes

            # Overrides

           group:
                ipaclient:
                    timeout: 309090 # minutes
            # If we want to override based on OS
            os:
                win-2012r2:
                    timeout: 15 # minutes
                win-2016:
                    timeout: 15 # minutes
                win-2019:
                    timeout: 15 # minutes
                fedora-34:
                    enabled: False
                    timeout: 1
                    enabled_providers: ["static"]
                    disabled_providers: ["beaker"]

Priority: OS > Group > Default

f-trivino commented 2 years ago

from commit message: s/covfig/config/g

pvoborni commented 2 years ago

Hi, thanks for the PR, I'm bit worried about limitations of the configuration regarding future enhancements and also that it is only about ssh. In general, SSH is not the only method how to run stuff on hosts (and thus pre-check that it is possible) and thus our checks should not limit to it. Example use case is mentioned in https://github.com/neoave/mrack/issues/142 .

WDYT about the following config:


# defined in root of provisioning config, we don't need to mention defailt as by being in prof-config it becomes default behavior
connection_check:
  type: ssh # e.g. from options: ssh  | none, none would be the same as check: False
  # common options
  timeout: 10 # minutes
  # following type specific options,e.g.
  ssh_port: 22

# each provider can have it's own override, e.g.
# this also elimininate a need for exclusion list as 'none' will disable it for the provider
podman:
  connection_check: 
    type: none # or podman for native container connection 

WDYT?

Tiboris commented 2 years ago

Hi, thanks for the PR, I'm bit worried about limitations of the configuration regarding future enhancements and also that it is only about ssh. In general, SSH is not the only method how to run stuff on hosts (and thus pre-check that it is possible) and thus our checks should not limit to it. Example use case is mentioned in #142 .

WDYT about the following config:

# defined in root of provisioning config, we don't need to mention defailt as by being in prof-config it becomes default behavior
connection_check:
  type: ssh # e.g. from options: ssh  | none, none would be the same as check: False
  # common options
  timeout: 10 # minutes
  # following type specific options,e.g.
  ssh_port: 22

# each provider can have it's own override, e.g.
# this also elimininate a need for exclusion list as 'none' will disable it for the provider
podman:
  connection_check: 
    type: none # or podman for native container connection 

WDYT?

I believe we did a very similar suggestion of how the post provisioning ssh check configuration should look like. What is different is only a format how it is represented in provisioning-config.yaml file and this can be chaged little bit not to have such long names in config.

Also I think my version while being less readable because of the long names or whatever has very nice possibility: to disable check for only one OS in one provider with not disabling check for other os variants from the provider.

Tiboris commented 2 years ago

Updated and fixed AWS issues.

pvoborni commented 2 years ago

We should first agree on desired behavior and interface (provision config structure) - aka having a design first. Otherwise you'll waste a lot of your time on multiple re-implementation of the PR.

The importance on having a good interface from the beginning is the fact that it is much more difficult to change when people start using it.

I did not look at OSes in my original proposal as it was not visible that it is needed. From that perspective, I agree with having a separate post_provisioning_check sections with possibility of having various check - ssh being the first.

I'm not sure if the structure should be based mainly around os. But if so, see the comments inline:

 post_provisioning_check:  # LGTM
        ssh: # OK
            default: # OK
               # The check section means provider config? Or does it have also another purpose?
               # Term check is duplicated with base `post_provisioning_check`. If it is for providers, wouldn't it be better to 
               # state it, e.g.  name the section `providers` ?
                check:
                    enable: True  # I'd use  `enabled` - follows the same format as e.g. in ansible systemd module
                    podman: False
                port: 22
                timeout: 10 # minutes
            win-2012r2:
                timeout: 15 # minutes
            win-2016:
                timeout: 15 # minutes
            win-2019:
                timeout: 15 # minutes
            win-2012r2-latest:
                timeout: 20 # minutes
            win-2016-latest:
                timeout: 20 # minutes
            win-2019-latest:
                timeout: 20 # minutes
            fedora-34:
                timeout: 200
                check:
                    enable: False
                    aws: True # check only ssh on aws

Also alternative approach where the format is more extensible in the future with being simpler on default config and not focusing primarily on os (but still having the same power):

post_provisioning_check:
  ssh:
    # Default configurations for every host
    enabled: True # True | False
    disabled_providers: ['podman'] # Per provider override to `enabled: True`
    enabled_providers: [] # Would be relevant if 'enabled' is 'False'
    port: 22
    timeout: 10 # minutes

    # Overrides

    # If we want to override based on OS
    os:
      win-2012r2:
        timeout: 15 # minutes
      win-2016:
        timeout: 15 # minutes
      win-2019:
        timeout: 15 # minutes
      fedora-34:
        timeout: 200
        enabled: False
        enabled_providers: ['aws']

    # Groups (just idea to demonstrate why overrides has subsections as `os`
    # or `groups`, we don't need to implement this part now
    group:
      win:
        timeout: 30 # minutes
Tiboris commented 2 years ago

Thanks for the nice feedback @pvoborni I am okay with the above suggestion as i think its the better version of the one already in PR. Group secrtion can be definitely implemented later but with the commits above it should not be that problematic. What do you think @f-trivino shall I start?

Tiboris commented 2 years ago

Hello @pvoborni I have updated the code to match better suggestion, now it is ready to be reviewed.

amore17 commented 2 years ago

Commits: fix(OpenStack): Use only first part of fqdn for host name refactor: use builtin operator to merge dictionary helped to solve issue with AD

Tiboris commented 2 years ago

Above commits are not related to feature so i have moved them to https://github.com/neoave/mrack/pull/146

pvoborni commented 2 years ago

We should probably create some kind of context object, e.g. in up action or transformer. Then pass all the relevant objects in it - the ones which are input and the ones which are ooutput of the operations. Then the code would be simpler and we would not need to enhance the Host object with additional data which are in metadata.

E.g.

class HostProvisioningContext:
    def __init__(self, name, req, meta_host):
        self.name = name
        self.req = req
        self.meta_host = meta_host
        self.spec = None
        self.result = None
        self.error = None

    @property
    def ok(self):
        """If provisioning resulted in success."""
        return self.error != None
Tiboris commented 2 years ago

Updated the code to move code to _check_ssh_auth()

Tiboris commented 2 years ago

Thanks @pvoborni I will try to do better in future.