Replace `ansible` dependency with `ansible-core`

xlab-si / xopera-opera

xOpera orchestrator compliant with TOSCA YAML v1.3 in the making

https://xlab-si.github.io/xopera-docs/

Apache License 2.0


Replace `ansible` dependency with `ansible-core` #251

Closed. anzoman closed this issue 2 years ago.

anzoman commented 2 years ago

Description

The opera TOSCA orchestrator depends on the ansible PyPI package. This means that, along with Ansible, we also automatically install around 500 MB of Ansible collections. We don't really need them, because users can install only the collections they need (via ansible-galaxy install or a requirements.yml file). In addition, installing collections manually lets users choose any version of a collection, whereas the ansible Python package brings only specific versions, which can lead to lock-in. So I propose that we replace the ansible dependency with ansible-core, which would also make opera lighter and more flexible for users.

Steps

We can replace the ansible dependency with ansible-core in the requirements.txt file.

We should also check upgradability: whether users with older opera versions will be able to upgrade to a newer version that
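Checking upgradability starts with knowing which Ansible distributions are already present in an environment before pip swaps one for another. Below is a minimal sketch using only the standard `importlib.metadata` API; the helper name `installed_ansible_distributions` is hypothetical, and the `lookup` parameter exists only so that tests can inject fake results.

```python
from importlib import metadata
from typing import Callable

# Distribution names that may provide the ansible-* command-line tools.
ANSIBLE_DISTRIBUTIONS = ("ansible", "ansible-core", "ansible-base")


def installed_ansible_distributions(
    lookup: Callable[[str], str] = metadata.version,
) -> dict[str, str]:
    """Return {distribution name: installed version} for every Ansible package found."""
    found: dict[str, str] = {}
    for name in ANSIBLE_DISTRIBUTIONS:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed here
    return found
```

Running it in an environment created with an older opera release, before and after `pip install --upgrade`, shows whether the full ansible package is still installed next to ansible-core.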

Current behaviour

opera depends on the ansible Python package.

Expected results

opera depends on the ansible-core Python package.

Note

This issue was first mentioned in https://github.com/xlab-si/iac-scan-runner/pull/3#issue-1157243599.

anzoman commented 2 years ago

Apart from backward compatibility, replacing the ansible dependency with ansible-core could also cause problems with our xopera-examples, as the majority of them could fail due to missing Ansible collections. We would therefore need to mark which Ansible collections need to be installed for each example (we could do this in the prerequisites section of each README). Another problem is xOpera SaaS, where users don't have direct access to the orchestration environment and its console, so they cannot run ansible-galaxy install and other commands. And Ansible does not have a tool that would automatically install dependencies (= collections) based on their usage in Ansible playbooks (in contrast, Terraform handles this with the terraform init command, which finds and installs the necessary plugins). So one possible solution would be to require a requirements.yml somewhere within the IaC and then call a command like opera init that would run ansible-galaxy install -r requirements.yml behind the scenes. But this approach also has a drawback for users, as they must follow a new convention and place their requirements in an exactly specified location so that opera can find and install them.
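The opera init idea can be sketched as a step that looks for requirements.yml in one fixed place and delegates to ansible-galaxy. This is a minimal sketch under assumptions: opera has no such command here, the "requirements.yml at the root of the extracted CSAR" convention is hypothetical (it is exactly the convention users would have to follow), and the function names are illustrative. Only `ansible-galaxy install -r <file>` is a real CLI call.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical convention: requirements.yml must sit at the root of the extracted CSAR.
REQUIREMENTS_FILE = "requirements.yml"


def galaxy_install_command(csar_root: Path) -> Optional[list[str]]:
    """Build the ansible-galaxy call an "opera init"-style step could run, if any."""
    requirements = csar_root / REQUIREMENTS_FILE
    if not requirements.is_file():
        return None  # nothing to install: the user did not follow the convention
    return ["ansible-galaxy", "install", "-r", str(requirements)]


def install_requirements(csar_root: Path) -> bool:
    """Run ansible-galaxy for the CSAR; return True if an install was attempted."""
    command = galaxy_install_command(csar_root)
    if command is None:
        return False
    if shutil.which("ansible-galaxy") is None:
        raise RuntimeError("ansible-galaxy not found; is ansible-core installed?")
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return True
```

Separating command construction from execution keeps the lookup convention testable without touching the network or the user's collection paths.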

After a conversation with @cankarm and @sstanovnik, we agreed that, for now, the best solution is probably to unleash the power of TOSCA. For example, users would define a new TOSCA node type (e.g. AnsiblePrerequisites) for installing the necessary Ansible collections, and this node would link to an Ansible playbook that uses the ansible.builtin.shell module to install collections via the ansible-galaxy install command, or in any other way users prefer. By using only TOSCA + Ansible to install the required collections, the transition from ansible to ansible-core would go smoothly, without the need to implement additional constructs that might do more harm than good.

cankarm commented 2 years ago

Thanks for putting this here, @anzoman. I guess that every real-world TOSCA orchestrator (and blueprint) faces this same issue, and not only with Ansible. Therefore it would be reasonable to ask others how they tackle this problem, so that the orchestrator works for any potential TOSCA input.

@lauwers, @tliron: if you happen to know, would you be so kind as to share how other orchestrators tackle this? It would seem logical for each TOSCA service template to have a requirements.txt, similar to Python packages, because you rarely find a "perfect executor engine" that can deploy things without any dependencies. Thanks.

tliron commented 2 years ago

I'm not entirely clear on how you're using Ansible here, but Ansible Galaxy does support a requirements.yml-style all-at-once installation.

Why is Ansible's size a problem? Are you installing in constrained environments? (It's really not "big", it's just slow to install due to many dependencies requiring C/C++ and Rust compilation.)

lauwers commented 2 years ago

Yes, this problem exists not just with Ansible, but with most other implementation artifacts as well. For example, a Python script may depend on a number of other modules (specified in a requirements.txt, for example). Almost all of my bash scripts require 'jq' to parse JSON (when passing complex input values). We ought to come up with a "best practices" way of specifying these prerequisites in TOSCA templates. In my opinion, TOSCA provides 2 mechanisms for doing this:

Using "requirements": the node that has the implementation artifact with the prerequisites could have a requirement to another node that is only there to install those prerequisites.
Using "dependent" artifacts: one could define a dependent artifact (e.g. a bash script) that is processed before the primary artifact and installs the dependencies required by the primary artifact.

I have used both approaches in my templates, and they each have pros and cons. Neither one feels extremely elegant, because it feels like we're mixing "application functionality" and "orchestration support" in the same template. I wonder if there might be a better way to do this?

anzoman commented 2 years ago

Thanks for your answers!

@tliron, installing all these 500 MB of collections along with the orchestrator just doesn't feel right to me. opera currently uses only Ansible to implement TOSCA interface operations, but in the future we want to add new executors, and if each of them installs all its dependencies, the download will only grow. We want to keep opera as lightweight as possible and not populate the user's environment with things they will never need, so depending on ansible-core seems a good solution here (we already had a problem with an overly large Docker image for our IaC Scan Runner, and opera's Ansible collections were one of the reasons). But we could consider updating our Python package so that it still allows installing ansible for users who want all collections pre-installed along with Ansible, e.g. by adding a new entry to options.extras_require in setup.cfg, similar to what we do for OpenStack, where users can install opera with all required OpenStack libraries by running pip install opera[openstack].

@lauwers, we also thought about encouraging opera's users to use these two TOSCA mechanisms (once we drop the ansible dependency). We already have some TOSCA nodes in our examples that are used only for installing prerequisite packages. On the other hand, as you put it, this might not be ideal, since we are populating TOSCA templates with additional node types that are more connected to application prerequisites than to orchestration, and are there only because we don't want users to do this installation manually. I believe a better approach might be to automate the installation of these prerequisites by extracting dependencies from implementation artifacts (similar to how Terraform installs its plugins) and installing them during the initialization phase (for example, invoked by running opera init), where, apart from CSAR extraction and storage preparation, the orchestrator could also install the necessary prerequisites. Still, we should think twice before going this way, because this approach could have drawbacks. For example, we would need to explore how to extract prerequisites for each executor. In the case of Ansible, the installation cannot easily be automated and would probably be done with a requirements.yml file (interestingly, Ansible Tower can already find requirements.yml and install collections automatically).
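To illustrate why extracting Ansible prerequisites from artifacts is only partly automatable, here is a naive heuristic sketch: it walks a playbook that has already been parsed (for example with PyYAML's `yaml.safe_load`) and collects the collections behind fully qualified module names of the form namespace.collection.module. The function names are hypothetical, and treating `ansible.builtin`/`ansible.legacy` as always available is an assumption based on them shipping with ansible-core.

```python
from typing import Any, Optional

# Namespaces that ship with ansible-core itself, so they never need installing.
BUILTIN_COLLECTIONS = {"ansible.builtin", "ansible.legacy"}
# Task keys whose values are nested task lists rather than module arguments.
NESTED_TASK_KEYS = ("block", "rescue", "always")
# Play keys that hold task lists.
PLAY_TASK_KEYS = ("pre_tasks", "tasks", "post_tasks", "handlers")


def _collections_in_tasks(tasks: Optional[list[dict[str, Any]]]) -> set[str]:
    found: set[str] = set()
    for task in tasks or []:
        for key, value in task.items():
            if key in NESTED_TASK_KEYS:
                found |= _collections_in_tasks(value)
            elif key.count(".") >= 2:
                # namespace.collection.module -> namespace.collection
                collection = ".".join(key.split(".")[:2])
                if collection not in BUILTIN_COLLECTIONS:
                    found.add(collection)
    return found


def required_collections(playbook: list[dict[str, Any]]) -> list[str]:
    """Guess which collections a parsed playbook needs, from fully qualified module names."""
    found: set[str] = set()
    for play in playbook:
        for key in PLAY_TASK_KEYS:
            found |= _collections_in_tasks(play.get(key))
    return sorted(found)
```

The result could be written into a requirements.yml for ansible-galaxy, but the heuristic misses tasks that use short module names such as `apt`, modules used inside roles, and the play-level `collections` keyword, which is why an explicit requirements file remains the more reliable source.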

tliron commented 2 years ago

I'm really confused by your desire to optimize this. Opera can orchestrate entire clouds, thousands of servers, many thousands of dollars of resources per day, and you're concerned with a few megabytes of storage on one orchestration machine? What am I missing here?

Or are you thinking of installing Ansible on every single compute node? Why? Maybe consider setting up Ansible Tower (AWX) in the environment, a single deployment of Ansible that can access everything. It also has many features above and beyond what a local Ansible install can do. Or even set up a simple jump server with a local Ansible to target all nodes. That's what Ansible is designed for.

Sorry, I'm very confused as to what you're trying to achieve and how this has become a problem. Ansible is considered to be a very lightweight orchestrator.

anzoman commented 2 years ago

Thanks @tliron, I understand your confusion. As you said, a few MB shouldn't be a problem when opera is installed on only one target machine, but when spawning a lot of machines or containers this could become an issue. One of our use cases is the xOpera SaaS orchestrator, which introduces deployment projects for every IaC package. As we want to keep user orchestration environments as separate as possible, we create a new container for every project, each installing its own opera package. Even with just a few hundred of these projects, we could benefit a lot from reducing the size of the opera package. Another use case is IaC Scan Runner, a tool that analyzes IaC and reports back various (security) vulnerabilities. To analyze different kinds of languages and configurations, we use many different SAST and SCA tools that need to be installed into one container, and opera is the biggest of them all, so again, reducing its size safely would be great. There is also another point of view: even if you install all the listed Ansible collections by installing the ansible package, some might still be missing, or need to be installed manually because they are not published on Galaxy.

To sum up, I think we should not just abandon the ansible package and switch to ansible-core immediately, because doing so may have other disadvantages. But if we decide to do it, we will keep an extra option to install the whole ansible package, if needed, with pip install opera[ansible].

lauwers commented 2 years ago

This is a good discussion. We need to distinguish between the following types of dependencies:

Dependencies in the service that is deployed using TOSCA. These dependencies are expressed using TOSCA requirements.
Dependencies in the (implementation) artifacts that are used to implement Interface Operations. These dependencies could be expressed using the “dependent” keyword in the operation implementation.
Dependencies in the artifact processor.

It appears that the discussion here relates to the third type of dependencies. We should have a more general discussion about how TOSCA profile designers can “introduce” artifact processors into an orchestrator for the artifact types that are part of the profile. There is currently no language support for this. Feel free to suggest ways in which this can be done.

(Quoted email header: From: Anže Luzar; Sent: Monday, March 14, 2022 5:39 AM; Subject: Re: [xlab-si/xopera-opera] Replace ansible dependency with ansible-core (Issue #251))

cankarm commented 2 years ago

Correct, @lauwers, this is the "third type" of dependencies, which practically does not affect the TOSCA standard as such. These dependencies are required for the orchestrator to prepare the deployment runtime environment so that the executor, which does the actual deployment, has all the required tools and libraries installed.

A list of these dependencies would let general-purpose orchestrators update themselves according to the requirements, and let less general-purpose orchestrators respond quickly when they cannot deploy a service because they do not support the underlying deployment technology.
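The "respond quickly" case can be as simple as a preflight check that runs before any deployment step. A minimal sketch, assuming the dependencies of interest are executables expected on PATH; it does not cover Python packages or Ansible collections, and the function names are hypothetical. It relies only on the standard `shutil.which`.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    required: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in required if which(tool) is None]


def preflight(required: Iterable[str]) -> None:
    """Fail fast, before any deployment step runs, if a tool is unavailable."""
    missing = missing_tools(required)
    if missing:
        raise RuntimeError("cannot deploy this service, missing: " + ", ".join(missing))
```

A general-purpose orchestrator could use the same list to install what is missing instead of raising.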

I guess we will need at least a file with a list of dependencies. Let's assume that we have multiple executors at the same time (bash, python, ansible, terraform), so the list could include anything:

python3
terraform
boto3    # for AWS
ansible-core
<required ansible collection1>
<required ansible collection2>
terraform

but a more structured form would probably be better (it could be plain text like here, or even YAML):

[python]
python3
boto3

[ansible]
ansible-core
<required ansible collection1>
<required ansible collection2>

[terraform]
terraform

The structured form is probably better, as the way packages are installed differs from one executor to another.
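If the structured form were kept as plain INI-like text, Python's standard `configparser` could already read it. This is a minimal sketch, assuming that format (the comment leaves it open between text and YAML); `parse_dependency_file` is a hypothetical name and the package names are placeholders.

```python
import configparser


def parse_dependency_file(text: str) -> dict[str, list[str]]:
    """Parse the sectioned dependency list into {executor: [package, ...]}."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # bare package names are options without values
        inline_comment_prefixes=("#",),  # allow trailing "# for AWS"-style comments
    )
    parser.optionxform = str  # keep package names exactly as written (no lowercasing)
    parser.read_string(text)
    return {section: list(parser[section]) for section in parser.sections()}
```

Because `ConfigParser` is strict by default, listing the same package twice within one section raises `DuplicateOptionError`, and each executor section can then be handed to its own installer (pip, ansible-galaxy, a Terraform installer).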

lauwers commented 2 years ago

We should investigate how to make Executors/Artifact Processors portable. If I’m a profile designer, and I introduce a new artifact type in my profile, I probably also need to package an artifact processor for my new artifact type. This raises a number of questions:

What is the API to the artifact processor that is called by the orchestrator?
Can I somehow package the code for my artifact processor in the CSAR that contains the profile?
How do I specify the list of dependencies for my artifact processor?
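One possible answer to the first and third questions is a small contract that every artifact processor implements, plus a registry keyed by artifact type. The sketch below is entirely hypothetical (neither TOSCA nor opera defines it) and leaves the packaging question open; `tosca.artifacts.Implementation.Bash` is the TOSCA normative artifact type name, used only as an example key.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArtifactProcessor(Protocol):
    """Hypothetical contract between an orchestrator and an artifact processor."""

    artifact_type: str  # e.g. "tosca.artifacts.Implementation.Bash"

    def dependencies(self) -> list[str]:
        """Packages the processor itself needs before it can run any artifact."""
        ...

    def execute(self, artifact_path: str, inputs: dict[str, str]) -> dict[str, str]:
        """Run one artifact and return its output values."""
        ...


class ProcessorRegistry:
    """Maps artifact types to the processors a profile introduces."""

    def __init__(self) -> None:
        self._by_type: dict[str, ArtifactProcessor] = {}

    def register(self, processor: ArtifactProcessor) -> None:
        # runtime_checkable only verifies that the members exist, not their signatures.
        if not isinstance(processor, ArtifactProcessor):
            raise TypeError(f"{processor!r} does not implement ArtifactProcessor")
        self._by_type[processor.artifact_type] = processor

    def for_type(self, artifact_type: str) -> ArtifactProcessor:
        try:
            return self._by_type[artifact_type]
        except KeyError:
            raise LookupError(f"no processor registered for {artifact_type}") from None
```

With such a contract, an orchestrator could gather `dependencies()` from every registered processor before the first deployment.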

cankarm commented 2 years ago

If I understand correctly, your idea goes in the direction of the CSAR also enclosing the artifact processor(s) themselves.

My idea is instead to only uniquely identify the artifact processor(s) (IaC interpreter) for each artifact. Just as in shell scripts, where you can define the interpreter with #!/bin/bash or #!/usr/bin/python, you would be able to define an interpreter and also the dependencies (required sub-modules of the interpreter) for your artifact.

When the CSAR is created, a process would need to collect all the dependencies together and hand them to the orchestrator, so that the orchestrator can install/prepare the environment before the first deployment.

In contrast, I think that piggybacking the artifact processors could make CSARs very heavy, which seems unnecessary.
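The shebang-style idea can be sketched as a header convention that is read while the CSAR is being built: the shebang names the interpreter, and a comment line declares its sub-modules. The `# depends:` directive is hypothetical (no existing standard defines it), as are the function names; artifacts are passed as a {file name: contents} mapping rather than read from a real CSAR.

```python
from collections import defaultdict

# Hypothetical header directive; no existing standard defines it.
DEPENDS_PREFIX = "# depends:"


def read_header(source: str) -> tuple[str, list[str]]:
    """Return (interpreter from the shebang, dependencies declared in the header)."""
    lines = source.splitlines()
    if lines and lines[0].startswith("#!"):
        interpreter = lines[0][2:].strip()
    else:
        interpreter = "unknown"
    dependencies: list[str] = []
    for line in lines[1:]:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        if line.startswith(DEPENDS_PREFIX):
            names = line[len(DEPENDS_PREFIX):].split(",")
            dependencies.extend(name.strip() for name in names if name.strip())
    return interpreter, dependencies


def collect_dependencies(artifacts: dict[str, str]) -> dict[str, list[str]]:
    """Merge the declared dependencies of all artifacts, grouped by interpreter."""
    grouped = defaultdict(set)
    for source in artifacts.values():
        interpreter, dependencies = read_header(source)
        grouped[interpreter].update(dependencies)
    return {interpreter: sorted(deps) for interpreter, deps in grouped.items()}
```

The merged result is small metadata that travels with the CSAR, which keeps the archive light compared with bundling the processors themselves.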

anzoman commented 2 years ago

This feature was implemented in #252; if needed, we can continue our discussion in https://github.com/oasis-open/tosca-community-contributions.

lauwers commented 2 years ago

Yes, let’s continue this discussion on the TOSCA GitHub. Even for bash script artifacts, an orchestrator needs artifact processing software that:

Creates environment variables for the various inputs
Executes the script and checks the return values
Captures output values somehow and returns them to the orchestrator.

Ideally, the installation of this software (as well as its execution) should be done in a “standard” way.
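The three responsibilities listed above fit in one small function. This is a minimal sketch, not how opera's Ansible executor works: the `OUTPUTS_FILE` environment variable and its key=value line format are a hypothetical output convention, `run_artifact` is an illustrative name, and the interpreter defaults to bash but can be swapped.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def run_artifact(script: Path, inputs: dict[str, str], interpreter: str = "bash") -> dict[str, str]:
    """Run one implementation artifact and return the outputs it reported."""
    with tempfile.TemporaryDirectory() as workdir:
        outputs_file = Path(workdir) / "outputs"
        # 1. Expose every operation input as an environment variable.
        env = {**os.environ, **inputs, "OUTPUTS_FILE": str(outputs_file)}
        # 2. Execute the script and check its exit status.
        result = subprocess.run(
            [interpreter, str(script)], env=env, capture_output=True, text=True
        )
        if result.returncode != 0:
            raise RuntimeError(
                f"{script} failed with exit code {result.returncode}: {result.stderr.strip()}"
            )
        # 3. Collect outputs the script wrote as key=value lines (hypothetical convention).
        outputs: dict[str, str] = {}
        if outputs_file.exists():
            for line in outputs_file.read_text().splitlines():
                key, sep, value = line.partition("=")
                if sep:
                    outputs[key.strip()] = value.strip()
        return outputs
```

A bash artifact would report an output with a line such as `echo "ip=$IP_ADDRESS" >> "$OUTPUTS_FILE"`, where `IP_ADDRESS` stands for whatever input or computed value the script has.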