konveyor / move2kube

Move2Kube is a command-line tool for automating creation of Infrastructure as code (IaC) artifacts. It has inbuilt support for creating IaC artifacts for replatforming to Kubernetes/Openshift.
https://move2kube.konveyor.io/
Apache License 2.0

Move2Kube : Detailed Insights into Configuration File Generation #1151

Open nitinrana-deloitte opened 3 months ago

nitinrana-deloitte commented 3 months ago

Move2Kube dynamically adjusts its questioning process based on different scenarios. Our goal is to gain a deeper understanding of the configuration file generation process without having to execute the transformation.

Dynamic Questioning based on Cluster Type: When the cluster type changes, for example to OpenShift, Move2Kube modifies subsequent questions, such as asking about route creation. We want to explore how this affects the generation of the config file.

Variation in Questions for Multiple Services: When multiple services are detected, Move2Kube poses questions tailored to each service. These questions vary with the number of services detected, and one service may get more questions than another.

We are aware of what the config file can do, but our main goal is to understand the config file generation process well enough to build the config file from minimal user input (rather than asking users to answer every transformation question) and feed it to the pipeline hosting m2k to generate the manifest files.

ashokponkumar commented 3 months ago

Due to the dynamically composed nature of transformers, the questions change depending on which transformers get activated. Going through the relevant transformer code (https://github.com/konveyor/move2kube/tree/main/transformer) might give an idea. Otherwise we might have to try out, say, a particular cluster type to see what ends up in the config. Using --qa-skip might help to quickly check the questions that get asked (at least on the default flow).

@HarikrishnanBalagopal do we have any documents explaining the wild card capabilities of a config file? @Akash-Nayak can you also point to the question categorisation work?

HarikrishnanBalagopal commented 3 months ago

@nitinrana-deloitte Our YAML config file generation is heavily based on https://github.com/mikefarah/yq and JSON path, especially the ability to create new YAML documents: https://mikefarah.gitbook.io/yq/

The config file generation process is very simple.

  1. It starts with an empty dictionary (key value pairs)
  2. When a question is asked, the answer is stored in the dictionary. The key is the question ID and the value is the answer to the question that the user provided.
  3. The key is often in the form aaa.bbb.ccc.ddd where the . is used as a separator. The key is split into sub keys using the separator. The sub keys are used to decide where to store the answer.

Example: Initially the dict is empty

{}

We ask our 1st question with ID aaa.bbb.ccc and get the answer myanswer1. The dictionary now looks like this

{
  "aaa": {
    "bbb": {
      "ccc": "myanswer1"
    }
  }
}

Now we ask our 2nd question with ID aaa.bbb.ddd and get the answer myanswer2. The dictionary now looks like this

{
  "aaa": {
    "bbb": {
      "ccc": "myanswer1",
      "ddd": "myanswer2"
    }
  }
}

The above works for input, multiline, select, and boolean question types.
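The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Move2Kube's actual Go implementation: the function name store_answer is made up for this example, and it assumes sub keys never contain a literal dot.

```python
def store_answer(config: dict, question_id: str, answer) -> None:
    """Store `answer` in a nested dict at the path named by a dotted question ID.

    Hypothetical helper for illustration; assumes sub keys contain no dots.
    """
    *parents, leaf = question_id.split(".")
    node = config
    for key in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(key, {})
    node[leaf] = answer


config = {}
store_answer(config, "aaa.bbb.ccc", "myanswer1")
store_answer(config, "aaa.bbb.ddd", "myanswer2")
print(config)
# {'aaa': {'bbb': {'ccc': 'myanswer1', 'ddd': 'myanswer2'}}}
```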

For multi-select questions, the answer is an array so we use a special symbol [] to indicate the location along the tree where the array goes.

By default we don't store password type questions in the config (although there is a CLI flag to change that).

nitinrana-deloitte commented 3 months ago

@HarikrishnanBalagopal, thanks for this. Could we also learn more about the decision tree? Is there only a fixed number of questions that m2k will ask? We want to answer these questions for the users beforehand, but as the code base grows we have to deal with more questions. Is there any way to handle this?

HarikrishnanBalagopal commented 3 months ago

There is no limit to the number of questions that can be asked. By default we try to avoid asking the user too many questions to keep things simple.

Using custom transformers you can ask as many questions as you need. Example using a custom Starlark transformer: https://move2kube.konveyor.io/transformers/external/starlark#qa-engine and https://github.com/konveyor/move2kube-transformers/blob/a91025e0fe23f288ea1331bb04cf0a34c58ba0d8/cloud-foundry-to-ce-iks-roks/cedockerfile/cedockerfile.star#L52-L54

seshapad commented 3 months ago

@nitinrana-deloitte There is a feature in move2kube to switch categories of questions on or off in the flow. Here is the tutorial for this: https://move2kube.konveyor.io/tutorials/qa-categorization This could help remove question sets from a flow if they are not relevant to a particular application. Please give it a try.

Akash-Nayak commented 3 months ago

@nitinrana-deloitte Here is the mapping file, which contains the names of the different categories and the mapping between categories and questions. If we disable a category, say cicd, then Move2Kube won't ask the user any questions that fall in this category and will take the default answers for those questions instead.

$ move2kube transform -s src --qa-disable cicd

Multiple categories can be disabled by passing the --qa-disable flag multiple times.

$ move2kube transform -s src --qa-disable cicd --qa-disable git

nitinrana-deloitte commented 3 months ago

@Akash-Nayak Thanks for this, I just wanted to clarify that we are implementing move2kube via the move2kube API. Can we also pass these flags during the API implementation?

seshapad commented 3 months ago

@nitinrana-deloitte

  • Cluster selector: https://move2kube.konveyor.io/tutorials/customize-cluster-selector
  • Custom transformers tutorials: https://move2kube.konveyor.io/tutorials/customizing-the-output
  • Custom transformers: https://github.com/konveyor/move2kube-transformers

Please take a look at this: https://move2kube.konveyor.io/tutorials/customizing-the-output/custom-annotations

nitinrana-deloitte commented 3 months ago

@seshapad could we get a document which talks about the wildcard in the config file?

HarikrishnanBalagopal commented 3 months ago

@nitinrana-deloitte I will try to summarize the semantics here for now.

We always try to get the answer from the config file first. We do this by looking at the ID of the question, splitting that ID into a bunch of sub keys separated by . and looking for the value pointed to by those sub keys in the config map/dict. See my previous answer for an example https://github.com/konveyor/move2kube/issues/1151#issuecomment-1991694254

If we are unable to find the answer in the config, then we ask the user for the answer.

There are 2 wildcards [] and *.

https://github.com/konveyor/move2kube/blob/565394bf94dcbab7ea340b7daf603f8dee326a14/common/constants.go#L74-L77

Wildcard *

The * symbol is simpler.

For a question with ID a.mysvc1.port and a config like

{
  "a": {
    "*": {
      "port": "8080"
    }
  }
}

It would not match anything at first.

So then we try to match against * symbols present in the config. We do this by replacing parts of the key with the symbol * and doing a literal match. In the above example the key a.mysvc1.port turns into a.*.port which matches and gets the value 8080.

https://github.com/konveyor/move2kube/blob/565394bf94dcbab7ea340b7daf603f8dee326a14/types/qaengine/config.go#L103-L104
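As a rough Python sketch of this lookup order (exact match first, then a single sub key replaced by *): the function names get_exact and lookup are hypothetical and the real logic lives in the Go file linked above. The sketch assumes that only the middle sub keys are ever replaced and that they are tried from right to left.

```python
def get_exact(config, keys):
    """Walk the nested dict along `keys`; return (value, found)."""
    node = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None, False
        node = node[key]
    return node, True


def lookup(config, question_id):
    """Try a literal match, then retry with one middle sub key replaced by "*"."""
    keys = question_id.split(".")
    value, found = get_exact(config, keys)
    if found:
        return value, True
    # Assumption: the first and last sub keys are never replaced.
    for i in range(len(keys) - 2, 0, -1):
        candidate = keys[:i] + ["*"] + keys[i + 1:]
        value, found = get_exact(config, candidate)
        if found:
            return value, True
    # Nothing matched, so the question would be put to the user.
    return None, False


config = {"a": {"*": {"port": "8080"}}}
print(lookup(config, "a.mysvc1.port"))  # ('8080', True)
print(lookup(config, "a.mysvc1.name"))  # (None, False)
```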

Wildcard []

The [] symbol is used for Multi-Select type questions. It can match against any key in a map/dict. Multi-Select type questions take a list of strings as the answer to the question.

https://github.com/konveyor/move2kube/blob/565394bf94dcbab7ea340b7daf603f8dee326a14/types/qaengine/config.go#L135-L137

There are 2 ways such a question can be stored in the config:

https://github.com/konveyor/move2kube/blob/565394bf94dcbab7ea340b7daf603f8dee326a14/types/qaengine/config.go#L245-L256

For a Multi-Select question with id a.b.c, options ["ans1", "ans2", "ans3"] and answers ["ans1", "ans3"] we get a config that looks like this

{
  "a": {
    "b": {
      "c": ["ans1", "ans3"]
    }
  }
}

For a Multi-Select question with id a.[].c, options ["ans1", "ans2", "ans3"] and answers ["ans1", "ans3"] we get a config that looks like this

{
  "a": {
    "ans1": {
      "c": true
    },
    "ans2": {
      "c": false
    },
    "ans3": {
      "c": true
    }
  }
}
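The two storage forms can be illustrated with a short Python sketch. The helper names set_path and store_multiselect are made up for this example, and the behaviour is inferred only from the two configs shown above, not from the Go source.

```python
def set_path(config, keys, value):
    """Set `value` in a nested dict at the path given by `keys`."""
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def store_multiselect(config, question_id, options, answers):
    """Store a multi-select answer in one of the two forms described above."""
    keys = question_id.split(".")
    if "[]" not in keys:
        # Form 1: the selected answers are stored as a list at the key.
        set_path(config, keys, list(answers))
        return
    # Form 2: "[]" is replaced by each option, with a boolean per option.
    idx = keys.index("[]")
    for option in options:
        set_path(config, keys[:idx] + [option] + keys[idx + 1:], option in answers)


options = ["ans1", "ans2", "ans3"]
answers = ["ans1", "ans3"]

plain = {}
store_multiselect(plain, "a.b.c", options, answers)
print(plain)   # {'a': {'b': {'c': ['ans1', 'ans3']}}}

spread = {}
store_multiselect(spread, "a.[].c", options, answers)
print(spread)  # {'a': {'ans1': {'c': True}, 'ans2': {'c': False}, 'ans3': {'c': True}}}
```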

nitinrana-deloitte commented 3 months ago

@HarikrishnanBalagopal, thanks.

So let's say we have this config file, which has only one service (src). If the number of services increases in future runs, for example with a different code base, can the wildcard functionality be used to cover them?

move2kube:
  minreplicas: "1"
  services:
    src:
      "5001":
        servicetype: ClusterIP
      deployment: Deployment
      enable: true
      port: "5001"
  target:
    cicd:
      tekton:
        gitrepobasicauthsecret: ""
        gitreposshsecret: ""
        registrypushsecret: ""
    default:
      clustertype: Openshift
      ingress:
        host: python-sample.com

ashokponkumar commented 3 months ago

{
  "a": {
    "*": {
      "port": "8080"
    }
  }
}

It just needs to be "*" where you have "src" now.

nitinrana-deloitte commented 3 months ago

Hi @ashokponkumar, using the wildcard *, can we construct the config file so that we can still ask the user certain questions, for instance the port number? When we implement it like this, the question gets skipped during the transformation.

move2kube:
  minreplicas: "1"
  services:
    "*":
      "5001":
        servicetype: ClusterIP
      deployment: Deployment
      enable: true
      port: "5001"

I was trying to use the [] selector, but I guess it works only with multi-select questions.

ashokponkumar commented 3 months ago

Remove the port number from this config:

move2kube:
  minreplicas: "1"
  services:
    "*":
      "*":
        servicetype: ClusterIP
      deployment: Deployment
      enable: true

nitinrana-deloitte commented 3 months ago

Hi @ashokponkumar. We don't want to include the hostname and URL path in the route.yaml that is generated after transformation, but if I pass an empty string in the config.yaml, route.yaml is not created at all.

This is the config file for reference. Using it skips route.yaml creation.

services:
  src:
    "5001":
      servicetype: Ingress
      urlpath: ""
    deployment: Deployment
    enable: true
    port: "5001"
target:
  cicd:
    tekton:
      gitrepobasicauthsecret: ""
      gitreposshsecret: ""
      registrypushsecret: ""
  default:
    clustertype: Openshift
    ingress:
      host: ""

We want our route.yaml to look like this:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  creationTimestamp: null
  labels:
    move2kube.konveyor.io/service: src
  name: src
spec:
  port:
    targetPort: port-5001
  tls:
    termination: edge
  to:
    kind: Service
    name: src
    weight: 1
status:
  ingress:
    - {}

ashokponkumar commented 3 months ago

Use a dummy host, and then use a custom transformer like https://move2kube.konveyor.io/tutorials/customizing-the-output/custom-annotations to change the yaml that is generated.

nitinrana-deloitte commented 3 months ago

Hi @ashokponkumar, this is our config.yaml

move2kube:
  minreplicas: "2"
  services:
    "*":
      "":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      deployment: Deployment
      enable: true
      port: ""

With a wildcard in place of the service name, we only want to answer the port number question during transformation, but m2k also asks for servicetype and urlpath. Is this the expected behaviour when using a wildcard? We have multiple services in this transformation.

ashokponkumar commented 3 months ago

move2kube:
  minreplicas: "2"
  services:
    "*":
      "*":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      deployment: Deployment
      enable: true

Use a * in the port number section too.

Is the URL path the same for all services?

nitinrana-deloitte commented 3 months ago

No, it will not be same for all the services.

nitinrana-deloitte commented 3 months ago

Hi @ashokponkumar, we are still getting the service type question during our transformation on using the above config.

move2kube:
  minreplicas: "2"
  services:
    "*":
      "*":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      deployment: Deployment
      enable: true

HarikrishnanBalagopal commented 3 months ago

@nitinrana-deloitte if possible try to preserve the indentation while pasting code. YAML especially cares a lot about indentation. You can use the three back ticks ``` to indicate the start of a code block. https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#fenced-code-blocks

You can also upload the config.yaml file here in the comments. That will help us debug.

nitinrana-deloitte commented 3 months ago

Sure @HarikrishnanBalagopal, here's the YAML that we are trying to tweak so that the user only has to answer the port number question. We are dealing with multiple services in this transformation.


  minreplicas: "2"
  services:
    "*":
      "*":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      deployment: Deployment
      enable: true

nitinrana-deloitte commented 3 months ago

This is the complete config that gets generated. For now we have only provided sample/dummy values.


  minreplicas: "2"
  services:
    carbon-emission-ui:
      "5001":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      deployment: Deployment
      enable: true
      port: "5001"
    demandbetter:
      "5001":
        servicetype: Ingress
        urlpath: /demandbetter
      childModules:
        demandbetter:
          port: "5001"
      deployment: Deployment
      dockerfileType: build stage in base image
      enable: true
    demandbetter-python:
      "5001":
        servicetype: Ingress
        urlpath: /demandbetter-python
      deployment: Deployment
      enable: true
      port: "5001"
  target:
    cicd:
      tekton:
        gitrepobasicauthsecret: ""
        gitreposshsecret: ""
        registrypushsecret: ""
    default:
      clustertype: Openshift
      ingress:
        host: multiple-services.com
    imageregistry:
      namespace: multiple-services
      quay.io:
        logintype: use an existing pull secret
        pullsecret: pull-secret
      url: quay.io
  transformers:
    kubernetes:
      argocd:
        namespace: ""
    types:
      - Tekton
      - Liberty
      - CloudFoundry
      - ComposeGenerator
      - DockerfileParser
      - Python-Dockerfile
      - EarRouter
      - PHP-Dockerfile
      - ArgoCD
      - DockerfileDetector
      - KubernetesVersionChanger
      - OperatorTransformer
      - Parameterizer
      - DockerfileImageBuildScript
      - Rust-Dockerfile
      - Jboss
      - ReadMeGenerator
      - ZuulAnalyser
      - Golang-Dockerfile
      - EarAnalyser
      - Nodejs-Dockerfile
      - ComposeAnalyser
      - WarRouter
      - Jar
      - Gradle
      - Tomcat
      - Ruby-Dockerfile
      - ClusterSelector
      - Knative
      - Buildconfig
      - Kubernetes
      - WarAnalyser
      - Maven
      - DotNetCore-Dockerfile
      - WinWebApp-Dockerfile
      - ContainerImagesPushScriptGenerator
  transformerselector: ""
route:
  tls:
    certificate: ""
    key: ""
    terminationpolicy: edge

HarikrishnanBalagopal commented 3 months ago

@nitinrana-deloitte Thanks, the double * wildcard is not supported yet (to avoid having to deal with several edge cases). We can add support for it if necessary.

As mentioned in the comment https://github.com/konveyor/move2kube/issues/1151#issuecomment-1996364310 the way the * wildcard works is by replacing one part of the question ID at a time with * and doing a literal match against the config. Example: given a.b.c.d.e, we try a.b.c.*.e, then a.b.*.d.e, then a.*.c.d.e

https://github.com/konveyor/move2kube/blob/565394bf94dcbab7ea340b7daf603f8dee326a14/types/qaengine/config.go#L103-L104
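To see why a config with two * levels never matches, it can help to list the keys that get tried. The Python sketch below only reproduces the order given in the example above. wildcard_candidates is a hypothetical name, and the service question ID is an illustrative guess modelled on the config layout, not an ID taken from Move2Kube.

```python
def wildcard_candidates(question_id: str) -> list:
    """List the single-wildcard keys tried for a question ID, rightmost first.

    Assumption (from the example above): the first and last sub keys
    are never replaced.
    """
    keys = question_id.split(".")
    return [
        ".".join(keys[:i] + ["*"] + keys[i + 1:])
        for i in range(len(keys) - 2, 0, -1)
    ]


print(wildcard_candidates("a.b.c.d.e"))
# ['a.b.c.*.e', 'a.b.*.d.e', 'a.*.c.d.e']

# Illustrative question ID; the real IDs may differ.
candidates = wildcard_candidates("move2kube.services.mysvc.5001.servicetype")
print("move2kube.services.*.*.servicetype" in candidates)  # False
```

Every candidate contains exactly one *, so a config that uses "*" for both the service name and the port is never reached by this scheme.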

In the meantime, if you are only using a few ports, then you might try using the * wildcard for the service name and listing all the ports in the config like this:

  minreplicas: "2"
  services:
    "*":
      "8080":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      "8081":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      "80":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      "443":
        servicetype: Ingress
        urlpath: /carbon-emission-ui
      deployment: Deployment
      enable: true
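If the list of ports is known up front, a fragment like the one above could be generated rather than written by hand. The following Python sketch is one way to do that. wildcard_service_config is a hypothetical helper, it mirrors only the keys shown in the snippet above, and it prints JSON for inspection (writing real YAML would need a YAML library such as PyYAML).

```python
import json


def wildcard_service_config(ports, url_path, min_replicas="2"):
    """Build the config fragment shown above for a list of known ports."""
    # One entry per port, keyed by the port number as a string.
    service = {
        str(port): {"servicetype": "Ingress", "urlpath": url_path}
        for port in ports
    }
    service["deployment"] = "Deployment"
    service["enable"] = True
    # "*" stands in for the service name, as in the snippet above.
    return {"minreplicas": min_replicas, "services": {"*": service}}


config = wildcard_service_config([8080, 8081, 80, 443], "/carbon-emission-ui")
print(json.dumps(config, indent=2))
```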

nitinrana-deloitte commented 3 months ago

Hi @HarikrishnanBalagopal, thanks for this approach, but we want the end user to provide the port numbers, so we cannot place them in the config beforehand. What we are trying to achieve is to get only specific answers from the user, for example the port number for each service that is identified.

Could we figure out a way in the config through which this can be done?

HarikrishnanBalagopal commented 3 months ago

@nitinrana-deloitte We have added a new feature to override the existing QAMapping file. It has been released in version v0.3.13-rc.0

We have put a sample and instructions here https://github.com/konveyor/move2kube-transformers/tree/main/enable-disable-qa-categories

This new feature means you can:

  1. Copy the default QA mapping file here https://github.com/konveyor/move2kube/blob/main/assets/built-in/qa/qamappings.yaml
  2. Make some changes to it.
    • Enable/disable certain categories.
    • Move existing questions between categories.
    • Create new categories.
  3. Put the new mappings yaml file in a folder (example: customizations/my-custom-qa-mappings.yaml)
  4. And then run a transformation with move2kube transform -s source/ -c customizations/
  5. Move2Kube will respect the custom QA mappings file that you provided and only ask the questions from enabled categories.

HarikrishnanBalagopal commented 3 months ago

@nitinrana-deloitte For your particular use case we have moved the port related questions out of the network category and into a separate category called ports https://github.com/konveyor/move2kube/blob/7383e9115acff68248e68d4b122b9158f585c4b1/assets/built-in/qa/qamappings.yaml#L56-L60

So I would suggest using a config like this

  minreplicas: "2"
  services:
    "*":
      deployment: Deployment
      enable: true

and using the command below to only enable the categories that you need

$ move2kube transform -s source/ --config my-custom-config.yaml --qa-enable ports
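For a pipeline that shells out to the CLI, the command line can be assembled programmatically. The Python sketch below uses only flags that appear in this thread (-s, --config, --qa-enable, --qa-disable). transform_command is a hypothetical helper, and passing --qa-enable more than once is an assumption carried over from how --qa-disable is described above. It builds the argument list without running anything.

```python
import shlex


def transform_command(source, config_file=None, enable=(), disable=()):
    """Assemble a `move2kube transform` argument list (does not run it)."""
    argv = ["move2kube", "transform", "-s", source]
    if config_file:
        argv += ["--config", config_file]
    for category in enable:
        # Assumption: the flag is repeated once per category.
        argv += ["--qa-enable", category]
    for category in disable:
        argv += ["--qa-disable", category]
    return argv


argv = transform_command("source/", "my-custom-config.yaml", enable=["ports"])
print(shlex.join(argv))
# move2kube transform -s source/ --config my-custom-config.yaml --qa-enable ports
```

The resulting list can be handed to subprocess.run(argv) on a machine where the move2kube binary is installed.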