seqeralabs / cx-field-tools-installer

Unofficial Terraform solution to help clients install Seqera Platform
Apache License 2.0
3 stars 1 forks source link

[ Refactor Task ] Streamline number of locals #87

Open gwright99 opened 1 month ago

gwright99 commented 1 month ago

Leveraging the data.external technique implemented in https://github.com/seqeralabs/cx-field-tools-installer/pull/86/files, it should be possible to significantly streamline the number of locals (and related population logic).

Example

Consider the following two local values:

  dns_instance_ip = (
    var.flag_make_instance_private == true ? aws_instance.ec2.private_ip :
    var.flag_make_instance_private_behind_public_alb == true ? aws_instance.ec2.private_ip :
    var.flag_private_tower_without_eice == true ? aws_instance.ec2.private_ip :
    var.flag_make_instance_public == true ? aws_eip.towerhost[0].public_ip :
    "No_Match_Found"
  )

  alb_ingress_cidrs = (
    var.flag_make_instance_public == true || var.flag_make_instance_private_behind_public_alb == true ? var.sg_ingress_cidrs :
    var.flag_make_instance_private == true && var.flag_create_new_vpc == true ? [var.vpc_new_cidr_range] :
    var.flag_make_instance_private == true && var.flag_use_existing_vpc == true ? [data.aws_vpc.preexisting.cidr_block] :
    var.flag_private_tower_without_eice == true && var.flag_use_existing_vpc == true ? [data.aws_vpc.preexisting.cidr_block] :
    # DELETE THIS
    var.flag_private_tower_without_eice == true && var.flag_create_new_vpc == true ? [data.aws_vpc.preexisting.cidr_block] :
    ["No CIDR block found"]
  )

The ternary operators are both convoluted and repetitive. This could be refactored into a much cleaner control structure (if statement / case statement / etc).

gwright99 commented 1 month ago

Upon further reflection (and failed efforts), I think the ability to externalize to a Python script will have limited applicability in the definition of local values.

The externalization works well when the logic is entirely dependent on tfvars and local values (_e.g. population of data resources in _009_define_filetemplates.tf.

It works less well when the logic to be replaced uses ternary logic to determine which count-controlled resource an attribute should be pulled from. Take the first block I was trying to refactor, for example:

  dns_zone_id = (
    var.flag_create_route53_private_zone == true ? aws_route53_zone.private[0].id :
    var.flag_use_existing_route53_public_zone == true ? data.aws_route53_zone.public[0].id :
    var.flag_use_existing_route53_private_zone == true ? data.aws_route53_zone.private[0].id :
    "No_Match_Found"
  )

At no given time can a public and private hosted zone exist at the same time. Depending on set flags, one will exist and the other will not. If we try to pass the non-existent resource to a data object via query, TF will throw a [resource name] is empty tuple ... The given key does not identify an element in this collection value: the collection has not elements.

gwright99 commented 1 month ago

Plot thickens. Apparently I was wrong -- I can pass non-existing objects into the script so long as I package it a bit differently:

data "external" "generate_dns_valeus" {
  program = ["python3", "${path.module}/.githooks/data_external/generate_dns_values.py"]
  query = {
    r53_privatezone = jsonencode(aws_route53_zone.private)       # [0]
    r53_public =  jsonencode(data.aws_route53_zone.public)        # [0]
}

Results in

{'r53_privatezone': '[]', 'r53_public': '[
        {
            "arn": "arn:aws:route53:::hostedzone/Z0...",
            "caller_reference": "RISWorkflow-RD:...",
            "comment": "HostedZone created by Route53 Registrar",
            "id": "Z0...",
            "linked_service_description": null,
            "linked_service_principal": null,
            "name": "dev-seqera.net",
            "name_servers": [
                "..."
            ],
            "primary_name_server": "ns-1332.awsdns-38.org",
            "private_zone": false,
            "resource_record_set_count": xx,
            "tags": {},
            "vpc_id": null,
            "zone_id": "Z02...."
        }
    ]'
}
gwright99 commented 1 month ago

Hugely frustrating bug: Setting the project debugger level to DEBUG causes the following error when data_dictionary = get_tfvars_as_json() tries to read results from data.external (Update: Found cause - it's due to the stdout_handler, which uses sys.stdout which is causing the problem:

For some reason, logger.debug in the actual script file seems to be ok. Something to investigation tomorrow.

│ Error: Unexpected External Program Results
│ 
│   with data.external.generate_db_connection_string,
│   on 000_main.tf line 66, in data "external" "generate_db_connection_string":
│   66:   program = ["python3", "${path.module}/.githooks/data_external/generate_db_connection_string.py"]
│ 
│ The data source received unexpected results after executing the program.
│ 
│ Program output must be a JSON encoded map of string keys and string values.
│ 
│ If the error is unclear, the output can be viewed by enabling Terraform's logging at TRACE level. Terraform documentation on logging: https://www.terraform.io/internals/debugging
│ 
│ Program: /home/deeplearning/cx-field-tools-installer/venv/bin/python3
│ Result Error: invalid character '-' after top-level value
gwright99 commented 1 month ago

I've found a way around this problem but will solicit second options whether it's worth pursuing.

Ultimately, the problem is that Terraform watches both the stdout AND stderr of the external script for a positive / negative result.

There are two problems to address:

  1. Debugging

    • I have not found a way to hook a debugger into the externalized Python script yet (could be possible, I just dont how know to do it yet).
    • The dependency on some TF-created objects means (I think) that I need to always invoke via a TF flow -- I cant debug directly because running the script directly means we'll be missing critical objects.
  2. Logging

    • My workaround to the debugging problem was to fallback to copious logger.debug entries in the external scripts to observe the state of the TF objects during execution.
    • Unfortunately, the typical logging pattern (console / file) requires use of stdout and stderr ... which cause Terraform to break since it thinks the diagnostic log event is the actual result of the external script.

Workaround

I've found a functioning workaround by using the MemoryHandler class. Essentially, I log normally but stash all the entries to RAM (avoiding use of stdout/stderr), emit the actual payload TF is expecting, THEN flush all the earlier log events to a file (for observability / troubleshooting purposes).

This approach seems to work well with only minor code changes required and could be worthwhile. HOWEVER, I realize I'm playing with time / sequence ordering here -- the order of execution of the deployment script now no longer entries matches the sequence of entries in the generated log file.

I remember how it works now, but will I remember how it works 30 days for now when we run into some other edge-case? TBD.

gwright99 commented 1 month ago

Success (after several headaches)!! Committing raw files to capture state. Will clean up once I fully unpack what I did. Commit Id: https://github.com/seqeralabs/cx-field-tools-installer/commit/a71e7285fe0b1698538760620071b3e2490dc857

image