opentelekomcloud / terraform-provider-opentelekomcloud

Terraform OpenTelekomCloud provider
https://registry.terraform.io/providers/opentelekomcloud/opentelekomcloud/latest
Mozilla Public License 2.0

opentelekomcloud_networking_router_route_v2 does not handle disappeared routes well #2652

Open. pvbouwel opened this issue 1 month ago

pvbouwel commented 1 month ago

Terraform provider version

1.35.2

Affected Resource(s)

opentelekomcloud_networking_router_route_v2

Terraform Configuration Files

resource "opentelekomcloud_networking_router_route_v2" "router_vpn_routes" {
  depends_on       = [opentelekomcloud_compute_instance_v2.compute_node[0]]
  for_each         = var.vpn_router_routes
  router_id        = var.router_id
  destination_cidr = each.value.destination_cidr
  next_hop         = var.vpngateway_ip
}

Debug Output/Panic Output

N/A

Steps to Reproduce

  1. Run terraform apply for your initial setup.
  2. Change the image of the compute instance, which forces the instance to be replaced.
  3. Run terraform apply.
  4. Run terraform apply again.

Expected Behavior

In step 3, the compute instance should be replaced and the defined routes should still exist.

Actual Behavior

In step 3, the compute instance is replaced, but all the previously defined routes have disappeared. In step 4, when applying again, Terraform notices that the routes are no longer in place, but instead of creating new routes it plans to replace them:

# module.vpngateway[0].opentelekomcloud_networking_router_route_v2.router_vpn_routes["key"] must be replaced
-/+ resource "opentelekomcloud_networking_router_route_v2" "router_vpn_routes" {
      + destination_cidr = "1.2.3.0/27" # forces replacement
      ~ id               = "f2150689-9c2e-4c5c-b03f-e95a63d03746-route-1.2.3.0/27-10.1.2.10" -> (known after apply)
      + next_hop         = "10.1.2.10" # forces replacement
      ~ region           = "eu-nl" -> (known after apply)
        # (1 unchanged attribute hidden)
    }

10.1.2.10 is the IP address associated with the ECS server.

As a result, the apply fails with one error per route:

│ Error: route did not exist already

Important Factoids

The compute instance is just an ECS that we configure with the source/destination check disabled and that runs a software VPN (to set up a VPN tunnel).

References

I could not find any related GitHub issues.

anton-sidelnikov commented 1 month ago

Hi @pvbouwel, opentelekomcloud_networking_router_route_v2 is a deprecated resource. Please use opentelekomcloud_vpc_route_v2 instead, and update Terraform to the latest version.

pvbouwel commented 1 month ago

@anton-sidelnikov https://registry.terraform.io/providers/opentelekomcloud/opentelekomcloud/latest/docs/resources/vpc_route_v2 (v1.36.18) does not allow this type of route.

opentelekomcloud_vpc_route_v2 only allows specifying routes of type peering.

So if I use something like:

resource "opentelekomcloud_vpc_route_v2" "router_vpn_routes" {
  depends_on       = [opentelekomcloud_compute_instance_v2.compute_node[0]]
  type             = "peering"
  vpc_id           = "my-vpc-uuid"
  destination      = "192.168.254.254/32"
  nexthop          = opentelekomcloud_compute_instance_v2.compute_node[0].network[0].port
}

Then it fails with:

Error: error creating OpenTelekomCloud VPC route: Resource not found: [POST https://vpc.eu-nl.otc.t-systems.com/v2.0/vpc/routes], error message: {"NeutronError": {"detail": "", "message": "No VPC peering exist with id 6ebac2e9-56f3-4959-96f5-4d4a4dea55f3", "type": "VPCPeeringNotExist"}}

The error message indicates that only VPC peerings are accepted as a next hop, which is not what my use case needs.

In my use case, I need to add a static route to a specific interface/network port/IP in order to direct traffic into my VPN tunnel. This interface is in the local account, so no VPC peering setup applies.

anton-sidelnikov commented 1 month ago

@pvbouwel yes, sorry, my mistake: wrong resource. Please try opentelekomcloud_vpc_route_table_v1: https://registry.terraform.io/providers/opentelekomcloud/opentelekomcloud/latest/docs/resources/vpc_route_table_v1

pvbouwel commented 1 month ago

@anton-sidelnikov That one has a similar issue. If I change my ECS, the plan shows that it will:

  1. Replace the ECS
  2. Update the Route table in-place

This is what I want, but the apply fails with:

│ Error: error updating OpenTelekomCloud VPC route table: Bad request with: [PUT https://vpc.eu-nl.otc.t-systems.com/v1//routetables/], error message: {"code":"VPC.2800","message":"The same route is included in the route list."}

I also noticed that if you create a route table and later only add a subnet to the Terraform route table resource, without changing any of the routes, you get the same error.

So it seems to suffer from the same bug. Note that even once it is fixed, there would be a regression in user experience: with the new resource, you must know all your routes up front. This makes it harder to split Terraform code into clean modules, because I cannot put my VPN logic in a module and run it after my base networking module. IMHO, a route should be a separate Terraform resource, and Terraform should manage the idempotency.
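One way to soften the up-front requirement, sketched here under assumptions and not verified against the OTC API: the base module could accept additional routes as an input variable and render them with a Terraform `dynamic "route"` block, so that a VPN module's port ID is passed in from the root module instead of being hard-coded in the base module. The variable name `extra_routes` and the referenced `opentelekomcloud_vpc_v1.vpc` / `opentelekomcloud_vpc_subnet_v1.subnet` resources are hypothetical, and this does nothing about the VPC.2800 update error above.

```hcl
# Hypothetical base-module input: routes contributed by other modules
# (for example a VPN module), wired together in the root module.
variable "extra_routes" {
  type = list(object({
    destination = string
    type        = string # e.g. "eni", as in the reproduction config
    nexthop     = string # e.g. the port ID of the VPN instance
    description = string
  }))
}

resource "opentelekomcloud_vpc_route_table_v1" "table" {
  name   = "base-route-table"
  vpc_id = opentelekomcloud_vpc_v1.vpc.id # hypothetical VPC resource

  # Renders one route block per element of var.extra_routes.
  dynamic "route" {
    for_each = var.extra_routes
    content {
      destination = route.value.destination
      type        = route.value.type
      nexthop     = route.value.nexthop
      description = route.value.description
    }
  }

  subnets = [opentelekomcloud_vpc_subnet_v1.subnet.id] # hypothetical subnet resource
}
```

In the root module, the value would be something like `extra_routes = [{ destination = "192.168.254.254/32", type = "eni", nexthop = module.vpn.port_id, description = "Into VPN tunnel" }]`, where `module.vpn.port_id` is a hypothetical output. Terraform resolves cross-module references value by value, so this should not create a cycle as long as the port itself does not depend on the route table. The base module still owns every route, so this is wiring, not the per-route resource asked for above.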

Slightly off topic: how can we tell what is deprecated and what isn't? If you read https://registry.terraform.io/providers/opentelekomcloud/opentelekomcloud/latest/docs/resources/networking_router_route_v2, you wouldn't guess that it is deprecated. And migrating to the VPC resources is an expensive operation: it requires careful planning of how to manage the Terraform state when you don't have the luxury of downtime.

pvbouwel commented 1 month ago

To reproduce:

locals {
  test_network_name = "test-network-pvbouwel"
  test_subnet = "test-subnet-pvbouwel-1"
  test_router = "test-router-pvbouwel"
}

resource "opentelekomcloud_vpc_v1" "testnetwork" {
  name = local.test_network_name
  cidr = "10.0.0.0/16"
  description = "For https://github.com/opentelekomcloud/terraform-provider-opentelekomcloud/issues/2652"
}

resource "opentelekomcloud_vpc_subnet_v1" "subnet_1-1" {
  name       = local.test_subnet
  cidr       = "10.0.1.0/24"
  gateway_ip = "10.0.1.1"
  vpc_id     = opentelekomcloud_vpc_v1.testnetwork.id
}

resource "opentelekomcloud_vpc_route_table_v1" "table_1" {
  name        = local.test_router
  vpc_id      = opentelekomcloud_vpc_v1.testnetwork.id
  description = "created by terraform with routes"

  route {
    destination = "192.168.254.254/32"
    type        = "eni"
    nexthop     = opentelekomcloud_compute_instance_v2.compute_node[0].network[0].port
    description = "Example into VPN tunnel"
  }

  subnets = [
    opentelekomcloud_vpc_subnet_v1.subnet_1-1.id,
  ]
}

resource "opentelekomcloud_compute_instance_v2" "compute_node" {
  count           = 1

  name            = "pvbouwel-test"
  flavor_id       = "<choose>"
  image_name      = "<choose an image in your account>" # Changing this and running terraform apply again causes issues
  security_groups = ["<your-sec-group>"]
  key_pair        = "<your-keypair>"

  network {
    name = opentelekomcloud_vpc_v1.testnetwork.id
    fixed_ip_v4 = "10.0.1.30"
  }

  depends_on = [ 
    opentelekomcloud_vpc_subnet_v1.subnet_1-1
  ]
}
anton-sidelnikov commented 1 month ago

Hi @pvbouwel, yes, sorry, I will add a deprecation message to the docs; someone forgot to do that. I will investigate the issue today. It is strange behaviour, maybe an API issue. I will keep you informed of the progress.

anton-sidelnikov commented 1 month ago

Hi @pvbouwel, I found an issue in the API that leads to this forced recreation of the route, and I need to create an internal issue for it. The point is that when you create a route of type ENI, the API reports it as type ECS, which leads to a state inconsistency; and in your scenario it is impossible to use a route of type ECS from the beginning. Weird behaviour. Internal ticket: https://jira.tsi-dev.otc-service.com/browse/BM-5965

anton-sidelnikov commented 3 weeks ago

Hi @pvbouwel, could you check the opentelekomcloud_networking_router_route_v2 resource again? While we are waiting for the fixes for the new service, I provided some fixes for the old one. It seems to work for now, but I cannot say when it will be decommissioned.

pvbouwel commented 3 weeks ago

@anton-sidelnikov I am not sure I understand your question. What should I check for https://registry.terraform.io/providers/opentelekomcloud/opentelekomcloud/latest/docs/resources/vpc_route_v2? That one does work for VPC peerings, but that is the only supported type according to the Terraform docs.

The docs of the backend API do not list the possible values either; they only state that "peering" is the default: https://docs.otc.t-systems.com/ansible-collection-cloud/vpc_route_module.html

anton-sidelnikov commented 3 weeks ago

@pvbouwel Sorry again, the same copy-paste problem. I meant opentelekomcloud_networking_router_route_v2.

pvbouwel commented 6 days ago

@anton-sidelnikov it took a while before I could test, but with the latest version I could pull ("1.36.23"), it seems the refresh is broken and deletion is not yet idempotent.

So I used:

locals {
  test_network_name = "test-network-pvbouwel"
  test_subnet = "test-subnet-pvbouwel-1"
  test_router = "test-router-pvbouwel"
}

resource "opentelekomcloud_networking_network_v2" "testnetwork" {
  name = local.test_network_name
  admin_state_up = true
}

resource "opentelekomcloud_networking_router_v2" "router" {
  name             = "${local.test_network_name}-router"
  admin_state_up   = true
}

resource "opentelekomcloud_networking_subnet_v2" "private_network_subnet" {
  name            = local.test_subnet
  network_id      = opentelekomcloud_networking_network_v2.testnetwork.id
  cidr            = "10.0.1.0/24"
  ip_version      = 4
}
resource "opentelekomcloud_networking_router_interface_v2" "private_network_router_interface" {
  router_id = opentelekomcloud_networking_router_v2.router.id
  subnet_id = opentelekomcloud_networking_subnet_v2.private_network_subnet.id
}

resource "opentelekomcloud_networking_router_route_v2" "router_vpn_routes" {
  depends_on       = [opentelekomcloud_compute_instance_v2.compute_node[0]]
  for_each         = toset(["192.168.0.1/32", "192.168.0.2/32"])
  router_id        = opentelekomcloud_networking_router_v2.router.id
  destination_cidr = each.value
  next_hop         = "10.0.1.30"
}

resource "opentelekomcloud_compute_instance_v2" "compute_node" {
  count           = 1

  name            = "pvbouwel-test"
  flavor_id       = "s3.large.8"
  image_name      = "my-image"
  security_groups = ["my-secgroup"]
  key_pair        = "my-keypair"

  metadata = {
    ssh_user   = "my-user"
  }

  network {
    name        = local.test_network_name
    fixed_ip_v4 = "10.0.1.30"
  }

  depends_on = [ 
    opentelekomcloud_networking_subnet_v2.private_network_subnet
  ]
}

If I then run terraform taint 'opentelekomcloud_compute_instance_v2.compute_node[0]' followed by terraform apply, the instance is replaced and the routes are missing.

If I then re-run terraform apply, it reports "No changes. Your infrastructure matches the configuration." even though the static routes are missing.

If I taint a static route with terraform taint 'opentelekomcloud_networking_router_route_v2.router_vpn_routes["192.168.0.1/32"]', the destroy step fails:

opentelekomcloud_networking_router_route_v2.router_vpn_routes["192.168.0.1/32"]: Destroying... [id=61e4b20c-c5cf-4ce8-aedb-90686a7cfd25-route-192.168.0.1/32-10.0.1.30]
╷
│ Error: Can't find route to 192.168.0.1/32 via 10.0.1.30 on OpenTelekomCloud Neutron Router 61e4b20c-c5cf-4ce8-aedb-90686a7cfd25

It makes sense that the route cannot be found, but the destroy step should then recognize that there is no work to do (we expect Terraform to handle idempotency via the providers) and simply create the new route.

muneeb-jan commented 4 days ago

Hi @pvbouwel

Could you try it like this?


locals {
  test_network_name = "test-network-pvbouwel"
  test_subnet       = "test-subnet-pvbouwel-1"
  test_router       = "test-router-pvbouwel"
}

resource "opentelekomcloud_networking_network_v2" "testnetwork" {
  name           = local.test_network_name
  admin_state_up = true
}

resource "opentelekomcloud_networking_router_v2" "router" {
  name           = "${local.test_network_name}-router"
  admin_state_up = true
}

resource "opentelekomcloud_networking_subnet_v2" "private_network_subnet" {
  name       = local.test_subnet
  network_id = opentelekomcloud_networking_network_v2.testnetwork.id
  cidr       = "10.0.1.0/24"
  ip_version = 4
}
resource "opentelekomcloud_networking_router_interface_v2" "private_network_router_interface" {
  router_id = opentelekomcloud_networking_router_v2.router.id
  subnet_id = opentelekomcloud_networking_subnet_v2.private_network_subnet.id
}

resource "opentelekomcloud_networking_port_v2" "instance_port_1" {
  name       = "my_port"
  network_id = opentelekomcloud_networking_network_v2.testnetwork.id
  fixed_ip {
    subnet_id  = opentelekomcloud_networking_subnet_v2.private_network_subnet.id
    ip_address = "10.0.1.30"
  }
}

resource "opentelekomcloud_networking_router_route_v2" "router_vpn_routes" {
  for_each         = toset(["192.168.0.1/32", "192.168.0.2/32"])
  router_id        = opentelekomcloud_networking_router_v2.router.id
  destination_cidr = each.value
  next_hop         = "10.0.1.30"

  depends_on = [opentelekomcloud_compute_instance_v2.compute_node]
}

resource "opentelekomcloud_compute_instance_v2" "compute_node" {
  name            = "pvbouwel-test"
  flavor_id       = "s3.large.8"
  image_name      = "image"
  security_groups = ["default"]
  key_pair        = "your-keypair"

  metadata = {
    ssh_user = "my-user"
  }

  network {
    port = opentelekomcloud_networking_port_v2.instance_port_1.id
  }

  depends_on = [
    opentelekomcloud_networking_subnet_v2.private_network_subnet
  ]
}
anton-sidelnikov commented 4 days ago

@pvbouwel the main idea above is to connect the instance by port rather than by network name. The port seems to behave differently with respect to recreation of the routes.

pvbouwel commented 2 days ago

Thanks @muneeb-jan, this indeed lets me manipulate the ECS without running into routing issues, since the port keeps existing.

@anton-sidelnikov to me this is an acceptable workaround for opentelekomcloud_networking_router_route_v2 and our current setup.

Just curious, since you mentioned the deprecation of opentelekomcloud_networking_router_route_v2: the strategic solution is to migrate to the VPC resources. If I understand correctly, there won't be a need for the intermediate port, but that requires an internal fix (https://jira.tsi-dev.otc-service.com/browse/BM-5965). Is that correct?

Is there planned work to close the gap between opentelekomcloud_vpc_route_v2 and opentelekomcloud_networking_router_route_v2? opentelekomcloud_vpc_route_v2 only supports routes of type peering, and other route types can apparently only be specified in opentelekomcloud_vpc_route_table_v1. This means that you need to know all routes (except peering ones) when the route table is created, which causes more dependencies between Terraform modules.

For example, suppose you have a base module that creates your VPC and router, and an optional module that configures a NAT gateway. The base module then needs to know about the NAT gateway module and be the one calling it, because it needs the NAT gateway IP to configure the route. With the old resources, you don't have that limitation: the NAT module only needs to know the network it attaches to, and it can manage its own route resource (using the port + route workaround).
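A sketch of how such a self-contained gateway module could look with the port + route workaround from this thread (it mirrors the configuration suggested by @muneeb-jan above). The variable names are hypothetical and are meant to be fed from the base module's outputs; the placeholder flavor, image, security group and key pair values must be filled in.

```hcl
# Hypothetical module inputs, supplied from the base module's outputs.
variable "router_id" { type = string }
variable "network_id" { type = string }
variable "subnet_id" { type = string }
variable "gateway_ip" { type = string }        # fixed IP of the VPN/NAT instance
variable "routed_cidrs" { type = set(string) } # destinations routed via the instance

# The port is a separate resource, so it survives replacement of the instance.
resource "opentelekomcloud_networking_port_v2" "gateway" {
  name       = "gateway-port"
  network_id = var.network_id
  fixed_ip {
    subnet_id  = var.subnet_id
    ip_address = var.gateway_ip
  }
}

resource "opentelekomcloud_compute_instance_v2" "gateway" {
  name            = "gateway"
  flavor_id       = "<choose>"
  image_name      = "<choose an image in your account>"
  security_groups = ["<your-sec-group>"]
  key_pair        = "<your-keypair>"

  # Attach by port instead of by network name.
  network {
    port = opentelekomcloud_networking_port_v2.gateway.id
  }
}

# The module owns its routes; the base module does not need to know about them.
resource "opentelekomcloud_networking_router_route_v2" "gateway_routes" {
  for_each         = var.routed_cidrs
  router_id        = var.router_id
  destination_cidr = each.value
  next_hop         = var.gateway_ip

  depends_on = [opentelekomcloud_compute_instance_v2.gateway]
}
```

Because the port, not the instance, carries the fixed IP, replacing the instance should leave the next hop in place, as reported earlier in this thread; the base module only has to export the router, network and subnet IDs.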

anton-sidelnikov commented 2 days ago

Hi @pvbouwel, regarding "it requires an internal fix (https://jira.tsi-dev.otc-service.com/browse/BM-5965). Is that correct?": yes, that's correct.

As for opentelekomcloud_vpc_route_v2, I haven't heard of any plans to improve this resource; they may even deprecate it. I believe the main goal is to fix everything within the API for opentelekomcloud_vpc_route_table_v1.

Regarding your last point, we'll likely discuss potential solutions for it.