radius-project / radius

Radius is a cloud-native, portable application platform that makes app development easier for teams building cloud-native apps.
https://radapp.io
Apache License 2.0

Design region for AWS in Bicep #6376

Open Reshrahim opened 1 year ago

Reshrahim commented 1 year ago

Summary

Regions are currently specified via rad init. This is similar to running aws configure, where we specify a default region.

Today, there is no way to target a deployment to a region other than the one specified in config.yaml. If a user wanted to deploy a MemoryDB cluster to both us-west-1 and us-east-1, they would need to uninstall and reinstall the control plane, or edit the workspace file to specify a new region.

Things we learned

Tasks

AB#4035

asilverman commented 1 year ago

Context

AWS Regions are separate geographic areas that AWS uses to house its infrastructure. Any resource deployed to AWS must specify a region where it will be hosted.

Newer services and features are deployed to Regions gradually. Although all AWS Regions have the same service level agreement (SLA), some larger Regions are usually first to offer newer services, features, and software releases. Smaller Regions may not get these services or features in time for you to use them to support your workload.

Because the set of services supported in each Region is constantly being updated, AWS makes it possible to inspect https://api.regional-table.region-services.aws.a2z.com/index.json to identify the supported Regions for a given AWS service.
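A minimal Python sketch of that lookup follows. The JSON shape it assumes (a top-level prices list whose entries carry aws:serviceName and aws:region attributes) is an assumption that should be verified against the live document, and the sample data and service names are hypothetical, not real availability data.

```python
import json
import urllib.request

REGION_TABLE_URL = "https://api.regional-table.region-services.aws.a2z.com/index.json"


def fetch_region_table(url: str = REGION_TABLE_URL) -> dict:
    """Download and parse the regional table (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def regions_for_service(table: dict, service_name: str) -> list[str]:
    """Return the sorted Regions listed for a service.

    ASSUMPTION: entries look like
    {"attributes": {"aws:serviceName": ..., "aws:region": ...}} under "prices".
    """
    regions = {
        entry["attributes"]["aws:region"]
        for entry in table.get("prices", [])
        if entry.get("attributes", {}).get("aws:serviceName") == service_name
    }
    return sorted(regions)


# Hypothetical sample in the assumed shape; not real availability data.
SAMPLE_TABLE = {
    "prices": [
        {"attributes": {"aws:serviceName": "Service A", "aws:region": "us-west-2"}},
        {"attributes": {"aws:serviceName": "Service A", "aws:region": "us-east-1"}},
        {"attributes": {"aws:serviceName": "Service B", "aws:region": "us-west-1"}},
    ]
}

if __name__ == "__main__":
    print(regions_for_service(SAMPLE_TABLE, "Service A"))  # ['us-east-1', 'us-west-2']
```

Against the live file, regions_for_service(fetch_region_table(), name) would return the listed Regions, provided the assumed shape holds.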

Corollary: Not all AWS services are available in all AWS regions

  1. At the time of writing, an AWS Provider installation is scoped to a Radius workspace. This means that only one provider can be installed, and it is the one used to make deployment requests to the AWS cloud provider.
  2. Installing the AWS Provider asks the user to specify a region. UCP uses this region to deploy all AWS resources in the Bicep manifests deployed against that Radius workspace.

As part of this investigation I found several official documents showing that AWS supports and allows multi-Region deployments: the section Multi-Region deployment in https://aws.amazon.com/blogs/architecture/what-to-consider-when-selecting-a-region-for-your-workloads/ and the official tutorial Creating a Multi-Region Application with AWS Services.

I believe this disproves the following claim made in the issue description:

> Deployment to two different regions in the same deployment is not advised/well supported on AWS

Below we explore the existing proposals, and their limitations, for configuring the AWS region of Radius deployments.

Option 1: Keep the current implementation

In this option the region is set during the installation of the AWS Provider and used for all AWS resources targeting the hosting Radius workspace.

Limitations

Option 2: Specify the region in the Bicep manifest as part of the AWS extensibility module configuration

The Bicep extensibility features make it possible to implement the region setting as a configuration field of an extensibility module, as shown in Example 1 below.

Example 1:

import aws as aws {
  region: 'us-west-2'  // This is not currently implemented but possible as part of the definition of the extensibility module
}

resource myVpc 'AWS.EC2/VPC@default' = {
  name: 'my-test-vpc'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

resource mySubNet 'AWS.EC2/Subnet@default' = {
  name: 'my-test-subnet'
  properties: {
    VpcId: myVpc.id
    CidrBlock: '10.0.1.0/24'
  }
}

resource sg 'AWS.MemoryDB/SubnetGroup@default' = {
  name: 'test-sg'
  properties: {
    SubnetGroupName: 'test-sg'
    SubnetIds: [ 
      mySubNet.id
    ]
  }
}

resource sampleResource 'AWS.MemoryDB/Cluster@default' = {
  name: 'mysample'
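  properties: {
    ClusterName: 'mysample'
    SubnetGroupName: sg.name // MemoryDB clusters reference their subnet group via SubnetGroupName
    NodeType: 'db.t4g.small'
    ACLName: 'open-access'
  }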
}

Limitations:

* (Retracted) A user will not be able to model a multi-region application like the one described in Creating a Multi-Region Application with AWS Services using Radius.

Update: a user can model a multi-region application like the one described in Creating a Multi-Region Application with AWS Services using Radius by specifying multiple import statements with different configurations. Each resource type must then be fully qualified with the alias of the intended provider import, as shown below.

import aws as awsA {
  region: 'us-west-1'
}

import aws as awsB {
  region: 'us-west-2'
}

resource myVpc 'awsA:AWS.EC2/VPC@default' = {
  name: 'my-test-vpc1'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

resource myVpc2 'awsB:AWS.EC2/VPC@default' = {
  name: 'my-test-vpc2'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

resource myVpc3 'AWS.EC2/VPC@default' = { // Not fully qualified, so this resolves to the last import defined (`awsB`)
  name: 'my-test-vpc3'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}
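To make the resolution rule concrete, here is a small Python model of it, assuming the behavior reported in this thread: a type prefixed with a known alias uses that import, and an unqualified type falls back to the last import declared. This is not Bicep's actual resolver; the Import class and resolve_import function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Import:
    """One `import aws as <alias> { region: ... }` statement (hypothetical model)."""
    alias: str
    region: str


def resolve_import(type_ref: str, imports: list[Import]) -> tuple[Import, str]:
    """Return the import a resource type binds to, plus the unqualified type name."""
    if not imports:
        raise ValueError("no provider imports declared")
    by_alias = {imp.alias: imp for imp in imports}
    alias, sep, bare_type = type_ref.partition(":")
    if sep:
        # Fully qualified, e.g. 'awsA:AWS.EC2/VPC@default'.
        if alias not in by_alias:
            raise ValueError(f"unknown provider alias '{alias}' in '{type_ref}'")
        return by_alias[alias], bare_type
    # Not qualified: fall back to the last import defined (as reported in this thread).
    return imports[-1], type_ref


IMPORTS = [Import("awsA", "us-west-1"), Import("awsB", "us-west-2")]

if __name__ == "__main__":
    for ref in ("awsA:AWS.EC2/VPC@default", "awsB:AWS.EC2/VPC@default", "AWS.EC2/VPC@default"):
        imp, bare = resolve_import(ref, IMPORTS)
        print(f"{ref} -> {imp.alias} ({imp.region}), type {bare}")
```

Raising on an unknown alias, rather than silently falling back, is a design choice of this sketch; it keeps a typo in the alias from quietly deploying to the wrong region.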

Example 2:

import aws as aws {
  region: 'us-west-2' 
}

resource myVpc 'AWS.EC2/VPC@default' = {
  name: 'my-test-vpc'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

resource mySubNet 'AWS.EC2/Subnet@default' = {
  name: 'my-test-subnet'
  properties: {
    VpcId: myVpc.id
    CidrBlock: '10.0.1.0/24'
  }
}

module mdb 'cutomModule.bicep' = {
  name: 'foobar'
  params: {
    memDbClusterName: 'mytestmemorydb'
    subnet: mySubNet
  }
}

cutomModule.bicep

param memDbClusterName string
param subnet object

import aws as aws {
  region: 'us-west-1'
}

resource sg 'AWS.MemoryDB/SubnetGroup@default' = {
  name: 'test-sg'
  properties: {
    SubnetGroupName: 'test-sg'
    SubnetIds: [
      subnet.id
    ]

  }
}

resource mdbUser 'AWS.MemoryDB/User@default' = {
  name: 'test-user'
  properties: {
    UserName: 'test-user'
    AccessString: '~objects:* ~items:* ~public:*'
    AuthenticationMode: {
      Passwords: [
        'abc'
      ]
      Type: 'password'
    }
  }
}

arm.json, the result of compiling the above manifest:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "languageVersion": "1.9-experimental",
  "contentVersion": "1.0.0.0",
  "metadata": {
    "_generator": {
      "name": "bicep",
      "version": "0.7.29.21231",
      "templateHash": "10794570810347203498"
    }
  },
  "imports": {
    "aws": {
      "provider": "AWS",
      "version": "0.1",
      "config": {
        "region": "us-west-2",
        "account": "12345"
      }
    }
  },
  "resources": {
    "myVpc": {
      "import": "aws",
      "type": "AWS.EC2/VPC@default",
      "properties": {
        "name": "my-test-vpc",
        "properties": {
          "CidrBlock": "10.0.0.0/16"
        }
      }
    },
    "mySubNet": {
      "import": "aws",
      "type": "AWS.EC2/Subnet@default",
      "properties": {
        "name": "my-test-subnet",
        "properties": {
          "VpcId": "[reference('myVpc').id]",
          "CidrBlock": "10.0.1.0/24"
        }
      },
      "dependsOn": [
        "myVpc"
      ]
    },
    "mdb": {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "foobar",
      "properties": {
        "expressionEvaluationOptions": {
          "scope": "inner"
        },
        "mode": "Incremental",
        "parameters": {
          "memDbClusterName": {
            "value": "mytestmemorydb"
          },
          "subnet": {
            "value": "[reference('mySubNet')]"
          }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "languageVersion": "1.9-experimental",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": {
              "name": "bicep",
              "version": "0.7.29.21231",
              "templateHash": "2784010825862566031"
            }
          },
          "parameters": {
            "memDbClusterName": {
              "type": "string"
            },
            "subnet": {
              "type": "object"
            }
          },
          "imports": {
            "aws": {
              "provider": "AWS",
              "version": "0.1",
              "config": {
                "region": "us-west-1"
              }
            }
          },
          "resources": {
            "sg": {
              "import": "aws",
              "type": "AWS.MemoryDB/SubnetGroup@default",
              "properties": {
                "name": "test-sg",
                "properties": {
                  "SubnetGroupName": "test-sg",
                  "SubnetIds": [
                    "[parameters('subnet').id]"
                  ]
                }
              }
            },
            "mdbUser": {
              "import": "aws",
              "type": "AWS.MemoryDB/User@default",
              "properties": {
                "name": "test-user",
                "properties": {
                  "UserName": "test-user",
                  "AccessString": "~objects:* ~items:* ~public:*",
                  "AuthenticationMode": {
                    "Passwords": [
                      "abc"
                    ],
                    "Type": "password"
                  }
                }
              }
            },
            "acl": {
              "import": "aws",
              "type": "AWS.MemoryDB/ACL@default",
              "properties": {
                "name": "test-acl",
                "properties": {
                  "ACLName": "test-acl",
                  "UserNames": [
                    "[reference('mdbUser').name]"
                  ]
                }
              },
              "dependsOn": [
                "mdbUser"
              ]
            },
            "testResource": {
              "import": "aws",
              "type": "AWS.MemoryDB/Cluster@default",
              "properties": {
                "name": "[parameters('memDbClusterName')]",
                "properties": {
                  "ClusterName": "[parameters('memDbClusterName')]",
                  "ParameterGroupName": "[reference('sg').name]",
                  "NodeType": "db.t4g.small",
                  "ACLName": "[parameters('memDbClusterName')]"
                }
              },
              "dependsOn": [
                "sg"
              ]
            }
          }
        }
      },
      "dependsOn": [
        "mySubNet"
      ]
    }
  }
}
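One way to read the compiled output above is that each nested deployment resolves imports against its own template's imports section (the "scope": "inner" setting). Whether DE/UCP should actually behave this way is the open question for Example 2, so the Python sketch below is an assumption, not a description of current behavior. It walks a trimmed copy of the template and reports the region each resource would land in; effective_regions is a hypothetical helper.

```python
def effective_regions(template: dict, prefix: str = "") -> dict[str, str]:
    """Map each extensibility resource to the region of the import it references.

    ASSUMPTION: nested Microsoft.Resources/deployments resolve "import" names
    against their own inner template's "imports" section.
    """
    imports = template.get("imports", {})
    result: dict[str, str] = {}
    for name, resource in template.get("resources", {}).items():
        if resource.get("type") == "Microsoft.Resources/deployments":
            inner = resource["properties"]["template"]
            result.update(effective_regions(inner, prefix + name + "/"))
        elif "import" in resource:
            config = imports[resource["import"]].get("config", {})
            result[prefix + name] = config.get("region", "<unset>")
    return result


# Trimmed from the arm.json above: only the fields the sketch reads.
COMPILED = {
    "imports": {"aws": {"provider": "AWS", "config": {"region": "us-west-2"}}},
    "resources": {
        "myVpc": {"import": "aws", "type": "AWS.EC2/VPC@default"},
        "mySubNet": {"import": "aws", "type": "AWS.EC2/Subnet@default"},
        "mdb": {
            "type": "Microsoft.Resources/deployments",
            "properties": {
                "template": {
                    "imports": {"aws": {"provider": "AWS", "config": {"region": "us-west-1"}}},
                    "resources": {
                        "sg": {"import": "aws", "type": "AWS.MemoryDB/SubnetGroup@default"},
                        "mdbUser": {"import": "aws", "type": "AWS.MemoryDB/User@default"},
                    },
                }
            },
        },
    },
}

if __name__ == "__main__":
    for resource, region in effective_regions(COMPILED).items():
        print(f"{resource}: {region}")
```

Under this reading, mySubNet lands in us-west-2 while mdb/sg, which references it, lands in us-west-1. Since VPC subnets are Region-scoped, such a reference would likely fail at deployment time, which is why the module behavior needs to be defined explicitly.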

Example 3:

import aws as aws {
  region: 'us-west-2' //https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
}

resource myVpc 'AWS.EC2/VPC@default' = {
  name: 'my-test-vpc'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

resource mySubNet 'AWS.EC2/Subnet@default' existing = {
  // How does the author know whether this subnet is hosted in 'us-west-2'?
  name: 'my-test-subnet'
}

module mdb 'cutomModule.bicep' = {
  name: 'foobar'
  params: {
    memDbClusterName: 'mytestmemorydb'
    subnet: mySubNet
    targetRegion: 'us-west-2'
  }
}

Example 4:

import aws as aws {
  region: 'us-west-2' 
}

param storageAccountName string = 'stcontoso'
param storageAccountSettings object = {
  location: resourceGroup().location
  kind: 'StorageV2'
  sku: 'Standard_LRS'
}

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-02-01' = {
  name: storageAccountName
  location: storageAccountSettings.location
  kind: storageAccountSettings.kind
  sku: {
    name: storageAccountSettings.sku
  }
}

resource myVpc 'AWS.EC2/VPC@default' = {
  name: 'my-test-vpc'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

Option 3: Design and include a region metadata field in the Bicep model that is validated against https://api.regional-table.region-services.aws.a2z.com/index.json during authoring.

This option proposes a field named region, equivalent to Azure's location metadata, that is required for all AWS Bicep extensibility types. Its value would determine where the resource is hosted and could be dereferenced for existing resources (for example: mySubNet.region in Example 3).

Operationally, the Bicep language server would fetch the region table (HTTP GET on https://api.regional-table.region-services.aws.a2z.com/index.json) and validate that the resource type is available in the specified region; if it is not, compilation would fail with an error.
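A rough Python sketch of the fetch, cache and validate flow follows. It assumes index.json lists entries under prices with aws:serviceName and aws:region attributes, uses a hypothetical mapping from Bicep type namespaces to service names, and injects the fetch function so the network call can be swapped out. RegionTableCache, check_region and TYPE_TO_SERVICE are illustrative names, not part of any Bicep API.

```python
import time
from typing import Callable, Optional

Pairs = set[tuple[str, str]]


def index_table(table: dict) -> Pairs:
    """Flatten the (assumed) table shape into (service name, region) pairs."""
    return {
        (e["attributes"]["aws:serviceName"], e["attributes"]["aws:region"])
        for e in table.get("prices", [])
    }


class RegionTableCache:
    """Fetches the region table at most once per TTL window."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._pairs: Optional[Pairs] = None
        self._fetched_at = 0.0

    def pairs(self) -> Pairs:
        now = self._clock()
        if self._pairs is None or now - self._fetched_at >= self._ttl:
            self._pairs = index_table(self._fetch())
            self._fetched_at = now
        return self._pairs


# Hypothetical mapping from Bicep type namespace to the table's service name.
TYPE_TO_SERVICE = {"AWS.MemoryDB": "Service A"}


def check_region(cache: RegionTableCache, resource_type: str, region: str) -> Optional[str]:
    """Return an error message if the type's service is not listed in the region."""
    service = TYPE_TO_SERVICE.get(resource_type.split("/", 1)[0])
    if service is None:
        return None  # no mapping: nothing to validate against
    if (service, region) not in cache.pairs():
        return f"{resource_type} is not available in region '{region}'"
    return None
```

A cached table can go stale between fetches, so a deployment-time check would still be a useful backstop.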

Limitations:

Decision

Task Breakdown

willdavsmith commented 1 year ago

Awesome writeup Ari. One question - what would it mean for region to be validated against the AWS region table during authoring? I don't think this is validation that we could do on the Bicep compiler side. We could do this validation during deployment but it would still be reported at runtime to the user. Am I missing something?

jkotalik commented 1 year ago

> As part of this investigation I saw several official documents that disprove this claim, AWS supports and allows for multi-region deployments as evidenced by section: Multi-Region deployment in https://aws.amazon.com/blogs/architecture/what-to-consider-when-selecting-a-region-for-your-workloads/ and the official tutorial Creating a Multi-Region Application with AWS Services
>
> I believe this disproves the following claims made in the issue description
>
> > Deployment to two different regions in the same deployment is not advised/well supported on AWS

@rynowak mentioned that although multi-region deployments are supported in AWS, most of the mainline paths and tutorials avoid them. Ryan, do you have some examples of this?

These are good reference points though and may factor into eventually specifying different regions for different resources in the future.

For Option 1:

> To change the target region configuration, the user must remove the Radius control plane installation and start a fresh installation, since AWS provider installation and Radius control-plane installation take place as part of an atomic transaction.

This is a big blocker and seems like something that should be defined in Bicep, as mentioned further on. Totally agree.

For Option 2:

> A user will not be able to model a multi-region application like the one described in Creating a Multi-Region Application with AWS Services using Radius

Someone can always have multiple imports, ex:

import aws as aws {
  region: 'us-west-1'
}

import aws as aws2 {
  region: 'us-west-2'
}

I don't think designing around multi-region is a high priority, but we at least have options.

> The behavior of DE/UCP in the handling of a Bicep module that contains an extensibility module with a configuration must be designed / defined (see Example 2).

We can confirm with the ARM team what their design is. https://github.com/Azure/bicep/issues/6653 for follow up discussion.

> It is implicit and up to the user to discover (by different means outside of the bicep file) if a reference to an existing AWS resource is hosted in the same or a different region than the one specified in the import module statement (see Example 3)

I don't view this as a different problem than with existing Azure resources today; is there a core difference?

jkotalik commented 1 year ago

Option 3:

A few other downsides:

> This option proposes the use of a field equivalent to Azure's location metadata called region that is required for all AWS Bicep extensibility types. The value specified would be used to determine the location of the resource and can be dereferenced for existing types (for example: mySubNet.region in Example 3)

We can also add a linter rule that warns about an unrecognized region, similar to the location warning for Azure resources.

jkotalik commented 1 year ago

Though, I just realized we can probably do the same validation for Option 2 and Option 3 if we find a way to make it work with linters: one check would be on the import and one on the resource.
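The two checks could share one rule. The Python sketch below, with hypothetical names (Diagnostic, lint_regions) and a hypothetical known-region set, warns about an unrecognized region both on an import's configuration (Option 2) and on a resource-level region field (Option 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    level: str
    location: str
    message: str


def lint_regions(imports: dict[str, dict], resources: dict[str, dict],
                 known_regions: set[str]) -> list[Diagnostic]:
    """Warn on unrecognized regions in import configs and on resources."""
    diagnostics: list[Diagnostic] = []
    # Option 2: region set in the import configuration.
    for alias, config in imports.items():
        region = config.get("region")
        if region is not None and region not in known_regions:
            diagnostics.append(Diagnostic("warning", f"import {alias}", f"unrecognized region '{region}'"))
    # Option 3: region set on the resource itself.
    for name, props in resources.items():
        region = props.get("region")
        if region is not None and region not in known_regions:
            diagnostics.append(Diagnostic("warning", f"resource {name}", f"unrecognized region '{region}'"))
    return diagnostics


# Hypothetical subset of region codes, for illustration only.
KNOWN_REGIONS = {"us-east-1", "us-west-1", "us-west-2"}
```

Emitting warnings rather than errors mirrors the linter idea above, so an out-of-date region list would not block compilation.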

Another thing to consider: what if we eventually support both specifying the region on the import and an optional region on the AWS resource? That would allow us to incrementally add a multi-region story if we want or need it.
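The combined model could resolve each resource's region with a simple precedence rule. A minimal Python sketch, assuming a hypothetical shape where imports map an alias to an optional default region and each resource names its import and may carry its own region; resolve_regions is an illustrative helper.

```python
from typing import Optional


def resolve_regions(imports: dict[str, Optional[str]],
                    resources: dict[str, dict]) -> dict[str, str]:
    """Resource-level region wins; otherwise fall back to the import's default region."""
    resolved: dict[str, str] = {}
    for name, resource in resources.items():
        alias = resource["import"]
        region = resource.get("region") or imports.get(alias)
        if region is None:
            raise ValueError(f"{name}: no region on the resource or on import '{alias}'")
        resolved[name] = region
    return resolved


if __name__ == "__main__":
    print(resolve_regions(
        {"aws": "us-west-2"},
        {
            "myVpc": {"import": "aws"},                              # inherits us-west-2
            "replicaVpc": {"import": "aws", "region": "us-east-1"},  # explicit override
        },
    ))
```

Because the import keeps supplying a default, existing single-region manifests would resolve exactly as before, which fits the incremental path described above.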

rynowak commented 1 year ago

My perspective on this is that we're doing the right thing for now by keeping things simple and optimizing for the common case. None of the decisions we've made so far paint us into a corner, and everything can be made more flexible in the future as we need to. I think all of the ideas @asilverman shared here are good examples of that.

For a concurring example, the Terraform provider for AWS supports a single region per instance of the provider. If you want to deploy to multiple regions, you can do so by declaring multiple instances of the provider.

Bicep has the same design for providers (intentionally) where you can declare multiple instances with different configuration.


It's also useful to understand two other things:

First, Azure has many more regions than AWS but fewer availability zones within regions. Many Azure regions only have a single AZ. From what I've seen so far, AWS also surfaces the availability zone concept prominently, whereas Azure does not, except for a few services.

AWS also ties its UX to regions. In the CLI and Console, you can only work with one region at a time. This leads users to think much more about using a single region where possible, and to get resiliency by configuring AZ policies, for example by creating a Kubernetes cluster with node pools in different AZs.

asilverman commented 1 year ago

> Awesome writeup Ari. One question - what would it mean for region to be validated against the AWS region table during authoring? I don't think this is validation that we could do on the Bicep compiler side. We could do this validation during deployment, but it would still be reported at runtime to the user. Am I missing something?

I think the idea would be to have the Bicep language server make an HTTP GET on https://api.regional-table.region-services.aws.a2z.com/index.json, cache the results, and use them to determine whether the value is kosher. I updated the original writeup to add these details. FWIW this can also be validated at runtime, but the point is to shift left by leveraging static analysis.

asilverman commented 1 year ago

> Someone can always have multiple imports, ex:
>
> import aws as aws {
>   region: 'us-west-1'
> }
>
> import aws as aws2 {
>   region: 'us-west-2'
> }

@jkotalik - yes, you can specify multiple imports, but how do you know which resource belongs to which import? That is not clear to me ATM.

UPDATE: I answered this myself: the way to do this is to fully qualify the resource type with the provider alias (see example below). I also updated the original comment to reflect this new finding.

Note: if the type is not fully qualified, the last import defined is used. In the example below, that means vpc2 uses awsB.

import aws as awsA
import aws as awsB

resource vpc1 'awsA:AWS.EC2/VPC@default' = {
  name: 'my-test-vpc1'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

resource vpc2 'AWS.EC2/VPC@default' = {
  name: 'my-test-vpc2'
  properties: {
    CidrBlock: '10.0.0.0/16'
  }
}

asilverman commented 1 year ago

> It is implicit and up to the user to discover (by different means outside of the bicep file) if a reference to an existing AWS resource is hosted in the same or a different region than the one specified in the import module statement (see Example 3)
>
> I don't view this as a different problem with existing azure resources today, is there a core difference?

You are right, this is also a gap with Azure resources. TIL that Bicep assumes an existing resource is in the same resource group as the current deployment; this is not well defined for extensibility resource types. In addition, you can reference resources belonging to a different scope by setting the scope property, which I don't believe has been designed for extensibility resource types. More info about this can be found here: https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/existing-resource

We can work on these gaps separately. I created https://github.com/project-radius/radius/issues/3964 to track this work.

asilverman commented 1 year ago
> AWS' model for regions is different than Azure's model for regions (location field).

I think this is a great point @rynowak. I'm curious whether you think it's Radius' concern to provide an abstraction that covers both as part of modeling a Radius type, or whether this is tangential to Radius and belongs in the scope of a vertically developed Bicep extensibility provider for AWS resources.

rynowak commented 1 year ago

I don't think the Bicep support for AWS should try to provide abstractions. WYSIWYG 😆

asilverman commented 1 year ago

> I don't think the Bicep support for AWS should try to provide abstractions. WYSIWYG 😆

So how does Radius manage the inconsistent user experience between AWS resources and Azure resources in Bicep?

rynowak commented 1 year ago

> > I don't think the Bicep support for AWS should try to provide abstractions. WYSIWYG 😆
>
> So how does Radius manage the inconsistent user experience between AWS resources and Azure resources in Bicep?

What I'm saying is that if we need abstractions then the lowest level is not the right place to provide them. TBH I'm not really sure what we're talking about anymore.

asilverman commented 1 year ago

> We can also add a linter which has a warning saying unrecognized region. Similar to the location warning for azure resources.

No such warning exists; the only thing you get is the following: [image]

asilverman commented 1 year ago

> What I'm saying is that if we need abstractions then the lowest level is not the right place to provide them. TBH I'm not really sure what we're talking about anymore.

We are talking about Azure resources specifying a location field, which is incompatible with AWS resources specifically and Bicep extensibility resources generally, and how that affects the ability of Radius customers to model their applications with a consistent experience.

rynowak commented 1 year ago

Thanks that helps explain what this issue is about.

I don't think we should try to make AWS resources work like Azure resources. We should expose AWS's concepts to AWS users because they are important. So what I'm saying is that an Azure resource definition and an AWS resource definition should not be consistent, they need to be faithful to the underlying concepts.

I don't understand the comment about Bicep extensibility. The extensibility provider defines the schema of the resources.

> how it affects the ability for Radius customers to model their applications with a consistent experience

I don't understand this comment either. Everything we've been discussing is about how Bicep exposes the resource model of Azure and AWS. Where does Radius enter the picture?