IMPORTANT: You are viewing a beta version of the official module to install Weights & Biases. This new version is incompatible with earlier versions, and it is not currently meant for production use. Please contact your Customer Success Manager for details before using.
This is a Terraform module for provisioning a Weights & Biases Cluster on AWS. Weights & Biases Local is our self-hosted distribution of wandb.ai. It offers enterprises a private instance of the Weights & Biases application, with no resource limits and with additional enterprise-grade architectural features like audit logging and SAML single sign-on.
This module is intended to run in an AWS account with minimal preparation; however, it does have the following prerequisites:
If you are managing DNS via AWS Route53, the hosted zone entry is created automatically as part of your domain management.
If you're managing DNS outside of Route53, you will need to:
1. Create a Route53 hosted zone named {subdomain}.{domain} (e.g. test.wandb.ai).
2. Delegate the subdomain to that zone by creating NS records at your parent DNS provider that point to the new zone's name servers.
3. Enable the external_dns option in this module.
You can learn more about creating a hosted zone for a subdomain, which you will need to do for the subdomain you plan to use for your Weights & Biases installation. To create this hosted zone with Terraform, use the aws_route53_zone resource.
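The sketch below covers the zone-creation step. It assumes your installation will live at test.wandb.ai, which is only a placeholder. It creates the hosted zone with aws_route53_zone and exposes the zone's name_servers attribute. Those are the values you enter as NS records at your parent DNS provider.

```hcl
# Hosted zone for the subdomain that will serve Weights & Biases.
# "test.wandb.ai" is a placeholder for your {subdomain}.{domain}.
resource "aws_route53_zone" "wandb" {
  name = "test.wandb.ai"
}

# Name servers to publish as NS records for the subdomain in the parent zone.
output "wandb_zone_name_servers" {
  value = aws_route53_zone.wandb.name_servers
}
```

After terraform apply, terraform output wandb_zone_name_servers prints the list to copy into your DNS provider. Then set external_dns = true in the module block.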
While not required, it is recommended to have an existing ACM certificate. Certificate validation can take up to two hours, which can cause timeouts during terraform apply if the certificate is created as one of the module's resources.
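The sketch below reuses a certificate that has already been issued. It looks the certificate up with the AWS provider's aws_acm_certificate data source and passes its ARN to the module's acm_certificate_arn input. The domain test.wandb.ai is a placeholder. The lookup fails if no issued certificate matches that domain.

```hcl
# Look up an existing, already-validated certificate so the module does not
# have to create one and wait for validation during apply.
data "aws_acm_certificate" "wandb" {
  domain      = "test.wandb.ai" # placeholder: your {subdomain}.{domain}
  statuses    = ["ISSUED"]
  most_recent = true
}

module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming AWS resources>"

  # Reuse the existing certificate (the default, null, leaves creation to the module).
  acm_certificate_arn = data.aws_acm_certificate.wandb.arn

  # ... other required inputs ...
}
```

The certificate must be in the same region as the load balancer, because ALB listeners only accept ACM certificates from their own region.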
1. Ensure your account meets the module prerequisites above.
2. While some resources are individually and uniquely tagged, all common tags are expected to be configured on the AWS provider through default_tags, as shown in the snippet below.
3. Create a Terraform configuration that pulls in this module and specifies values for the required variables:
provider "aws" {
  region = "<your AWS region>"
  default_tags {
    tags = var.common_tags
  }
}

module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming AWS resources>"
}
4. Run terraform init, then terraform apply.
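The usage snippet above sets only namespace, but the Inputs table marks several more variables as required. The sketch below is a fuller root configuration. Every value is a placeholder, and the common_tags declaration is an assumption about how you supply var.common_tags. The wide-open CIDRs are for illustration only and should be restricted in practice.

```hcl
# Assumed declaration for the tags referenced by default_tags.
variable "common_tags" {
  description = "Tags applied to every resource through the provider's default_tags."
  type        = map(string)
  default     = {}
}

provider "aws" {
  region = "<your AWS region>"
  default_tags {
    tags = var.common_tags
  }
}

module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming AWS resources>"

  # Remaining inputs marked "required" in the Inputs table (placeholder values).
  license             = "<your license key>"
  domain_name         = "<your domain, e.g. wandb.ai>"
  zone_id             = "<Route53 hosted zone ID>"
  eks_cluster_version = "<EKS Kubernetes version, e.g. 1.25>"

  # Illustrative only: these allow traffic from anywhere.
  allowed_inbound_cidr      = ["0.0.0.0/0"]
  allowed_inbound_ipv6_cidr = ["::/0"]
}
```

Depending on the module version, you may also need to configure the kubernetes provider listed under the requirements against the EKS cluster the module creates. The reference examples mentioned below are the place to check.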
We have included documentation and reference examples for additional common installation scenarios for Weights & Biases, as well as examples for supporting resources that lack official modules.
Users can update the EKS cluster version to the latest version offered by AWS by setting the eks_cluster_version input variable. Note that cluster and node group version updates can only be made one version at a time. For example, if your current cluster version is 1.21 and the latest version available is 1.25, you'd need to:
1. Update eks_cluster_version from 1.21 to 1.22.
2. Run terraform apply.
3. Update eks_cluster_version to 1.23.
4. Run terraform apply.
5. Update eks_cluster_version to 1.24.
...and so on.
Upgrades must be executed step by step, from one version to the next. You cannot skip versions when upgrading EKS.
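In the 1.21 to 1.25 example above, each step is a one-line change to the module block. The sketch below shows the first step, with the other inputs elided.

```hcl
module "wandb" {
  source = "<filepath to cloned module directory>"
  # ... other inputs unchanged ...

  # Previously "1.21". Raise by exactly one minor version, run terraform apply,
  # and only after it succeeds move on to "1.23", then "1.24", and so on.
  eks_cluster_version = "1.22"
}
```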
Requirements
Name | Version |
---|---|
terraform | ~> 1.0 |
aws | ~> 4.0 |
kubernetes | ~> 2.23 |
Providers
Name | Version |
---|---|
aws | ~> 4.0 |
Modules
Name | Source | Version |
---|---|---|
acm | terraform-aws-modules/acm/aws | ~> 3.0 |
app_eks | ./modules/app_eks | n/a |
app_lb | ./modules/app_lb | n/a |
database | ./modules/database | n/a |
file_storage | ./modules/file_storage | n/a |
iam_role | ./modules/iam_role | n/a |
kms | ./modules/kms | n/a |
networking | ./modules/networking | n/a |
private_link | ./modules/private_link | n/a |
redis | ./modules/redis | n/a |
s3_endpoint | ./modules/endpoint | n/a |
wandb | wandb/wandb/helm | 1.2.0 |
Resources
Name | Type |
---|---|
aws_region.current | data source |
aws_s3_bucket.file_storage | data source |
aws_sqs_queue.file_storage | data source |
Inputs
Name | Description | Type | Default | Required |
---|---|---|---|---|
acm_certificate_arn | The ARN of an existing ACM certificate. | string | null | no |
allowed_inbound_cidr | CIDRs allowed to access wandb-server. | list(string) | n/a | yes |
allowed_inbound_ipv6_cidr | IPv6 CIDRs allowed to access wandb-server. | list(string) | n/a | yes |
allowed_private_endpoint_cidr | Private CIDRs allowed to access wandb-server. | list(string) | [] | no |
app_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
aws_loadbalancer_controller_tags | (Optional) A map of AWS tags to apply to all resources managed by the load balancer controller | map(string) | {} | no |
bucket_kms_key_arn | n/a | string | "" | no |
bucket_name | n/a | string | "" | no |
bucket_path | Path within the instance-level bucket where data is stored | string | "" | no |
create_bucket | Most users will not need this setting. It is meant for users who want a bucket and SQS queue in a different account. | bool | true | no |
create_elasticache | Boolean indicating whether to provision an ElastiCache instance (true) or not (false). | bool | true | no |
create_vpc | Boolean indicating whether to deploy a VPC (true) or not (false). | bool | true | no |
custom_domain_filter | A custom domain filter to be used by external-dns instead of the default FQDN. If not set, the local FQDN is used. | string | null | no |
database_binlog_format | Specifies the binlog_format value to set for the database | string | "ROW" | no |
database_engine_version | Version for MySQL Aurora | string | "8.0.mysql_aurora.3.05.2" | no |
database_innodb_lru_scan_depth | Specifies the innodb_lru_scan_depth value to set for the database | number | 128 | no |
database_instance_class | Instance type to use for the database master instance. | string | "db.r5.large" | no |
database_kms_key_arn | n/a | string | "" | no |
database_master_username | Specifies the master_username value to set for the database | string | "wandb" | no |
database_name | Specifies the name of the database | string | "wandb_local" | no |
database_performance_insights_kms_key_arn | Specifies an existing KMS key ARN to encrypt the Performance Insights data if performance_insights_enabled was enabled out of band | string | "" | no |
database_snapshot_identifier | Specifies whether or not to create this cluster from a snapshot. You can use either the name or ARN when specifying a DB cluster snapshot, or the ARN when specifying a DB snapshot | string | null | no |
database_sort_buffer_size | Specifies the sort_buffer_size value to set for the database | number | 67108864 | no |
deletion_protection | Whether to enable deletion protection. The database and S3 bucket can't be deleted while this value is true. | bool | true | no |
domain_name | Domain for accessing the Weights & Biases UI. | string | n/a | yes |
eks_cluster_version | EKS cluster Kubernetes version | string | n/a | yes |
eks_policy_arns | Additional IAM policies to apply to the EKS cluster | list(string) | [] | no |
elasticache_node_type | The type of the Redis cache node to deploy | string | "cache.t2.medium" | no |
enable_dummy_dns | Boolean indicating whether or not to enable dummy DNS for the old ALB | bool | false | no |
enable_operator_alb | Boolean indicating whether to use the operator ALB (true) or not (false). | bool | false | no |
enable_yace | Deploy YACE (Yet Another CloudWatch Exporter) to fetch AWS resource metrics | bool | true | no |
external_dns | Whether DNS is managed externally. A subdomain must also be specified if this value is true. | bool | false | no |
extra_fqdn | Additional FQDNs; these must be in the same hosted zone as domain_name. | list(string) | [] | no |
kms_key_alias | KMS key alias for the AWS KMS customer managed key. | string | null | no |
kms_key_deletion_window | Waiting period, in days, before the key is destroyed after it is deleted. Must be between 7 and 30 days. | number | 7 | no |
kms_key_policy | The policy that defines the permissions for the KMS key. | string | "" | no |
kubernetes_alb_internet_facing | Indicates whether the ALB controlled by the Amazon ALB ingress controller is internet-facing or internal. | bool | true | no |
kubernetes_alb_subnets | List of subnet IDs the ALB will use for ingress traffic. | list(string) | [] | no |
kubernetes_instance_types | EC2 instance types for the primary node group. | list(string) | [ ... ] | no |
kubernetes_map_accounts | Additional AWS account numbers to add to the aws-auth configmap. | list(string) | [] | no |
kubernetes_map_roles | Additional IAM roles to add to the aws-auth configmap. | list(object({ ... })) | [] | no |
kubernetes_map_users | Additional IAM users to add to the aws-auth configmap. | list(object({ ... })) | [] | no |
kubernetes_node_count | Number of nodes | number | 2 | no |
kubernetes_public_access | Indicates whether the Amazon EKS public API server endpoint is enabled. | bool | false | no |
kubernetes_public_access_cidrs | List of CIDR blocks that can access the Amazon EKS public API server endpoint. | list(string) | [] | no |
license | Weights & Biases license key. | string | n/a | yes |
namespace | String used to prefix resource names. | string | n/a | yes |
network_cidr | CIDR block for the VPC. | string | "10.10.0.0/16" | no |
network_database_subnet_cidrs | List of database subnet CIDR ranges to create in the VPC. | list(string) | [ ... ] | no |
network_database_subnets | A list of the identities of the database subnetworks in which resources will be deployed. | list(string) | [] | no |
network_elasticache_subnet_cidrs | List of ElastiCache subnet CIDR ranges to create in the VPC. | list(string) | [ ... ] | no |
network_elasticache_subnets | A list of the identities of the subnetworks in which ElastiCache resources will be deployed. | list(string) | [] | no |
network_id | The identity of the VPC in which resources will be deployed. | string | "" | no |
network_private_subnet_cidrs | List of private subnet CIDR ranges to create in the VPC. | list(string) | [ ... ] | no |
network_private_subnets | A list of the identities of the private subnetworks in which resources will be deployed. | list(string) | [] | no |
network_public_subnet_cidrs | List of public subnet CIDR ranges to create in the VPC. | list(string) | [ ... ] | no |
network_public_subnets | A list of the identities of the public subnetworks in which resources will be deployed. | list(string) | [] | no |
other_wandb_env | Extra environment variables for W&B | map(any) | {} | no |
parquet_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
private_link_allowed_account_ids | List of AWS account IDs allowed to access the VPC Endpoint Service | list(string) | [] | no |
private_only_traffic | Enable private-only traffic from the customer's private network | bool | false | no |
public_access | Whether this instance is accessible via a public domain. | bool | false | no |
size | Deployment size | string | null | no |
ssl_policy | SSL policy to use on the ALB listener | string | "ELBSecurityPolicy-FS-1-2-Res-2020-10" | no |
subdomain | Subdomain for accessing the Weights & Biases UI. By default, the record is created at the root of the Route53 zone. | string | null | no |
system_reserved_cpu_millicores | (Optional) The amount of 'system-reserved' CPU millicores to pass to the kubelet. For example: 100. A value of -1 disables the flag. | number | 70 | no |
system_reserved_ephemeral_megabytes | (Optional) The amount of 'system-reserved' ephemeral storage in megabytes to pass to the kubelet. For example: 1000. A value of -1 disables the flag. | number | 750 | no |
system_reserved_memory_megabytes | (Optional) The amount of 'system-reserved' memory in megabytes to pass to the kubelet. For example: 100. A value of -1 disables the flag. | number | 100 | no |
system_reserved_pid | (Optional) The number of 'system-reserved' process IDs (PIDs) to pass to the kubelet. For example: 1000. A value of -1 disables the flag. | number | 500 | no |
use_internal_queue | n/a | bool | false | no |
weave_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
yace_sa_name | n/a | string | "wandb-yace" | no |
zone_id | ID of the Route53 hosted zone in which the Weights & Biases subdomain is created. | string | n/a | yes |
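The networking inputs above suggest how the module can deploy into an existing VPC instead of creating one. The sketch below assumes that with create_vpc = false the module reads the VPC and subnet IDs from network_id and the network_*_subnets lists. All IDs shown are hypothetical placeholders.

```hcl
module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming AWS resources>"
  # ... required inputs ...

  # Reuse an existing VPC instead of letting the module create one.
  create_vpc = false
  network_id = "vpc-0123456789abcdef0" # hypothetical

  # Hypothetical subnet IDs inside that VPC.
  network_private_subnets     = ["subnet-0aaa0000000000001", "subnet-0aaa0000000000002"]
  network_public_subnets      = ["subnet-0bbb0000000000001", "subnet-0bbb0000000000002"]
  network_database_subnets    = ["subnet-0ccc0000000000001", "subnet-0ccc0000000000002"]
  network_elasticache_subnets = ["subnet-0ddd0000000000001", "subnet-0ddd0000000000002"]
}
```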
Outputs
Name | Description |
---|---|
bucket_name | n/a |
bucket_path | n/a |
bucket_queue_name | n/a |
bucket_region | n/a |
cluster_id | n/a |
cluster_node_role | n/a |
database_connection_string | n/a |
database_instance_type | n/a |
database_password | n/a |
database_username | n/a |
eks_node_count | n/a |
eks_node_instance_type | n/a |
elasticache_connection_string | n/a |
internal_app_port | n/a |
kms_key_arn | The Amazon Resource Name of the KMS key used to encrypt data at rest. |
network_id | The identity of the VPC in which resources are deployed. |
network_private_subnets | The identities of the private subnetworks deployed within the VPC. |
network_public_subnets | The identities of the public subnetworks deployed within the VPC. |
redis_instance_type | n/a |
standardized_size | n/a |
url | The URL to the W&B application |
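The module's outputs can be consumed from the root configuration in the usual way. The sketch below re-exports a few of them and assumes the module block is named wandb, as in the usage example. database_password is a secret, so the sketch marks its re-export as sensitive.

```hcl
output "wandb_url" {
  description = "URL of the W&B application."
  value       = module.wandb.url
}

output "wandb_vpc_id" {
  description = "VPC in which the W&B resources are deployed."
  value       = module.wandb.network_id
}

# Marked sensitive so Terraform redacts the value in plan and apply output.
output "wandb_database_password" {
  value     = module.wandb.database_password
  sensitive = true
}
```

After an apply, terraform output -raw wandb_url prints the bare URL, which is convenient in scripts.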
See our upgrade guide for details. Version 4.x of the module requires a license key, passed through the license variable:
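module "wandb" {
  source  = "wandb/wandb/aws" # registry address; version constraints need a registry source
  version = "~> 4.0"          # Terraform constraint syntax ("4.x" is not valid)
  # ...
  license = "<your license key>"
  # ...
}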
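If you supply your own KMS keys through database_kms_key_arn and bucket_kms_key_arn, each key's policy must allow your account to use it.
This can be done by adding the following statement to the key policy: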
{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::<Account_id>:root"
    ]
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
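If the key policy is managed in Terraform, you can add this statement without discarding the existing policy by merging documents with the aws_iam_policy_document data source. The sketch below makes three assumptions. The current policy JSON is provided through a hypothetical existing_key_policy variable. Terraform runs in the account that hosts W&B, so aws_caller_identity fills in <Account_id>. Statement Sids are unique across merged documents, which means the merge fails if the existing policy already contains a statement with the same Sid.

```hcl
# Hypothetical input: the policy currently attached to the key, as JSON.
variable "existing_key_policy" {
  type = string
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "wandb_key_policy" {
  # Start from the existing policy so its administrative statements are kept.
  source_policy_documents = [var.existing_key_policy]

  # The "Allow use of the key" statement shown above.
  statement {
    sid    = "Allow use of the key"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }

    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
    ]
    resources = ["*"]
  }
}

# Merged policy, e.g. for the policy argument of the aws_kms_key managing the key.
output "wandb_key_policy_json" {
  value = data.aws_iam_policy_document.wandb_key_policy.json
}
```

Merging preserves the administrative statements, which matters because a key policy that omits them can leave the key unmanageable.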