googlegenomics / gcp-variant-transforms

GCP Variant Transforms
Apache License 2.0

Network parameter not being read #696

Open abalter opened 3 years ago

abalter commented 3 years ago

I'm using this script:

#!/bin/bash
# Parameters to replace:
# The GOOGLE_CLOUD_PROJECT is the project that contains your BigQuery dataset.
GOOGLE_CLOUD_PROJECT=psjh-eacri-data
INPUT_PATTERN=https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz
# INPUT_PATTERN=gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz
OUTPUT_TABLE=eacri-genomics:gnomad.gnomad_hg19_2_1_1
TEMP_LOCATION=gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp

COMMAND="vcf_to_bq \
    --input_pattern ${INPUT_PATTERN} \
    --output_table ${OUTPUT_TABLE} \
    --temp_location ${TEMP_LOCATION} \
    --job_name vcf-to-bigquery \
    --runner DataflowRunner \
    --zones us-east1-b \
    --network projects/phs-205720/global/networks/psjh-shared01 \
    --subnet projects/phs-205720/regions/us-east1/subnetworks/subnet01"

docker run -v ~/.config:/root/.config \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --temp_location ${TEMP_LOCATION} \
    "${COMMAND}"

And yet the error says the network was not specified, and the network field is empty in the JSON output.

What change do I need to make to my script? Or is some other format needed to specify the network?

The script template doesn't include a network or subnet parameter at all.

(base) jupyter@balter-genomics:~$ ./script.sh
 --project 'psjh-eacri-data' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq     --input_pattern gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz     --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1     --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp     --job_name vcf-to-bigquery     --runner DataflowRunner     --zones us-east1-b     --subnet subnet03'
Your active configuration is: [variant]
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp --job_name vcf-to-bigquery --runner DataflowRunner --zones us-east1-b --subnet subnet03 --project psjh-eacri-data --region us-east1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210510_230717.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/13027962545459232820" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210510_230717.log"
23:07:26 Worker "google-pipelines-worker-ab367d994b1cd7881ebf66950fec6c17" assigned in "us-east1-b" on a "g1-small" machine
23:07:26 Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.
23:07:27 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/13027962545459232820" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)
(base) jupyter@balter-genomics:~$ ./script.sh
 --project 'psjh-eacri-data' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq     --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz     --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1     --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp     --job_name vcf-to-bigquery     --runner DataflowRunner     --zones us-east1-b     --subnet subnet03'
Your active configuration is: [variant]
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp --job_name vcf-to-bigquery --runner DataflowRunner --zones us-east1-b --subnet subnet03 --project psjh-eacri-data --region us-east1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210511_000846.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/3293803574088782620" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210511_000846.log"
00:08:56 Worker "google-pipelines-worker-e05c2864661a5ba9f1b29012de1ac56d" assigned in "us-east1-d" on a "g1-small" machine
00:08:56 Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.
00:08:57 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/3293803574088782620" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)
rcowin-gcp commented 3 years ago

Hi there,

Thanks for the note. There are a few code changes that have been made recently. The first one you should check out is here. There are a couple of changes; one is using --region instead of --zones.

If you want to specify your network and subnet, follow these instructions (which we will push to the main branch soon).

docker run gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --region us-central1 \
    --location us-central1 \
    --temp_location "${TEMP_LOCATION}" \
    --subnetwork regions/us-central1/subnetworks/my-subnet \
    "${COMMAND}"

Please let us know if these updates work. Thanks!

abalter commented 3 years ago

I tried using this script:

#!/bin/bash

# Parameters to replace:
GOOGLE_CLOUD_PROJECT=$$$$$$$$$
GOOGLE_CLOUD_REGION=us-east1-b
GOOGLE_CLOUD_LOCATION=us-east1-b
TEMP_LOCATION=gs://$$$$$$$$/*.vcf.bgz/tmp
INPUT_PATTERN=https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz
OUTPUT_TABLE=eacri-genomics:gnomad.gnomad_hg19_2_1_1

COMMAND="vcf_to_bq \
  --input_pattern ${INPUT_PATTERN} \
  --output_table ${OUTPUT_TABLE} \
  --job_name vcf-to-bigquery \
  --runner DataflowRunner"

docker run -v ~/.config:/root/.config \
  gcr.io/cloud-lifesciences/gcp-variant-transforms \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --location "${GOOGLE_CLOUD_LOCATION}" \
  --region "${GOOGLE_CLOUD_REGION}" \
  --temp_location "${TEMP_LOCATION}" \
  "${COMMAND}"

I got:

 --project 'eacri-genomics' --region 'us-east1-b' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'us-east1-b' 'vcf_to_bq   --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz   --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1   --job_name vcf-to-bigquery   --runner DataflowRunner'
getopt: unrecognized option '--location'

Any suggestions for what to try next?

NOTE: I updated to: gcp-variant-transforms:latest

abalter commented 3 years ago

I tried eliminating the --location flag, and this happened:

 --project 'psjh-eacri-data' --region 'us-east1-b' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq   --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz   --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1   --job_name vcf-to-bigquery   --runner DataflowRunner'
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --job_name vcf-to-bigquery --runner DataflowRunner --project psjh-eacri-data --region us-east1-b --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210517_170123.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1-b"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/5404432620223078014" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210517_170123.log"
17:01:32 Execution failed: allocating: selecting resources: selecting region and zone: no regions/zones match request
"run": operation "projects/447346450878/locations/us-central1/operations/5404432620223078014" failed: executing pipeline: Execution failed: allocating: selecting resources: selecting region and zone: no regions/zones match request (reason: NOT_FOUND)
moschetti commented 3 years ago

You listed --region 'us-east1-b'. The letter suffix, though, indicates a zone. So it should be either --region 'us-east1' or --zone 'us-east1-b'.

abalter commented 3 years ago

I tried: --region us-east1

"run": operation "projects/447346450878/locations/us-central1/operations/13169224402019687354" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)

Does the "network resource" mean the source or destination?

rcowin-gcp commented 3 years ago

Are you using a custom network or the default network? If it's a custom VPC, are you using auto-mode or custom-mode subnets?

abalter commented 3 years ago

I'm in an AI notebook.

abalter commented 3 years ago

Oh, the network. Hold on...

abalter commented 3 years ago

[screenshot attached]

abalter commented 3 years ago

I tried:

docker run -v ~/.config:/root/.config \
  gcr.io/cloud-lifesciences/gcp-variant-transforms \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --region us-east1 \
  --temp_location "${TEMP_LOCATION}" \
  --subnetwork psjh-shared01/subnet01  \
  "${COMMAND}"
17:47:51 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/5710526366526649430" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].subnetwork': 'projects/psjh-eacri-data/regions/us-east1/subnetworks/psjh-shared01/subnet01'. The URL is malformed. (reason: INVALID_ARGUMENT)
rcowin-gcp commented 3 years ago

Try adding this to the $COMMAND (we're updating some documentation here in GitHub but haven't been able to do a release yet)

--network $NETWORK --subnetwork regions/$REGION/subnetworks/$SUBNETWORK
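
For example, using the network and subnet names from earlier in this thread (a sketch only; substitute your own values), those flags go inside the COMMAND string that is handed to the container, not on the docker run line:

REGION=us-east1
NETWORK=psjh-shared01
SUBNETWORK=subnet01

COMMAND="vcf_to_bq \
    --input_pattern ${INPUT_PATTERN} \
    --output_table ${OUTPUT_TABLE} \
    --job_name vcf-to-bigquery \
    --runner DataflowRunner \
    --network ${NETWORK} \
    --subnetwork regions/${REGION}/subnetworks/${SUBNETWORK}"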

abalter commented 3 years ago

Yeah, I think it's not ready for prime time. It doesn't like the network option.

 --project 'psjh-eacri-data' --region 'us-east1' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' --subnetwork 'regions//subnetworks/subnet01' -- 'psjh-shared01'
getopt: unrecognized option '--network'
./gcp_vcf.sh: line 27: vcf_to_bq   --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz   --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1   --job_name vcf-to-bigquery   --runner DataflowRunner: No such file or directory
rcowin-gcp commented 3 years ago

Hmm... The last time I ran this (within the last 2 weeks), this is what I used for both a custom VPC and a custom subnet. If you have a custom VPC (anything other than the "default" network with that name), you need to pass the network name. If you created a custom VPC but use auto-mode subnets, you don't need to pass the --subnet option.

# Parameters to replace:
# The GOOGLE_CLOUD_PROJECT is the project that contains your BigQuery dataset.
GOOGLE_CLOUD_PROJECT=PROJECT
INPUT_PATTERN=gs://BUCKET/*.vcf
OUTPUT_TABLE=PROJECT:DATASET.TABLE
TEMP_LOCATION=gs://na1287-test-042621/temp
NETWORK=CUSTOM_NETWORK
SUBNETWORK=CUSTOM_MODE_SUBNET

COMMAND="vcf_to_bq \
    --input_pattern ${INPUT_PATTERN} \
    --output_table ${OUTPUT_TABLE} \
    --job_name vcf-to-bigquery \
    --runner DataflowRunner --network $NETWORK --subnetwork regions/$REGION/subnetworks/$SUBNETWORK"

docker run -v ~/.config:/root/.config \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --region REGION \
    --temp_location "${TEMP_LOCATION}" \
    "${COMMAND}"

pgrosu commented 3 years ago

@rcowin-gcp It looks like the command parser in the code is missing the network parameter. If you look at the line below, it does not parse a network option, which is probably why it is always empty:

https://github.com/googlegenomics/gcp-variant-transforms/blob/master/docker/pipelines_runner.sh#L25

getopt -o '' -l project:,temp_location:,docker_image:,region:,subnetwork:,use_public_ips:,service_account:,location: -- "$@"

And the case statement below that line does not assign the network either if one is specified. The code seems to assume that a network is not required when a subnet is given, but other documentation suggests it is the other way around. That logic should probably be made explicit so there is no guesswork about the expectations.
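
A minimal sketch of the kind of change being suggested (hypothetical, not an actual patch from the maintainers): add network: to the getopt option list and handle it in the case statement, for example:

#!/bin/bash
# Hypothetical sketch only -- not the project's actual patch.
# Parse --network in addition to the options pipelines_runner.sh already accepts.
parsed="$(getopt -o '' \
  -l project:,temp_location:,docker_image:,region:,network:,subnetwork:,use_public_ips:,service_account:,location: \
  -- "$@")" || exit 1
eval set -- "${parsed}"

network=""
subnetwork=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    --network)    network="$2";    shift 2 ;;
    --subnetwork) subnetwork="$2"; shift 2 ;;
    --)           shift; break ;;
    *)            shift 2 ;;  # other recognized "--flag value" pairs
  esac
done

echo "network=${network} subnetwork=${subnetwork}"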

Hope it helps, ~p

rcowin-gcp commented 3 years ago

You only need to specify the Network and Subnetwork if you are using a non-default network and/or custom-mode subnets. The documentation is not up-to-date right now but we are working on a new release.

pgrosu commented 3 years ago

Thank you, I read the REST spec below -- I was just providing a recommendation on where the code can be updated with a simple fix:

https://lifesciences.googleapis.com/$discovery/rest?version=v2beta

rcowin-gcp commented 3 years ago

Thank you. We're working on a handful of small updates to variant transforms and the documentation both, stay tuned. :)

ypouliot commented 2 years ago

Howdy. I'm getting precisely the same problem. Any updates on when this will be fixed? It's a complete blocker, sigh...

abalter commented 2 years ago

@slagelwa -- you had some insight into where the code needs to be fixed. Perhaps if you posted it, someone could write the patch.

ypouliot commented 2 years ago

Is there any reason to think the transform will work if run from github per https://github.com/googlegenomics/gcp-variant-transforms#running-from-github?

pgrosu commented 2 years ago

@abalter I think @slagelwa means that the command parser in the code is missing the network parameter. If you look at the line below, it does not parse a network option, which is probably why it is always empty:

https://github.com/googlegenomics/gcp-variant-transforms/blob/master/docker/pipelines_runner.sh#L25

getopt -o '' -l project:,temp_location:,docker_image:,region:,subnetwork:,use_public_ips:,service_account:,location: -- "$@"

moschetti commented 2 years ago

@ypouliot Running from GitHub will not give different results, as the parser is the same and still doesn't include network.

Correct, it does not parse a network, just a subnetwork. Since it's not currently possible to run across multiple regions, the only argument it takes is the subnetwork, and you provide the subnet that corresponds to the region in question.

Regarding the other issue about the format of the subnet: you just use the name of the subnet, not the full path with the project or region. The docs have been updated to clarify that.
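
With the names used earlier in this thread (a sketch, assuming subnet01 is a subnet in the region passed to --region), the wrapper invocation would then look something like:

docker run -v ~/.config:/root/.config \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --region us-east1 \
    --temp_location "${TEMP_LOCATION}" \
    --subnetwork subnet01 \
    "${COMMAND}"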

pgrosu commented 2 years ago

@moschetti Do you mean not possible because it's not implemented in the Life Sciences API, or because GCP would not allow it? The Life Sciences API spec below seems to have an option for it (and GCP/Google Storage allows multi-region configurations), plus it was also suggested by @rcowin-gcp:

        "network": {
          "type": "string",
          "description": "The network name to attach the VM's network interface to. The value will be prefixed with `global/networks/` unless it contains a `/`, in which case it is assumed to be a fully specified network resource URL. If unspecified, the global default network is used."
        },
        "subnetwork": {
          "description": "If the specified network is configured for custom subnet creation, the name of the subnetwork to attach the instance to must be specified here. The value is prefixed with `regions/*/subnetworks/` unless it contains a `/`, in which case it is assumed to be a fully specified subnetwork resource URL. If the `*` character appears in the value, it is replaced with the region that the virtual machine has been allocated in.",
          "type": "string"
        },
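
To illustrate the prefixing rules described in that spec (expand_subnetwork below is just a made-up helper to show the documented behavior, not part of the API or of this repo):

# Illustration only: how the documented prefixing applies to example values.
expand_subnetwork() {
  local value="$1"
  if [[ "${value}" == */* ]]; then
    # Contains "/": treated as a fully specified subnetwork resource URL.
    echo "${value}"
  else
    # Short name: prefixed, with "*" later replaced by the VM's region.
    echo "regions/*/subnetworks/${value}"
  fi
}

expand_subnetwork subnet01                                # -> regions/*/subnetworks/subnet01
expand_subnetwork regions/us-east1/subnetworks/subnet01   # -> unchanged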

Thanks, ~p

moschetti commented 2 years ago

@pgrosu Dataflow does not support splitting workers across different regions. The Life Sciences API does, but in the case of Variant Transforms the Life Sciences API only starts the first VM, which kicks off the Dataflow job. So, since the Dataflow workers will all run in a single region, we can infer the network from the single subnetwork that was provided.

However, if that isn't working for folks, we'd be curious to hear the use case to better understand how it's being used and whether this is something to look into.

pgrosu commented 2 years ago

@moschetti Of course you can have multi-region Dataflow pipelines via standard functional-programming methodologies when the work is data-driven and coordinated through global Cloud Logging: a Dataflow job in one region can launch another set of Dataflow jobs in other regions based on the geo-location of the data. That way you have a distributed global flow in which the code moves adaptively to the geo-location of the distributed data, guided by the global log, which saves on costs.

moschetti commented 2 years ago

@pgrosu Valid point. Variant Transforms, however, is not set up to start Dataflow jobs in multiple regions, so I'm not sure the network flag gains anything in this use case. If the lack of a network flag is blocking something, I'd be glad to hear the concern to see whether it's something we need to address.

But I believe that for Variant Transforms, which ingests from buckets, multi-region wouldn't gain much: if you had lots of input data in different regions, you could also run a separate pipeline per region, since the overhead of the job orchestrator is relatively small. Happy to listen, though, if that is a concern for you.

pgrosu commented 2 years ago

@moschetti Not all buckets are created equal ;) Regional buckets provide large cost savings over multi-region ones, which is why one would prefer that the code co-locate with those sites. For example, here is the monthly Cloud Storage cost for 100 TB, calculated with the Google Cloud Pricing Calculator, for a regional location (Iowa) versus a multi-region location (the whole US). The result is that multi-region buckets would cost an additional $614/month, which can be quite a lot for folks who might need that budget for other Cloud resources during their analysis:

For the Iowa (regional) location ($2,048/month)

1x Standard Storage
Location: Iowa
Total Amount of Storage: 102,400 GiB
Egress - Data moves within the same location: 0 GiB
Always Free usage included: No
USD 2,048.00

For the US (multi-region) location ($2,662/month)

1x Standard Storage
Location: United States
Total Amount of Storage: 102,400 GiB
Egress - Data moves within the same location: 0 GiB
Always Free usage included: No
USD 2,662.40

Additional Egress Costs ($1,024/month)

On top of that, there could be egress charges for data moves, which can add, for instance, an extra $1,024 to the total ($3,072 - $2,048), making an even stronger case for moving the code instead, which is free:

1x Standard Storage
Location: Iowa
Total Amount of Storage: 102,400 GiB
Egress - Data moves between different locations on the same continent: 102,400 GiB
Always Free usage included: No
USD 3,072.00
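
For what it's worth, the per-GiB rates implied by those totals make the deltas easy to check (a quick sketch using only the figures quoted above):

# Quick arithmetic check of the quoted calculator figures (100 TB = 102,400 GiB).
STORAGE_GIB=102400
echo "regional rate:      $(echo "scale=3; 2048.0 / ${STORAGE_GIB}" | bc) USD per GiB-month"   # ~0.020
echo "multi-region rate:  $(echo "scale=3; 2662.4 / ${STORAGE_GIB}" | bc) USD per GiB-month"   # ~0.026
echo "storage difference: $(echo "2662.4 - 2048.0" | bc) USD/month"                            # 614.4
echo "egress difference:  $(echo "3072.0 - 2048.0" | bc) USD/month"                            # 1024.0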

Hope it helps, Paul