grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

Use instances in same availability zone as data #107

Closed olgabot closed 5 years ago

olgabot commented 5 years ago

Hello,

We have a lot of data in us-west-2a, and I noticed that the instances our Reflow jobs run on are in a different availability zone from the S3 buckets, which substantially slows down the jobs because of I/O and data transfer. Is there a way to require that instances be launched in the same availability zone?

Warmest,
Olga

mariusae commented 5 years ago

Zone, or region? Reflow lets you configure the region in which instances are launched, but it then uses all AZs within that region. I believe S3 buckets are tied to regions, not AZs.
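
To make the zone-versus-region distinction concrete, here is a rough Python sketch (not part of Reflow) that checks whether an availability zone belongs to the same region as a bucket. The helper names `region_of_zone`, `normalize_location`, and `same_region` are hypothetical, and the sketch assumes standard AZ names of the form region plus one letter, such as `us-west-2a`.

```python
import re

# Standard AZ names are a region name followed by a single letter,
# e.g. "us-west-2a" -> region "us-west-2". Local Zone names and other
# variants are deliberately rejected by this pattern.
_ZONE_PATTERN = re.compile(r"([a-z]+(?:-[a-z]+)+-\d+)[a-z]")


def region_of_zone(zone: str) -> str:
    """Return the region that a standard availability zone name belongs to."""
    match = _ZONE_PATTERN.fullmatch(zone)
    if match is None:
        raise ValueError(f"not a standard availability zone name: {zone!r}")
    return match.group(1)


def normalize_location(location_constraint):
    """Map an S3 LocationConstraint value to a region name.

    S3 reports no LocationConstraint (None) for buckets in us-east-1.
    """
    return location_constraint or "us-east-1"


def same_region(zone: str, location_constraint) -> bool:
    """True if the zone lies in the region where the bucket is stored."""
    return region_of_zone(zone) == normalize_location(location_constraint)


if __name__ == "__main__":
    print(region_of_zone("us-west-2a"))
    print(same_region("us-west-2a", "us-west-2"))
```

With boto3, the bucket's value could be obtained from `boto3.client("s3").get_bucket_location(Bucket=name)["LocationConstraint"]`; the bucket name and credentials are left to the reader. If the regions match, instances in any AZ of that region are in the same region as the bucket.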