grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

assoc store stderr: ProvisionedThroughputExceededException #101

Closed olgabot closed 5 years ago

olgabot commented 5 years ago

Hello! A few more of us at Biohub are using Reflow at once, and I'm starting to see this error:

reflow: assoc store stderr: ProvisionedThroughputExceededException: The level of configured provisioned throughput for one or more global secondary indexes of the table was exceeded. Consider increasing your provisioning level for the under-provisioned global secondary indexes with the UpdateTable API

How can I increase the provisioning level? Is this an AWS-level change or a Reflow-level change?

Warmest, Olga

prasadgopal commented 5 years ago

It is an AWS change. You should be able to go to the AWS console and increase the read/write capacity of the table and its indices.
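The console change described above can also be scripted. This is a minimal sketch, assuming Python with boto3 and a table that is still in provisioned mode. The helper names and any capacity numbers you pass in are hypothetical. The GSI names are read from DescribeTable rather than guessed, since the thread never names Reflow's indexes. The sketch also assumes DynamoDB accepts the table and index throughput changes in one UpdateTable call; if it rejects that, send them as separate calls.

```python
"""Raise provisioned read/write capacity on a DynamoDB table and all of its
global secondary indexes (GSIs). Hypothetical helpers; capacity values are
placeholders, not recommendations."""


def throughput_update_request(table_name, index_names, read_units, write_units):
    """Build keyword arguments for DynamoDB's UpdateTable call.

    The same read/write capacity is applied to the base table and to every
    GSI listed in index_names.
    """
    throughput = {"ReadCapacityUnits": read_units, "WriteCapacityUnits": write_units}
    request = {"TableName": table_name, "ProvisionedThroughput": dict(throughput)}
    if index_names:
        request["GlobalSecondaryIndexUpdates"] = [
            {"Update": {"IndexName": name, "ProvisionedThroughput": dict(throughput)}}
            for name in index_names
        ]
    return request


def raise_capacity(table_name, read_units, write_units):
    """Look up the table's GSIs and send UpdateTable (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("dynamodb")
    table = client.describe_table(TableName=table_name)["Table"]
    index_names = [gsi["IndexName"] for gsi in table.get("GlobalSecondaryIndexes", [])]
    request = throughput_update_request(table_name, index_names, read_units, write_units)
    return client.update_table(**request)
```

For example, `raise_capacity("czbiohub-reflow-quickstart", 50, 50)` would request 50 read and 50 write units on the table and on each of its indexes; the numbers are placeholders.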



This email message, including attachments, may contain private, proprietary, or privileged information and is the confidential information and/or property of GRAIL, Inc., and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is strictly prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

mariusae commented 5 years ago

It can also be made "elastic" now -- you pay for what you use without having to provision. That's probably the best choice for something like Reflow.
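A minimal sketch of the on-demand switch, assuming Python and a boto3 release new enough to accept the BillingMode parameter (on-demand mode dates from late 2018). The function names are hypothetical. Setting BillingMode on the table applies to its global secondary indexes as well, which is worth confirming in the console's Indexes tab afterwards.

```python
"""Switch DynamoDB tables to on-demand ("pay per request") capacity.
Hypothetical helpers for illustration."""


def on_demand_request(table_name):
    """Build UpdateTable keyword arguments that switch a table to on-demand mode."""
    return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}


def switch_to_on_demand(table_names):
    """Send one UpdateTable call per table (needs AWS credentials)."""
    import boto3  # lazy import: only needed when actually talking to AWS

    client = boto3.client("dynamodb")
    for name in table_names:
        client.update_table(**on_demand_request(name))
        # Block until DescribeTable reports the table as ACTIVE again.
        client.get_waiter("table_exists").wait(TableName=name)
```

Usage might look like `switch_to_on_demand(["czbiohub-reflow-quickstart"])`. AWS limits how often a table's capacity mode can be switched (once per 24 hours), so avoid flipping it back and forth while debugging.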

olgabot commented 5 years ago

Here's what the table configuration looks like now. Is this correct?

[screenshot, 2019-02-06 at 5:58:47 PM]

It's now set to "On-Demand", and yet we're still getting the ProvisionedThroughputExceededException error.

olgabot commented 5 years ago

Hello, I'm not sure what else to change: the ProvisionedThroughputExceededException error keeps happening even though I switched both the czbiohub-reflow-quickstart and czbiohub-reflow-quickstart-cache tables to "On-demand" capacity.

Here's a screenshot with reflow runbatch -retry, reflow config, watch aguamenti status (shows how many jobs are canceled/done/running/waiting for a path by parsing reflow listbatch) and watch reflow ps:

[screenshot, 2019-02-07 at 11:32:14 AM]

$ reflow config | grep repository
repository: s3,czbiohub-reflow-quickstart-cache

And here's a reflow -log=debug -cache=off run -trace output:

$ aguamenti check-batch --debug
Found sample with id "Spleen_10X_P4_7"!
Running 'reflow -log=debug -cache=off run -trace /home/olga/code/kmer-hashing/sourmash/maca/10x_spleen_kidney/../../../reflow/sourmash_compute_10x.rf -tenx s3://czbiohub-maca/10x_data/10X_P4_7 -output s3://olgabot-maca/10x/sourmash_compute/ksizes=21,27,33,51_num_hashes=5000/Spleen_10X_P4_7.sig -ksizes 21,27,33,51 -num_hashes 5000'
2019/02/07 11:47:42 reflow version 0.6.7 (go1.10)
2019/02/07 11:47:42 reflowlet image grailbio/reflowlet:1531508213
2019/02/07 11:47:42 run ID: e3457aec
2019/02/07 11:47:43 evaluating program /home/olga/code/kmer-hashing/reflow/sourmash_compute_10x.rf
2019/02/07 11:47:43 ec2cluster: pending{}
2019/02/07 11:47:43 ec2cluster: pending{}
2019/02/07 11:47:43 ec2cluster: allocate {mem:16.0GiB cpu:1 disk:1.0GiB}
2019/02/07 11:47:43 ec2cluster: pending{} waiter0{mem:16.0GiB cpu:1 disk:1.0GiB}
2019/02/07 11:47:43 ec2cluster: launch r4.xlarge{mem:28.4GiB cpu:4 disk:2.4TiB intel_avx:4 intel_avx2:4} pending{mem:28.4GiB cpu:4 disk:2.4TiB intel_avx:4 intel_avx2:4}
2019/02/07 11:47:44 ec2cluster: EC2RunInstances {
  BlockDeviceMappings: [{
      DeviceName: "/dev/xvda",
      Ebs: {
        DeleteOnTermination: true,
        VolumeSize: 200,
        VolumeType: "gp2"
      }
    },{
      DeviceName: "/dev/xvdb",
      Ebs: {
        DeleteOnTermination: true,
        VolumeSize: 2500,
        VolumeType: "gp2"
      }
    }],
  ClientToken: "1af5f13624606db8",
  DisableApiTermination: false,
  DryRun: false,
  EbsOptimized: true,
  IamInstanceProfile: {
    Arn: ""
  },
  ImageId: "ami-4296ec3a",
  InstanceInitiatedShutdownBehavior: "terminate",
  InstanceType: "r4.xlarge",
  MaxCount: 1,
  MinCount: 1,
  Monitoring: {
    Enabled: true
  },
  SecurityGroupIds: ["sg-661d7f19"],
  UserData: "bigSHAstring"
}
2019/02/07 11:47:45 ec2cluster: launched instance i-0db3024931f04f08d: r4.xlarge{mem:28.4GiB cpu:4 disk:2.4TiB intel_avx:4 intel_avx2:4}
2019/02/07 11:48:43 ec2cluster: pending{mem:28.4GiB cpu:4 disk:2.4TiB intel_avx:4 intel_avx2:4} waiter0{mem:16.0GiB cpu:1 disk:1.0GiB}
2019/02/07 11:48:54 ec2cluster: added instance r4.xlarge resources{mem:28.4GiB cpu:4 disk:2.4TiB intel_avx:4 intel_avx2:4} pending{} available{mem:12.4GiB cpu:3 disk:2.4TiB intel_avx:4 intel_avx2:4} npending:0 waiters:0 notified:1
2019/02/07 11:48:54 ec2cluster: pending{}
2019/02/07 11:48:54 accepted alloc
2019/02/07 11:48:54 run state: eval alloc
2019/02/07 11:48:54 evaluating with configuration: executor *client.clientAlloc transferer *repository.Manager flags cacheextern,nocache,nogc,norecomputeempty,topdown flowconfig hashv2 cachelookuptimeout 1m0s
2019/02/07 11:48:54 mutate flow e9e824ed state FlowInit {} k deps 63b7f22a: FlowTODO
2019/02/07 11:48:54 mutate flow 63b7f22a state FlowInit {} k deps 633d94ae: FlowTODO
2019/02/07 11:48:54 mutate flow 633d94ae state FlowInit {} k deps f42688cd: FlowTODO
2019/02/07 11:48:54 mutate flow f42688cd state FlowInit {} k deps e3b25ffc: FlowTODO
2019/02/07 11:48:54 mutate flow e3b25ffc state FlowInit {} k deps fd2bb232: FlowTODO
2019/02/07 11:48:54 mutate flow fd2bb232 state FlowInit {} coerce deps 3cb19a77: FlowTODO
2019/02/07 11:48:54 mutate flow 3cb19a77 state FlowInit {} exec image czbiohub/kmer-hashing cmd "\n        /opt/conda/bin/sourmash compute \\\n            --track-abundance \\\n            --protein \\\n            --dna \\\n             \\\n            --scaled 0 \\\n            --ksizes 21,27,33,51 \\\n            --output %s \\\n            %s\n    " deps 124ba7ed: FlowTODO
2019/02/07 11:48:54 mutate flow 124ba7ed state FlowInit {} coerce deps 5291aef1: FlowTODO
2019/02/07 11:48:54 mutate flow 5291aef1 state FlowInit {} k deps cb1cb78e: FlowTODO
2019/02/07 11:48:54 mutate flow cb1cb78e state FlowInit {} k deps 2d4e6bb3: FlowTODO
2019/02/07 11:48:54 mutate flow 2d4e6bb3 state FlowInit {} k deps cd78382a: FlowTODO
2019/02/07 11:48:54 mutate flow cd78382a state FlowInit {} k deps 479d0186: FlowTODO
2019/02/07 11:48:54 mutate flow 479d0186 state FlowInit {} k deps ff9993d2: FlowTODO
2019/02/07 11:48:54 mutate flow ff9993d2 state FlowInit {} k deps b04a908a: FlowTODO
2019/02/07 11:48:54 mutate flow b04a908a state FlowInit {} coerce deps 0e03c6b5: FlowTODO
2019/02/07 11:48:54 mutate flow 0e03c6b5 state FlowInit {} exec image czbiohub/bam2fastx cmd "\n            bam2fastx fasta %s --all-cells-in-one-file --output %s\n    " deps 33c685c6: FlowTODO
2019/02/07 11:48:54 mutate flow 33c685c6 state FlowInit {} coerce deps ceed79ad: FlowTODO
2019/02/07 11:48:54 mutate flow ceed79ad state FlowInit {} k deps 76e4d495: FlowTODO
2019/02/07 11:48:54 mutate flow 76e4d495 state FlowInit {} k deps 9f3f92a6: FlowTODO
2019/02/07 11:48:54 mutate flow 9f3f92a6 state FlowInit {} k deps d9183eca: FlowTODO
2019/02/07 11:48:54 mutate flow d9183eca state FlowInit {} k deps 29b3ebe2,7a604e2f,6f220341: FlowTODO
2019/02/07 11:48:54 mutate flow 29b3ebe2 state FlowInit {} k deps ea7dd411: FlowTODO
2019/02/07 11:48:54 mutate flow ea7dd411 state FlowInit {} k deps 1d4996e5: FlowTODO
2019/02/07 11:48:54 mutate flow 1d4996e5 state FlowInit {} k deps 9820e550: FlowTODO
2019/02/07 11:48:54 mutate flow 9820e550 state FlowInit {} k deps 6ffadddd: FlowTODO
2019/02/07 11:48:54 mutate flow 6ffadddd state FlowInit {} coerce deps aa06eef7: FlowTODO
2019/02/07 11:48:54 mutate flow aa06eef7 state FlowInit {} intern url "s3://czbiohub-maca/10x_data/10X_P4_7/": FlowTODO
2019/02/07 11:48:54 mutate flow aa06eef7 state FlowTODO {} intern url "s3://czbiohub-maca/10x_data/10X_P4_7/": FlowReady
2019/02/07 11:48:54 mutate flow 7a604e2f state FlowInit {} k deps b2de843f: FlowTODO
2019/02/07 11:48:54 mutate flow b2de843f state FlowInit {} k deps 674e18b1: FlowTODO
2019/02/07 11:48:54 mutate flow 674e18b1 state FlowInit {} k deps 9820e550: FlowTODO
2019/02/07 11:48:54 mutate flow 6f220341 state FlowInit {} k deps 0ff2799e: FlowTODO
2019/02/07 11:48:54 mutate flow 0ff2799e state FlowInit {} k deps b6a16278: FlowTODO
2019/02/07 11:48:54 mutate flow b6a16278 state FlowInit {} k deps 9820e550: FlowTODO
2019/02/07 11:48:54 mutate flow aa06eef7 state FlowReady {} intern url "s3://czbiohub-maca/10x_data/10X_P4_7/": FlowRunning, map[]
2019/02/07 11:49:12  ->  sourmash_compute_10x.Main.tenx_folder aa06eef7 run  intern s3://czbiohub-maca/10x_data/10X_P4_7/
2019/02/07 11:49:12 sourmash_compute_10x.Main.tenx_folder aa06eef7 /home/olga/code/kmer-hashing/sourmash/maca/10x_spleen_kidney/../../../reflow/sourmash_compute_10x.rf:70:26:
2019/02/07 11:49:43 ec2cluster: pending{}
ec2cluster: 1 instances: r4.xlarge:1 (<=$0.3/hr), total{mem:28.4GiB cpu:4 disk:2.4TiB
e3457aec: elapsed: 50s, running:1, completed: 0/3
  sourmash_compute_10x.Main.tenx_folder:  intern s3://czbiohub-maca/10x_data/10X_  49

prasadgopal commented 5 years ago

Can you check the "Indexes" tab and make sure the indexes have Read Capacity and Write Capacity set to "On Demand"?
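To check what the Indexes tab shows without clicking through the console, a sketch like the following reads the table's capacity mode and each GSI's reported capacity from DescribeTable. It assumes Python with boto3, and the helper names are hypothetical. Two details are assumptions about the response shape, not guarantees: a missing BillingModeSummary is treated as provisioned, and on-demand tables are expected to report zero capacity units on their indexes.

```python
"""Summarize the capacity mode of a DynamoDB table and its GSIs from a
DescribeTable response. Hypothetical helpers for illustration."""


def capacity_summary(table_description):
    """Return (table_mode, {index_name: (read_units, write_units)}).

    A table without BillingModeSummary is treated as PROVISIONED (assumption).
    On-demand tables are expected to report 0 units for their indexes.
    """
    mode = table_description.get("BillingModeSummary", {}).get("BillingMode", "PROVISIONED")
    indexes = {}
    for gsi in table_description.get("GlobalSecondaryIndexes", []):
        throughput = gsi.get("ProvisionedThroughput", {})
        indexes[gsi["IndexName"]] = (
            throughput.get("ReadCapacityUnits", 0),
            throughput.get("WriteCapacityUnits", 0),
        )
    return mode, indexes


def describe(table_name):
    """Fetch the live table description (needs AWS credentials) and summarize it."""
    import boto3  # lazy import so capacity_summary can be used offline

    client = boto3.client("dynamodb")
    return capacity_summary(client.describe_table(TableName=table_name)["Table"])
```

Running `describe()` on both tables and comparing the results is a quick way to spot an index that is still provisioned with low capacity.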


olgabot commented 5 years ago

Yes, they both do:

[screenshot, 2019-02-07 at 4:33:48 PM]

[screenshot, 2019-02-07 at 4:33:42 PM]

Does that look right?

prasadgopal commented 5 years ago

That looks correct to me. I am not super sure what is going on. Can you file a request with the AWS support folks with RequestId in the failure message to see if they can come up with the reason for the failure?

(Just to be double sure: the table in your reflow config is the same table whose capacity you are modifying, correct?)


olgabot commented 5 years ago

Hello! Yes, I can file a ticket. This is the correct DynamoDB table:

$ reflow config | grep dynamodb
assoc: dynamodb,czbiohub-reflow-quickstart
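To make the "same table?" check repeatable, a small parser can pull the assoc and repository entries out of `reflow config` output. This sketch assumes the one-line `key: provider,name` form shown in this thread. The function names are hypothetical, and a real Reflow config may contain other keys or nesting that this ignores.

```python
"""Extract the assoc (DynamoDB) and repository (S3) entries from
`reflow config` output. Hypothetical helpers; assumes top-level lines of the
form "key: provider,name"."""

import re
import subprocess

# Matches top-level lines such as "assoc: dynamodb,czbiohub-reflow-quickstart".
_LINE = re.compile(r"^(assoc|repository):\s*([A-Za-z0-9]+),(\S+)\s*$")


def reflow_storage(config_text):
    """Return {key: (provider, name)} for the assoc and repository lines."""
    found = {}
    for line in config_text.splitlines():
        match = _LINE.match(line)
        if match:
            key, provider, name = match.groups()
            found[key] = (provider, name)
    return found


def current_reflow_storage():
    """Run `reflow config` (must be on PATH) and parse its output."""
    result = subprocess.run(["reflow", "config"], capture_output=True, text=True, check=True)
    return reflow_storage(result.stdout)
```

With the configuration above, the assoc entry parses to `("dynamodb", "czbiohub-reflow-quickstart")`, which is the table name to compare against the one edited in the console.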

Warmest, Olga