grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

invalid memory address or nil pointer dereference #100

Closed olgabot closed 5 years ago

olgabot commented 5 years ago

Not sure what is happening here, but I'm getting the error below. As a side note, five star.Index jobs were running simultaneously when there should be only one, since all the runs use the same genome.

[image: Screen Shot 2018-12-18 at 9.45.55 AM]

➜  minirun git:(master) ✗ reflow version
0.6.8 (go1.10)

Here's the command and error:

➜  minirun git:(master) ✗ reflow runbatch -retry
reflow: batch program /home/ubuntu/reflow-workflows/workflows/rnaseq.rf runsfile samples.csv
retrying run OPS016_mBAL_RNA_239_F11_S111
retrying run OPS016_mBAL_RNA_240_H11_S118
retrying run OPS016_mBAL_RNA_241_J11_S125
retrying run OPS016_mBAL_RNA_246_L11_S133
retrying run OPS016_mBAL_RNA_229_L9_S132
retrying run OPS016_mBAL_RNA_232_N9_S140
retrying run OPS016_mBAL_RNA_234_P9_S148
retrying run OPS016_mBAL_RNA_235_B11_S97
retrying run OPS016_mBAL_RNA_237_D11_S104
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
reflow: ec2cluster: failed to allocate from pool: unavailable: no allocs available in pool; provisioning new instances
ec2cluster: 10 instances: c5.18xlarge:2,m4.4xlarge:8 (<=$12.5/hr), total{mem:744.0GiB cpu:272 disk:
  allocate {mem:4.0GiB cpu:8 disk:0B}:     alloc ec2-52-43-91-104.us-west-2.compute.amazonaws.c  6s
  allocate {mem:4.0GiB cpu:8 disk:0B}[3]:  provisioning new instance                             6s
transfers: done: 5 1.5MiB, transferring: 0 0B, waiting: 0 0B
  ec2-35-166-172-215...->ec2-52-43-91-104.us..:  done: 1 462.7KiB, transferring: 0 0B, waiting:  6s
  ec2-52-43-91-104.us..->ec2-35-166-172-215...:  done: 2 534.7KiB, transferring: 0 0B, waiting:  0s
  ec2-52-43-91-104.us..->s3:czbiohub-reflow-..:  done: 2 534.7KiB, transferring: 0 0B, waiting:  0s
batch /home/ubuntu/reflow-batches/rnaseq/mus/20181030_FS10000331_12_BNT40322-1214/minirun: remainin
  c3a8b975:  eval alloc ec2-35-163-221-122.us-west-2.compute.amazonaws.com:9000/8cb20c5725ef  7m36s
  8d15eb05:  eval alloc ec2-54-244-174-188.us-west-2.compute.amazonaws.com:9000/85a862f0cf6a  7m36s
  d7ebcf55:  eval alloc ec2-34-220-242-13.us-west-2.compute.amazonaws.com:9000/8cb11ce66a473  7m36s
  f5f7474f:  eval alloc ec2-34-211-68-224.us-west-2.compute.amazonaws.com:9000/ae4f7fe5a6eec  7m36s
  dd01043d:  eval alloc ec2-35-166-172-215.us-west-2.compute.amazonaws.com:9000/820451891062  7m36s
  52f4b93c:  eval alloc ec2-52-43-91-104.us-west-2.compute.amazonaws.com:9000/b33c5883bfb41e  7m36s
  aec6e245:  eval alloc ec2-52-43-91-104.us-west-2.compute.amazonaws.com:9000/4033ad7be42d9d  7m36s
  fa624313:  eval alloc ec2-52-43-91-104.us-west-2.compute.amazonaws.com:9000/7da6873d20e22d  7m36s
  72c67716:  eval alloc ec2-54-212-57-191.us-west-2.compute.amazonaws.com:9000/fe02fbf175f9d  7m36s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x88e377]

goroutine 3719 [running]:
github.com/grailbio/reflow.(*Eval).CacheWrite.func1(0xc400000008, 0x103ca10)
        /Users/marius/go/src/github.com/grailbio/reflow/eval.go:1189 +0x87
github.com/grailbio/reflow/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc42111ad40, 0xc421112600)
        /Users/marius/go/src/github.com/grailbio/reflow/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
created by github.com/grailbio/reflow/vendor/golang.org/x/sync/errgroup.(*Group).Go
        /Users/marius/go/src/github.com/grailbio/reflow/vendor/golang.org/x/sync/errgroup/errgroup.go:55 +0x66

Here are the samples.csv and config.json, where rnaseq.rf is this workflow.

```
➜  minirun git:(master) ✗ cat samples.csv
id,read1,read2,name,genome,output,region
OPS016_mBAL_RNA_229_L9_S132,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_229_L9_S132_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_229_L9_S132_R2_001.fastq.gz,OPS016_mBAL_RNA_229_L9_S132,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_232_N9_S140,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_232_N9_S140_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_232_N9_S140_R2_001.fastq.gz,OPS016_mBAL_RNA_232_N9_S140,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_234_P9_S148,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_234_P9_S148_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_234_P9_S148_R2_001.fastq.gz,OPS016_mBAL_RNA_234_P9_S148,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_235_B11_S97,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_235_B11_S97_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_235_B11_S97_R2_001.fastq.gz,OPS016_mBAL_RNA_235_B11_S97,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_237_D11_S104,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_237_D11_S104_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_237_D11_S104_R2_001.fastq.gz,OPS016_mBAL_RNA_237_D11_S104,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_239_F11_S111,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_239_F11_S111_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_239_F11_S111_R2_001.fastq.gz,OPS016_mBAL_RNA_239_F11_S111,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_240_H11_S118,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_240_H11_S118_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_240_H11_S118_R2_001.fastq.gz,OPS016_mBAL_RNA_240_H11_S118,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_241_J11_S125,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_241_J11_S125_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_241_J11_S125_R2_001.fastq.gz,OPS016_mBAL_RNA_241_J11_S125,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
OPS016_mBAL_RNA_246_L11_S133,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_246_L11_S133_R1_001.fastq.gz,s3://czb-seqbot/fastqs/20181030_FS10000331_12_BNT40322-1214/OPS016_mBAL_RNA_246_L11_S133_R2_001.fastq.gz,OPS016_mBAL_RNA_246_L11_S133,mouse/vM19,s3://olgabot-maca/aguamenti-test/,west
➜  minirun git:(master) ✗ cat config.json
{"program": "/home/ubuntu/reflow-workflows/workflows/rnaseq.rf", "runs_file": "samples.csv"}
```

Here's a screenshot of the final output, showing that the existing instances haven't been shut down properly after the reflow runbatch failed.

[image: Screen Shot 2018-12-18 at 9.49.37 AM]

prasadgopal commented 5 years ago

The crash is a bug. That said, I think your cache (assoc) configuration is not working: the crash occurs because the assoc is nil. Can you make sure your DynamoDB table exists and that your config has an assoc field with the correct DynamoDB table name?
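
For readers unfamiliar with this failure mode, here is a minimal Go sketch of how calling through a nil interface value produces this kind of panic, and how a nil check turns it into an ordinary error. This is not reflow's actual code: the `Assoc` interface, the `Eval` struct, `CacheWrite`, `unguardedWrite`, and `errNoAssoc` below are all hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Assoc is a hypothetical stand-in for a cache association backend.
type Assoc interface {
	Store(key, value string) error
}

// Eval is a hypothetical evaluator. Its Assoc field stays nil when no
// backend has been configured.
type Eval struct {
	Assoc Assoc
}

// errNoAssoc is a hypothetical sentinel error for the unconfigured case.
var errNoAssoc = errors.New("no assoc configured")

// CacheWrite checks for a nil Assoc and returns an error instead of
// calling through it.
func (e *Eval) CacheWrite(key, value string) error {
	if e.Assoc == nil {
		return errNoAssoc
	}
	return e.Assoc.Store(key, value)
}

// unguardedWrite calls through Assoc without a check. It recovers the
// resulting runtime panic and returns its message so the program can
// keep running for the demonstration.
func unguardedWrite(e *Eval) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = e.Assoc.Store("k", "v")
	return ""
}

func main() {
	e := &Eval{} // Assoc left nil, as if the config had no assoc entry
	fmt.Println("unguarded:", unguardedWrite(e))
	fmt.Println("guarded:  ", e.CacheWrite("k", "v"))
}
```

In a real program the unguarded call would not normally be wrapped in `recover`; inside a goroutine, as in the stack trace above, the panic takes down the whole process.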

olgabot commented 5 years ago

Thanks, yes, I was able to fix it by adding the fields to our config. Since not everyone has permission to create tables in our AWS account, we simply append the fields to the config.yaml file:

# Add czbiohub caches to the config
REFLOW_CONFIG="$HOME/.reflow/config.yaml"
echo 'repository: s3,czbiohub-reflow-quickstart-cache' >> "$REFLOW_CONFIG"
echo 'assoc: dynamodb,czbiohub-reflow-quickstart' >> "$REFLOW_CONFIG"
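
Since a missing assoc entry only showed up as a crash at cache-write time, it may help to check the config before starting a batch. The following is a rough Go sketch of such a check, assuming a flat `key: value` config like the lines appended above. `missingKeys` is a hypothetical helper, not part of reflow, and it is a simple line scan rather than a real YAML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingKeys reports which of the wanted top-level keys do not appear
// in a flat "key: value" config. It scans lines and does not parse YAML.
func missingKeys(config string, wanted ...string) []string {
	seen := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := sc.Text()
		// Skip blank lines, comments, and indented (nested) lines.
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, " ") {
			continue
		}
		if i := strings.Index(line, ":"); i > 0 {
			seen[strings.TrimSpace(line[:i])] = true
		}
	}
	var missing []string
	for _, k := range wanted {
		if !seen[k] {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// A config with the repository entry but no assoc entry.
	config := "repository: s3,czbiohub-reflow-quickstart-cache\n"
	fmt.Println(missingKeys(config, "repository", "assoc")) // prints [assoc]
}
```

This only confirms that the keys are present in the file. It does not verify that the DynamoDB table or S3 bucket they name actually exists.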