grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

"resources exhausted" for very reasonable requests #86

Closed · olgabot closed this issue 5 years ago

olgabot commented 6 years ago

Hello! I'm trying to run a download_sra.rf script (in "Details" below) to get the FASTQ files of an SRA project. For the FastqDump step, I keep getting a "resources exhausted" error even though my requests are very reasonable:

resources exhausted: eval download_sra.FastqDump.outdir: requested resources {mem:500.0MiB cpu:8 disk:50.0GiB} exceeds total available {mem:6.9GiB cpu:1 disk:2.4TiB intel_avx:2 intel_avx2:2}

Do you know what may be happening?

```golang
param (
    // Can be any of SRR, ERR, PRJNA, or SRX ids.
    // Pipe-separate for multiple, e.g. 'SRR1539523|SRR1539569|SRR1539570'
    sra_id string

    // S3 folder location to put the downloaded files
    output string

    // Number of threads for parallel-fastq-dump
    fastq_dump_threads = 8

    // GiB of storage for downloading SRA files (per file)
    sra_disk = 50

    // GiB of storage for converting to fastq.gz files (per file)
    fastq_dump_disk = 50
)

// Docker images
val bionode = "bionode/bionode-ncbi"
val fastq_dump = "quay.io/biocontainers/parallel-fastq-dump:0.6.3--py36_1"

// System modules included with Reflow
val dirs = make("$/dirs")
val files = make("$/files")
val strings = make("$/strings")

func SearchSRA(sra_id string) = exec(image := bionode) (json file) {"
    bionode-ncbi search sra {{sra_id}} > {{json}}
"}

// Outputs a folder with $UID/$SRA.sra, e.g.:
// $ ls -lha */*.sra
// -rw-rw-r-- 1 ubuntu ubuntu 3.6G May 16 19:57 285026/SRR629557.sra
// -rw-rw-r-- 1 ubuntu ubuntu 4.4G May 16 19:59 285027/SRR629559.sra
// -rw-rw-r-- 1 ubuntu ubuntu 4.0G May 16 20:00 285028/SRR629561.sra
// -rw-rw-r-- 1 ubuntu ubuntu 1.8G May 16 20:01 285029/SRR629562.sra
func DownloadSRA(sra_id string) = {
    outdir := exec(image := bionode, disk := sra_disk*GiB) (outdir dir) {"
        cd {{outdir}}
        bionode-ncbi download sra {{sra_id}}
    "}
    sra_files := dirs.Files(outdir)
    sra_files
}

// Convert SRA files to FastQ
// Recommended flags from https://edwards.sdsu.edu/research/fastq-dump/
// and Trinity documentation:
// > If your data come from SRA, be sure to dump the fastq file like so:
// > SRA_TOOLKIT/fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files file.sra
func FastqDump(sra file) = {
    outdir := exec(image := fastq_dump, cpu := fastq_dump_threads, disk := fastq_dump_disk*GiB) (outdir dir) {"
        parallel-fastq-dump --outdir {{outdir}} --gzip \
            --skip-technical --readids --read-filter pass \
            --dumpbase --split-3 --clip --defline-seq '@$sn[_$rn]/$ri' \
            --split-files --threads {{fastq_dump_threads}} \
            {{sra}}
    "}
    fastqs := dirs.Files(outdir)
    fastqs
}

sra_ids := strings.Split(sra_id, "|")
sras := flatten([DownloadSRA(sra) | sra <- sra_ids])
json_metadata := SearchSRA(sra_id)
fastqs := flatten([FastqDump(sra) | sra <- sras])

val Main = [files.Copy(fastq, output) | fastq <- fastqs]
```

Here's the full error:

 ✘  Wed 31 Oct - 17:56  ~/code/reflow-workflows   origin ☊ olgabot/download-sra ✔ 
  make test_download_sra
reflow run download_sra.rf -sra_id='SRR1539523|SRR1539569|SRR1539570' -output=s3://olgabot-maca/test-download-sra/spider-transcriptome/
reflow: run ID: fad83951
 transfer flow d155ca1c state FlowTransfer {mem:500.0MiB cpu:8 disk:50.0GiB} exec image quay.io/biocontainers/parallel-fastq-dump:0.6.3--py36_1 cmd "\n        parallel-fastq-dump --outdir %s --gzip \\\n            --skip-technical  --readids --read-filter pass \\\n            --dumpbase --split-3 --clip --defline-seq '@$sn[_$rn]/$ri' \\\n            --split-files --threads 8 \\\n            %s\n    " deps 7cd6751a error: transfer sha256:f4f7fd2b6aff9259448e92d3ba52f5abcad50f3132386a3d38f3658b80a12353: canceled:
        readfrom sha256:f4f7fd2b6aff9259448e92d3ba52f5abcad50f3132386a3d38f3658b80a12353 s3r://czbiohub-reflow-quickstart-cache: context canceled
reflow: cache transfer flow 4ee13168 state FlowTransfer {mem:500.0MiB cpu:8 disk:50.0GiB} exec image quay.io/biocontainers/parallel-fastq-dump:0.6.3--py36_1 cmd "\n        parallel-fastq-dump --outdir %s --gzip \\\n            --skip-technical  --readids --read-filter pass \\\n            --dumpbase --split-3 --clip --defline-seq '@$sn[_$rn]/$ri' \\\n            --split-files --threads 8 \\\n            %s\n    " deps 14612bf6 error: transfer sha256:96ba76c2f963f97d2e851b4a97a7998aee399155c8423ac2fab6298d36fbb1ab: canceled:
        readfrom sha256:96ba76c2f963f97d2e851b4a97a7998aee399155c8423ac2fab6298d36fbb1ab s3r://czbiohub-reflow-quickstart-cache: context canceled
resources exhausted: eval download_sra.FastqDump.outdir: requested resources {mem:500.0MiB cpu:8 disk:50.0GiB} exceeds total available {mem:6.9GiB cpu:1 disk:2.4TiB intel_avx:2 intel_avx2:2}
Makefile:2: recipe for target 'test_download_sra' failed
make: *** [test_download_sra] Error 1
prasadgopal commented 6 years ago

This needs to be better documented.

From https://github.com/grailbio/reflow/blob/master/README.md

"// final expression. Finally, Main contains a @requires annotation.

// This instructs Reflow how many resources to reserve for the work // being done. Note that, because Reflow is able to distribute work, // if a single instance is too small to execute fully in parallel, // Reflow will provision additional compute instances to help along. // @requires thus denotes the smallest possible instance // configuration that's required for the program."

This means that Main needs a @requires annotation declaring the largest set of resources required by any single exec; without it, Reflow may provision an instance too small for FastqDump's cpu := 8 request (your error shows only cpu:1 available). Please try adding @requires(cpu := 8) right above val Main = ....
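For concreteness, here is a minimal sketch of where that annotation would sit in the script above. Only cpu := 8 comes from the suggestion; the mem/disk note in the comment is purely illustrative and not something this particular error requires:

```golang
// Reserve the largest single-exec requirement (FastqDump asks for 8 CPUs),
// so Reflow provisions an instance at least this big for the run.
// mem or disk could be declared the same way if an exec needed more than
// the default instance offers, e.g. disk := 50*GiB.
@requires(cpu := 8)
val Main = [files.Copy(fastq, output) | fastq <- fastqs]
```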


olgabot commented 5 years ago

Yay that worked! Thank you