Closed · olgabot closed this issue 5 years ago
This needs to be better documented.
From https://github.com/grailbio/reflow/blob/master/README.md:

```
// final expression. Finally, Main contains a @requires annotation.
// This instructs Reflow how many resources to reserve for the work
// being done. Note that, because Reflow is able to distribute work,
// if a single instance is too small to execute fully in parallel,
// Reflow will provision additional compute instances to help along.
// @requires thus denotes the smallest possible instance
// configuration that's required for the program.
```
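In the README's example program, the annotation is just a comment placed directly above the Main declaration. A minimal sketch of the shape (the resource values here are illustrative, not taken from this issue; pick them to cover your largest exec):

```
// @requires(cpu := 1, mem := 500*MiB, disk := 1*GiB)
val Main = ...
```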
This means that Main needs a @requires annotation with the largest set of resources required by any single exec. Without one, Reflow reserves only a small default allocation, so the FastqDump exec's cpu := 8 request exceeds the cpu:1 that is actually available, which is exactly what the error below reports.
Please try `@requires(cpu := 8)` right above `val Main = ...`.
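Applied to your script, that looks like the following (a sketch; cpu := 8 matches the default fastq_dump_threads, so the allocation is large enough for the FastqDump exec):

```
// @requires(cpu := 8)
val Main = [files.Copy(fastq, output) | fastq <- fastqs]
```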
On Wed, Oct 31, 2018 at 6:59 PM Olga Botvinnik notifications@github.com wrote:
Hello! I'm trying to run a download_sra.rf script (in "Details" below) to get the FASTQ files of an SRA project. For the FastqDump step, I keep getting a "resources exhausted" error even when my requests are very reasonable:

```
resources exhausted: eval download_sra.FastqDump.outdir: requested resources {mem:500.0MiB cpu:8 disk:50.0GiB} exceeds total available {mem:6.9GiB cpu:1 disk:2.4TiB intel_avx:2 intel_avx2:2}
```

Do you know what may be happening?
<details>

```
param (
    // Pipe-separate for multiple, e.g. 'SRR1539523|SRR1539569|SRR1539570'
    sra_id string
    // S3 folder location to put the downloaded files
    output string
    // Number of threads for parallel-fastq-dump
    fastq_dump_threads = 8
    // GiB of storage for downloading SRA files (per file)
    sra_disk = 50
    // GiB of storage for converting to fastq.gz files (per file)
    fastq_dump_disk = 50
)

// Docker images
val bionode = "bionode/bionode-ncbi"
val fastq_dump = "quay.io/biocontainers/parallel-fastq-dump:0.6.3--py36_1"

// System modules included with Reflow
val dirs = make("$/dirs")
val files = make("$/files")
val strings = make("$/strings")

func SearchSRA(sra_id string) = exec(image := bionode) (json file) {"
    bionode-ncbi search sra {{sra_id}} > {{json}}
"}

// Outputs a folder with $UID/$SRA.sra, e.g.:
// $ ls -lha */*.sra
// -rw-rw-r-- 1 ubuntu ubuntu 3.6G May 16 19:57 285026/SRR629557.sra
// -rw-rw-r-- 1 ubuntu ubuntu 4.4G May 16 19:59 285027/SRR629559.sra
// -rw-rw-r-- 1 ubuntu ubuntu 4.0G May 16 20:00 285028/SRR629561.sra
// -rw-rw-r-- 1 ubuntu ubuntu 1.8G May 16 20:01 285029/SRR629562.sra
func DownloadSRA(sra_id string) = {
    outdir := exec(image := bionode, disk := sra_disk*GiB) (outdir dir) {"
        cd {{outdir}}
        bionode-ncbi download sra {{sra_id}}
    "}
    sra_files := dirs.Files(outdir)
    sra_files
}

// Convert SRA files to FastQ
// Recommended flags from https://edwards.sdsu.edu/research/fastq-dump/
// and Trinity documentation:
// > If your data come from SRA, be sure to dump the fastq file like so:
// > SRA_TOOLKIT/fastq-dump --defline-seq '@$sn[*$rn]/$ri' --split-files file.sra
func FastqDump(sra file) = {
    outdir := exec(image := fastq_dump, cpu := fastq_dump_threads, disk := fastq_dump_disk*GiB) (outdir dir) {"
        parallel-fastq-dump --outdir {{outdir}} --gzip \
            --skip-technical --readids --read-filter pass \
            --dumpbase --split-3 --clip --defline-seq '@$sn[*$rn]/$ri' \
            --split-files --threads {{fastq_dump_threads}} \
            {{sra}}
    "}
    fastqs := dirs.Files(outdir)
    fastqs
}

sra_ids := strings.Split(sra_id, "|")
sras := flatten([DownloadSRA(sra) | sra <- sra_ids])
json_metadata := SearchSRA(sra_id)
fastqs := flatten([FastqDump(sra) | sra <- sras])

val Main = [files.Copy(fastq, output) | fastq <- fastqs]
```

</details>

Here's the full error:

```
$ make test_download_sra
reflow run download_sra.rf -sra_id='SRR1539523|SRR1539569|SRR1539570' -output=s3://olgabot-maca/test-download-sra/spider-transcriptome/
reflow: run ID: fad83951
transfer flow d155ca1c state FlowTransfer {mem:500.0MiB cpu:8 disk:50.0GiB}
  exec image quay.io/biocontainers/parallel-fastq-dump:0.6.3--py36_1
  cmd "\n parallel-fastq-dump --outdir %s --gzip \\n --skip-technical --readids --read-filter pass \\n --dumpbase --split-3 --clip --defline-seq '@$sn[*$rn]/$ri' \\n --split-files --threads 8 \\n %s\n "
  deps 7cd6751a
  error: transfer sha256:f4f7fd2b6aff9259448e92d3ba52f5abcad50f3132386a3d38f3658b80a12353: canceled: readfrom sha256:f4f7fd2b6aff9259448e92d3ba52f5abcad50f3132386a3d38f3658b80a12353 s3r://czbiohub-reflow-quickstart-cache: context canceled
reflow: cache transfer flow 4ee13168 state FlowTransfer {mem:500.0MiB cpu:8 disk:50.0GiB}
  exec image quay.io/biocontainers/parallel-fastq-dump:0.6.3--py36_1
  cmd "\n parallel-fastq-dump --outdir %s --gzip \\n --skip-technical --readids --read-filter pass \\n --dumpbase --split-3 --clip --defline-seq '@$sn[*$rn]/$ri' \\n --split-files --threads 8 \\n %s\n "
  deps 14612bf6
  error: transfer sha256:96ba76c2f963f97d2e851b4a97a7998aee399155c8423ac2fab6298d36fbb1ab: canceled: readfrom sha256:96ba76c2f963f97d2e851b4a97a7998aee399155c8423ac2fab6298d36fbb1ab s3r://czbiohub-reflow-quickstart-cache: context canceled
resources exhausted: eval download_sra.FastqDump.outdir: requested resources {mem:500.0MiB cpu:8 disk:50.0GiB} exceeds total available {mem:6.9GiB cpu:1 disk:2.4TiB intel_avx:2 intel_avx2:2}
Makefile:2: recipe for target 'test_download_sra' failed
make: *** [test_download_sra] Error 1
```
Yay, that worked! Thank you!