nf-core / chipseq

ChIP-seq peak-calling, QC and differential analysis pipeline.
https://nf-co.re/chipseq
MIT License

cannot download and run test data #189

Closed sunliang3361 closed 4 years ago

sunliang3361 commented 4 years ago

Hi, I was using Singularity to download and run the test data with the command below:

nextflow run nf-core/chipseq -profile test,singularity

and it threw the error message below. Any idea? I have also attached the whole log file to this post. Thank you.

Liang

------ERROR MESSAGE--------- nextflow.log

Oct-12 05:25:32.752 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 6 -- submitted tasks are shown below
~> TaskHandler[id: 57; name: PHANTOMPEAKQUALTOOLS (SPT5_T0_R1); status: RUNNING; exit: -; error: -; workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/69/85501f4e3a77ef2ad96073e2026b49]
~> TaskHandler[id: 54; name: PHANTOMPEAKQUALTOOLS (SPT5_T15_R1); status: RUNNING; exit: -; error: -; workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/7e/692b92aeee92c5be6c6b8aaf1e817d]
~> TaskHandler[id: 61; name: PHANTOMPEAKQUALTOOLS (SPT5_T0_R2); status: RUNNING; exit: -; error: -; workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/7f/9c0ef16a0425153ee9c47017253a18]
~> TaskHandler[id: 64; name: PHANTOMPEAKQUALTOOLS (SPT5_T15_R2); status: RUNNING; exit: -; error: -; workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/44/8e9903dc8a91ac5f13c4bfa9227983]
~> TaskHandler[id: 72; name: PHANTOMPEAKQUALTOOLS (SPT5_INPUT_R2); status: RUNNING; exit: -; error: -; workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/59/2bb1d23c4e2498562224083c6f768e]
~> TaskHandler[id: 80; name: PHANTOMPEAKQUALTOOLS (SPT5_INPUT_R1); status: RUNNING; exit: -; error: -; workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/03/cd81ad85c2b4b753f0e99884039d56]
Oct-12 05:26:27.337 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 57; name: PHANTOMPEAKQUALTOOLS (SPT5_T0_R1); status: COMPLETED; exit: -; error: nextflow.exception.ProcessException: Process exceeded running time limit (8h); workDir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/work/69/85501f4e3a77ef2ad96073e2026b49]
Oct-12 05:26:27.409 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'PHANTOMPEAKQUALTOOLS (SPT5_T0_R1)'

Caused by: Process exceeded running time limit (8h)

Command executed:

RUN_SPP=`which run_spp.R`
Rscript -e "library(caTools); source(\"$RUN_SPP\")" -c="SPT5_T0_R1.mLb.clN.sorted.bam" -savp="SPT5_T0_R1.spp.pdf" -savd="SPT5_T0_R1.spp.Rdata" -out="SPT5_T0_R1.spp.out" -p=2
cp spp_correlation_header.txt SPT5_T0_R1_spp_correlation_mqc.tsv
Rscript -e "load('SPT5_T0_R1.spp.Rdata'); write.table(crosscorr\$cross.correlation, file=\"SPT5_T0_R1_spp_correlation_mqc.tsv\", sep=",", quote=FALSE, row.names=FALSE, col.names=FALSE, append=TRUE)"

awk -v OFS=' ' '{print "SPT5_T0_R1", $9}' SPT5_T0_R1.spp.out | cat spp_nsc_header.txt - > SPT5_T0_R1_spp_nsc_mqc.tsv
awk -v OFS=' ' '{print "SPT5_T0_R1", $10}' SPT5_T0_R1.spp.out | cat spp_rsc_header.txt - > SPT5_T0_R1_spp_rsc_mqc.tsv

Command exit status:

Command output:

################
ChIP data: SPT5_T0_R1.mLb.clN.sorted.bam
Control data: NA
strandshift(min): -500
strandshift(step): 5
strandshift(max): 1500
user-defined peak shift: NA
exclusion(min): 10
exclusion(max): NaN
num parallel nodes: 2
FDR threshold: 0.01
NumPeaks Threshold: NA
Output Directory: .
narrowPeak output file name: NA
regionPeak output file name: NA
Rdata filename: SPT5_T0_R1.spp.Rdata
plot pdf filename: SPT5_T0_R1.spp.pdf
result filename: SPT5_T0_R1.spp.out
Overwrite files?: FALSE
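(Editor's note: the two awk one-liners in the failed command pull columns 9 and 10, i.e. the NSC and RSC cross-correlation scores, out of run_spp.R's tab-separated output line and prepend the sample name. A sketch on a fabricated spp.out line, with made-up field values:)

```shell
# Fabricated 11-field run_spp.R output line; field 9 holds NSC, field 10 holds RSC.
printf 'SPT5_T0_R1.bam\t1000\t100\t50,60\t200\t150\t55\t0.4\t1.05\t1.20\t1\n' > dummy.spp.out

# Same extraction pattern as the pipeline command, minus the MultiQC headers.
awk -v OFS='\t' '{print "SPT5_T0_R1", $9}'  dummy.spp.out   # NSC column
awk -v OFS='\t' '{print "SPT5_T0_R1", $10}' dummy.spp.out   # RSC column
```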

sunliang3361 commented 4 years ago

Any ideas why?

drpatelh commented 4 years ago

Hi @sunliang3361! Apologies for the late response. It looks like the failing process has exceeded the default resource requirements, in this case the time limit:

Caused by:
Process exceeded running time limit (8h)

I'm not sure why it takes that long on your compute set-up, because the whole pipeline test should finish in 30-40 minutes. Can you try adding --skip_spp to the command line and see if the pipeline completes?
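(Editor's note: if you want to keep the step, the usual nf-core workaround is to raise the limit for just that process with a small custom config passed via Nextflow's -c option. A sketch; the file name and the 24 h value are arbitrary choices, not pipeline defaults:)

```groovy
// custom.config (sketch): raise the time limit for the slow process only.
process {
  withName: 'PHANTOMPEAKQUALTOOLS' {
    time = '24.h'
  }
}
```

Then run e.g. `nextflow run nf-core/chipseq -profile test,singularity -c custom.config`.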

sunliang3361 commented 4 years ago

Thank you for your response. With --skip_spp, the pipeline completes. How will using this parameter affect the final results? I guess it's just a QC step and won't matter much, right?

sunliang3361 commented 4 years ago

When I tried to run my own samples, I got the following error message. Is this a reference genome issue? Thanks.


Oct-15 09:24:08.663 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-core/chipseq -profile singularity --input samples.csv --genome GRCm38 --max_memory 40.GB --max_cpus 8 --outdir results --skip_diff_analysis --skip_spp -r 1.2.1
Oct-15 09:24:08.866 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 20.01.0
Oct-15 09:24:09.571 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/ch213537/.nextflow/assets/nf-core/chipseq/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/chipseq.git
Oct-15 09:24:09.581 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/ch213537/.nextflow/assets/nf-core/chipseq/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/chipseq.git
Oct-15 09:24:21.191 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/ch213537/.nextflow/assets/nf-core/chipseq/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/chipseq.git
Oct-15 09:24:21.192 [main] INFO nextflow.cli.CmdRun - Launching nf-core/chipseq [marvelous_poincare] - revision: 0f487ed76d [1.2.1]
Oct-15 09:24:21.729 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /home/ch213537/.nextflow/assets/nf-core/chipseq/nextflow.config
Oct-15 09:24:21.731 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/ch213537/.nextflow/assets/nf-core/chipseq/nextflow.config
Oct-15 09:24:21.745 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: singularity
Oct-15 09:24:22.483 [main] DEBUG nextflow.config.ConfigBuilder - Available config profiles: [cfc_dev, denbi_qbic, bi, genotoul, bigpurple, uppmax, docker, gis, utd_ganymede, conda, singularity, icr_davros, munin, prince, czbiohub_aws, hebbe, cfc, uzh, ccga_med, debug, test, genouest, cbe, ebc, ccga_dx, crick, google, kraken, phoenix, shh, awsbatch, pasteur, uct_hpc, test_full, binac]
Oct-15 09:24:22.554 [main] DEBUG nextflow.Session - Session uuid: 2b400376-b58e-474c-8494-1fd368869076
Oct-15 09:24:22.554 [main] DEBUG nextflow.Session - Run name: marvelous_poincare
Oct-15 09:24:22.555 [main] DEBUG nextflow.Session - Executor pool size: 40
Oct-15 09:24:22.579 [main] DEBUG nextflow.cli.CmdRun - Version: 20.01.0 build 5264 Created: 12-02-2020 10:14 UTC (05:14 EDT) System: Linux 3.10.0-1062.12.1.el7.x86_64 Runtime: Groovy 2.5.8 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01 Encoding: UTF-8 (UTF-8) Process: 5457@gpu-1-0.tch.harvard.edu [10.36.131.248] CPUs: 40 - Mem: 187.6 GB (168.3 GB) - Swap: 0 (0)
Oct-15 09:24:22.692 [main] DEBUG nextflow.Session - Work-dir: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/p14/work [nfs]
Oct-15 09:24:22.724 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Oct-15 09:24:22.725 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Oct-15 09:24:22.985 [main] DEBUG nextflow.Session - Session start invoked
Oct-15 09:24:22.989 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /lab-share/RC-Data-Science-e2/Public/Liang/p14_Wenxiang/p14/results/pipeline_info/execution_trace.txt
Oct-15 09:24:23.747 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Oct-15 09:24:23.757 [main] DEBUG nextflow.Session - Workflow process names [dsl1]: FASTQC, output_documentation, CONSENSUS_PEAKS, MULTIQC, SORT_BAM, CONSENSUS_PEAKS_DESEQ2, PHANTOMPEAKQUALTOOLS, BWA_MEM, BIGWIG, CONSENSUS_PEAKS_COUNTS, BWA_INDEX, get_software_versions, CHECK_DESIGN, MERGED_BAM_REMOVE_ORPHAN, MAKE_GENE_BED, CONSENSUS_PEAKS_ANNOTATE, TRIMGALORE, PLOTPROFILE, MACS2_QC, PICARD_METRICS, PLOTFINGERPRINT, MERGED_BAM_FILTER, MACS2, MACS2_ANNOTATE, MERGED_BAM, IGV, PRESEQ, MAKE_GENOME_FILTER
Oct-15 09:24:24.025 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Oct-15 09:24:24.036 [main] DEBUG nextflow.Global - Using AWS temporary session credentials defined in default section in file: /home/ch213537/.aws/credentials
Oct-15 09:24:24.037 [main] DEBUG nextflow.file.FileHelper - Using AWS temporary session token for S3FS.
Oct-15 09:24:24.053 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {session_key=FwoGZX.., secret_key=8VOQ6V.., region=us-east-1, access_key=ASIA5F..}
Oct-15 09:24:24.722 [main] WARN c.a.a.p.i.BasicProfileConfigLoader - Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
Oct-15 09:24:25.432 [main] DEBUG nextflow.Session - Session aborted -- Cause: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: 8JCR5ZAH0R3GCS9M; S3 Extended Request ID: wwgKyeqItk9uQpWH9XTherNK2Yo+aY3FpwTFPuWBQgbbSxayWnmUbL1f0VAw8k+ajoxvaw6DT1E=)
Oct-15 09:24:25.448 [main] ERROR nextflow.cli.Launcher - @unknown
com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: 8JCR5ZAH0R3GCS9M; S3 Extended Request ID: wwgKyeqItk9uQpWH9XTherNK2Yo+aY3FpwTFPuWBQgbbSxayWnmUbL1f0VAw8k+ajoxvaw6DT1E=)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4854)
    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:880)
    at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:104)
    at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:116)
    at com.upplication.s3fs.S3FileSystemProvider.getAccessControl(S3FileSystemProvider.java:894)
    at com.upplication.s3fs.S3FileSystemProvider.checkAccess(S3FileSystemProvider.java:588)
    at java.nio.file.Files.exists(Files.java:2385)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.invoke(StaticMetaMethodSite.java:44)
    at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.call(StaticMetaMethodSite.java:89)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.StaticMetaMethodSite.call(StaticMetaMethodSite.java:94)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
    at nextflow.extension.FilesEx.exists(FilesEx.groovy:453)
    at nextflow.file.FileHelper.checkIfExists(FileHelper.groovy:971)
    at nextflow.file.FileHelper$checkIfExists$1.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at nextflow.file.FileHelper$checkIfExists$1.call(Unknown Source)
    at nextflow.Nextflow.file(Nextflow.groovy:158)
    at nextflow.Nextflow$file.callStatic(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:55)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:196)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:216)
    at Script_cc06130a.runScript(Script_cc06130a:171)
    at nextflow.script.BaseScript.runDsl1(BaseScript.groovy:153)
    at nextflow.script.BaseScript.run(BaseScript.groovy:189)
    at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:225)
    at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:218)
    at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:126)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:273)
    at nextflow.cli.Launcher.run(Launcher.groovy:460)
    at nextflow.cli.Launcher.main(Launcher.groovy:642)

drpatelh commented 4 years ago

Hi @sunliang3361. It shouldn't be much of a loss if you use --skip_spp. It just provides some additional QC; I included it because it's something that ENCODE do routinely, although I almost never look at the results myself. I would be tempted to retry the command without --skip_spp to see if the pipeline finishes. It could just have been an intermittent issue, unless you have seen it happen reproducibly.

The issue you are seeing above is related to this file in your home directory (can you rename it temporarily and try again?):

Oct-15 09:24:24.036 [main] DEBUG nextflow.Global - Using AWS temporary session credentials defined in default section in file: /home/ch213537/.aws/credentials

I am assuming you have used or are using AWS for other things, and the region specified in that file conflicts with the region required to download the reference genome data, as indicated by this message:

Oct-15 09:24:25.432 [main] DEBUG nextflow.Session - Session aborted -- Cause: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: 8JCR5ZAH0R3GCS9M; S3 Extended Request ID:

Unfortunately, this is something that has to be fixed on the Nextflow side, so the easiest workaround is to temporarily rename that file and run the pipeline with the --save_reference parameter so that all of the genome data is downloaded and saved in the results/genome/ directory. The next time you run the pipeline, you can then adjust the parameters to use your local copy so you don't have to download it again, e.g.:

nextflow run nf-core/chipseq \
    --input samples.csv \
    --genome GRCm38 \
    --fasta <path/to/downloaded/fasta> \
    --gtf </path/to/downloaded/gtf> \
    --bwa_index </path/to/downloaded/bwa/index> \
    --max_memory 40.GB \
    --max_cpus 8 \
    --outdir results \
    --email <your_email_address> \
    --skip_diff_analysis \
    -profile singularity \
    -r 1.2.1

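(Editor's note: concretely, the temporary rename for that first run could look like the sketch below. The credentials path comes from the log above; the run parameters are the ones from the user's earlier command, plus --save_reference to keep the downloaded genome files under results/genome/.)

```shell
# Park the credentials file so Nextflow's S3 client can fetch the public
# genome data without the conflicting region from ~/.aws/credentials.
mv ~/.aws/credentials ~/.aws/credentials.bak

nextflow run nf-core/chipseq \
    --input samples.csv \
    --genome GRCm38 \
    --save_reference \
    --max_memory 40.GB \
    --max_cpus 8 \
    --outdir results \
    --skip_diff_analysis \
    -profile singularity \
    -r 1.2.1

# Restore the credentials once the genome data has been cached.
mv ~/.aws/credentials.bak ~/.aws/credentials
```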
sunliang3361 commented 4 years ago

Thank you. If I don't rename the AWS file, but instead download the genome fasta, GTF and BWA index to a local folder, can I use --fasta, --gtf, and --bwa_index directly and drop --genome (I noticed you kept that parameter in the reply above)?

drpatelh commented 4 years ago

Not exactly; it's better to keep --genome too, because it pulls in some settings that would otherwise have to be specified manually, e.g. --macs_gsize and the blacklist file.

If you want to download everything separately first, then that's fine, but if you don't want to use --genome then you would have to specify at least the following:

nextflow run nf-core/chipseq \
    --input samples.csv \
    --fasta <path/to/downloaded/fasta> \
    --gtf </path/to/downloaded/gtf> \
    --bwa_index </path/to/downloaded/bwa/index> \
    --macs_gsize 1.87e9 \
    --blacklist </path/to/downloaded/blacklist/file> \
    --max_memory 40.GB \
    --max_cpus 8 \
    --outdir results \
    --email <your_email_address> \
    --skip_diff_analysis \
    -profile singularity \
    -r 1.2.1

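(Editor's note: for reference, --input samples.csv expects the pipeline's design format. For the 1.2.x releases it looks roughly like the sketch below, with sample names taken from the log above and made-up file names; check docs/usage.md of the release you run for the exact columns.)

```
group,replicate,fastq_1,fastq_2,antibody,control
SPT5_T0,1,SPT5_T0_R1.fastq.gz,,SPT5,SPT5_INPUT
SPT5_T0,2,SPT5_T0_R2.fastq.gz,,SPT5,SPT5_INPUT
SPT5_INPUT,1,SPT5_INPUT_R1.fastq.gz,,,
SPT5_INPUT,2,SPT5_INPUT_R2.fastq.gz,,,
```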
sunliang3361 commented 4 years ago

Thank you so much. The pipeline seems to be working now.

drpatelh commented 4 years ago

No worries. Glad it's working 🙂