v2.2.2 is not found by nextflow

IdoBar commented 3 years ago

Check Documentation

I have checked the following places for your error:

[X] nf-core website: troubleshooting
[X] nf-core/eager pipeline documentation
- nf-core/eager FAQ/troubleshooting can be found here

Description of the bug

The most recent version (v2.2.2) is not identified by nextflow

Steps to reproduce

Steps to reproduce the behaviour:

Command line: nextflow run nf-core/eager -r 2.2.2 -profile test_tsv

See error:

N E X T F L O W  ~  version 20.07.1
Cannot find revision `2.2.2` -- Make sure that it exists in the remote repository `https://github.com/nf-core/eager`

Also when running nextflow pull nf-core/eager, this is the output:

Checking nf-core/eager ...
done - revision: 7971d89e54 [2.2.1]

Expected behaviour

Should pull and run the most recent version

Log files

This is the content of .nextflow.log file:

Dec-19 00:36:01.000 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-core/eager -r 2.2.2 -profile test_tsv,singularity
Dec-19 00:36:01.148 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 20.07.1
Dec-19 00:36:02.620 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/ibar/.nextflow/assets/nf-core/eager/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/eager.git
Dec-19 00:36:02.635 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/ibar/.nextflow/assets/nf-core/eager/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/eager.git
Dec-19 00:36:07.223 [main] DEBUG nextflow.cli.Launcher - Operation aborted
org.eclipse.jgit.api.errors.RefNotFoundException: Ref origin/2.2.2 cannot be resolved
        at org.eclipse.jgit.api.CreateBranchCommand.getStartPointObjectId(CreateBranchCommand.java:279)
        at org.eclipse.jgit.api.CreateBranchCommand.call(CreateBranchCommand.java:132)
        at org.eclipse.jgit.api.CheckoutCommand.call(CheckoutCommand.java:225)
        at nextflow.scm.AssetManager.checkoutRemoteBranch(AssetManager.groovy:930)
        at nextflow.scm.AssetManager.checkout(AssetManager.groovy:912)
        at nextflow.cli.CmdRun.getScriptFile(CmdRun.groovy:347)
        at nextflow.cli.CmdRun.run(CmdRun.groovy:246)
        at nextflow.cli.Launcher.run(Launcher.groovy:466)
        at nextflow.cli.Launcher.main(Launcher.groovy:648)

Have you provided the following extra information/files:

[X] The command used to run the pipeline
[X] The .nextflow.log file
[X] The exact error: Cannot find revision 2.2.2 -- Make sure that it exists in the remote repository https://github.com/nf-core/eager

System

Hardware:
Executor:
OS:
Version

Nextflow Installation

Version: [20.07.1]

IdoBar commented 3 years ago

Seem to work alright in a fresh folder and nextflow v20.10.0.5430

jfy133 commented 3 years ago

Hi @IdoBar thanks for the open and quick close.

It would be useful for us to know, however: when you first ran with -r 2.2.2 (and got the error), had you already run nextflow pull nf-core/eager -r 2.2.2?

This has come up before, and if you had not run pull beforehand, I think we will need to improve our documentation to make it clear that if you have not run a specific version before, you must use the pull command first. So your feedback would be useful for us to decide whether we need to clarify this or not!
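A minimal sketch of the pull-then-run order described above, assuming bash and `nextflow` on the PATH. `nextflow pull -r` and `nextflow run -r` are standard Nextflow options; the wrapper function `run_pinned_release` is hypothetical and only bundles the two calls so the run is skipped if the pull fails.

```shell
#!/usr/bin/env bash
# Sketch: fetch a specific pipeline release into the local Nextflow cache
# before running it, so that `-r <release>` can be resolved locally.
# run_pinned_release is a hypothetical helper, not part of nf-core or Nextflow.

run_pinned_release() {
    local pipeline="$1"   # e.g. nf-core/eager
    local release="$2"    # e.g. 2.2.2
    shift 2               # remaining arguments are passed through to `nextflow run`

    # Update the cached repository and check out the requested release;
    # stop here (returning the pull's exit status) if that fails.
    nextflow pull "$pipeline" -r "$release" || return

    # Run the pinned release with the caller's extra options.
    nextflow run "$pipeline" -r "$release" "$@"
}

# Usage (not executed here):
# run_pinned_release nf-core/eager 2.2.2 -profile test_tsv,singularity
```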

IdoBar commented 3 years ago

Hi @jfy133, I think I tried to run it without pulling first. Regardless, I still can't run the workflow (running on QRIS Awoonga). It's submitting the jobs to the cluster, but all the jobs are terminated instantly with exit status 1 and nothing informative in the logs (see example below):

-[nf-core/eager] Pipeline completed with errors-
Error executing process > 'fastqc (D11_L1)'
Caused by:
  Process `fastqc (D11_L1)` terminated with an error exit status (1)
Command executed:
  fastqc -t 1 -q D11_1.sampled1M.trimmed.fq.gz D11_2.sampled1M.trimmed.fq.gz
  rename 's/_fastqc\.zip$/_raw_fastqc.zip/' *_fastqc.zip
  rename 's/_fastqc\.html$/_raw_fastqc.html/' *_fastqc.html
Command exit status:
  1
Command output:
  (empty)
Command wrapper:
  ########################### Execution Started #############################
  JobId:507611.awonmgr2
  UserName:ibar
  GroupName:qris-gu
  ExecutionHost:aw128
  ###############################################################################
  ########################### Job Execution History #############################
  JobId:507611.awonmgr2
  UserName:ibar
  GroupName:qris-gu
  JobName:nf-fastqc_D11_L
  SessionId:72595
  ResourcesRequested:mem=4096mb,ncpus=1,place=free,walltime=04:00:00
  ResourcesUsed:cpupercent=0,cput=00:00:00,mem=0kb,ncpus=1,vmem=0kb,walltime=00:00:03
  QueueUsed:Short
  AccountString:qris-gu
  ExitStatus:1
  ###############################################################################
Work dir:
  /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

When I edit the .command.run file in the work dir to keep only the commands from nxf_stage() and nxf_main() and submit it to the cluster, it runs fine and produces the correct output. My guess is that the jobs are getting killed somewhere in the process management functions (nxf_tree(), nxf_stat(), nxf_trace(), nxf_mem_watch(), etc.).
I can create a new issue if needed, but I'll scan through the archived ones to see if I missed something.

These are the only errors I could find in the log file:

Dec.-21 00:42:19.210 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 507611.awonmgr2; id: 5; name: fastqc (D11_L1); status: COMPLETED; exit: 1; error: -; workDir: /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45 started: 1608475334333; exited: 2020-12-20T14:42:15.474194Z; ]
Dec.-21 00:42:19.226 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'fastqc (D11_L1)' -- Cause: java.nio.file.NoSuchFileException: /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45/.command.out
Dec.-21 00:42:19.229 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'fastqc (D11_L1)' -- Cause: java.nio.file.NoSuchFileException: /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45/.command.err
Dec.-21 00:42:19.264 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'fastqc (D11_L1)'

Thanks, Ido

jfy133 commented 3 years ago

Hey @IdoBar

Ok good to know! Then I think we can add that to the troubleshooting docs.

From what you describe, it seems like you could be using a misconfigured profile. Could you maybe send the whole .nextflow.log file, the command you used, and the custom profile (if you used one)?

Then we can maybe identify the problem. Typically, instant crashes come from things like not being able to find a container or being sent to the wrong partition.

IdoBar commented 3 years ago

Thanks for your reply @jfy133. Please see the Dingo_samples.tsv file, the .json parameters file, my custom awoonga.config and the log file (.nextflow.log) in the attached zip file.

The command that I used is:

nextflow run nf-core/eager \
         -r 2.2.2 \
         -params-file Dingo_aDNA.CanFam3.1.bwaaln.gatkug.json \
         -c /home/ibar/.nextflow/awoonga.config

Please note that nextflow is failing the same way when running the test set (-profile test_tsv), so it must be something in the config of singularity/executor.

Many thanks, Ido

jfy133 commented 3 years ago

Hi @IdoBar thanks for the info. I've edited your post to remove the log files now as it included a personal token.

It looks like it's not a container issue as I first thought.

I've looked through the log file and noticed the following:

Dec.-21 00:42:19.226 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'fastqc (D11_L1)' -- Cause: java.nio.file.NoSuchFileException: /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45/.command.out
Dec.-21 00:42:19.229 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'fastqc (D11_L1)' -- Cause: java.nio.file.NoSuchFileException: /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45/.command.err

This suggests there may be either a missing directory or a permissions issue. These are Nextflow-specific errors rather than nf-core/eager ones, as those two files are written by Nextflow itself.
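To check the missing-directory/permissions hypothesis, one option is to list which of Nextflow's task files actually exist in the failing work dir and whether the directory is writable. This is a sketch assuming bash; the helper name `check_task_dir` is hypothetical, and the file list reflects the files Nextflow normally writes into a task work dir rather than an exhaustive list.

```shell
#!/usr/bin/env bash
# Sketch: report which of Nextflow's usual task files are present in a work dir,
# and whether the directory is writable by the current user.
# check_task_dir is a hypothetical helper, not a Nextflow command.

check_task_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi

    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
    fi

    # Files Nextflow typically creates for each task (not an exhaustive list).
    local f
    for f in .command.run .command.sh .command.begin .command.out .command.err .command.log .exitcode; do
        if [ -e "$dir/$f" ]; then
            echo "present: $f"
        else
            echo "absent:  $f"
        fi
    done
}

# Usage, with the work dir from the error report:
# check_task_dir /30days/ibar/data/Dingo/Dingo_aDNA_NF_process_20_12_2020/90/69516d849eaecd9557fed831af6a45
```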

I looked in your profile and I also see that you have the boolean for scratch in quotes ('true' rather than true). I don't know if that makes a difference, but it may be worth trying. You could also try temporarily specifying the scratch directory explicitly, as described here, to see if that fixes the issue.

IdoBar commented 3 years ago

Thanks, I tried it and it still fails. Also, there seem to be no issues producing the rest of the intermediate files in the same folder (.command.run, .command.sh, .command.log, .command.begin). Any other ideas? Just to confirm that it's a Nextflow issue, I'll look for a quick, easy workflow to test-run.

Thanks, Ido

jfy133 commented 3 years ago

Not for the moment, it's definitely a Nextflow issue rather than nf-core though (lucky for me :sweat_smile:).

You could also ask on the Nextflow gitter: https://gitter.im/nextflow-io/nextflow.

I'll let you know if I think of anything else!

IdoBar commented 3 years ago

Thanks @jfy133, I figured this out... Apparently my .bashrc was loading the system /etc/bashrc, which in turn loaded a series of system-specific scripts from the folder /etc/profile.d/. It seems one of those scripts was breaking the workflow. I removed those lines from my .bashrc and now the workflow runs well (so far). I'll dig in further to see exactly which of those scripts is causing the error and why.
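To narrow down which /etc/profile.d/ script is responsible, one option is to source each script in its own non-interactive `bash -e` subshell and report those that make it exit non-zero. This sketch assumes the failure surfaces as a non-zero exit status in a shell that aborts on errors, which the thread does not confirm; `find_failing_scripts` is a hypothetical helper name. Because results can depend on the environment (for example, whether MODULEPATH is already set), it is worth running from the same kind of environment as the failing jobs, such as inside a batch job.

```shell
#!/usr/bin/env bash
# Sketch: source each *.sh file in a directory inside a fresh `bash -e`
# subshell and print the ones whose sourcing makes that subshell exit non-zero.
# find_failing_scripts is a hypothetical helper name.

find_failing_scripts() {
    local dir="${1:-/etc/profile.d}"
    local script
    for script in "$dir"/*.sh; do
        # Skip unreadable files (and the literal pattern if nothing matched).
        [ -r "$script" ] || continue
        # -e makes the subshell abort on the first failing command;
        # the script's own output is discarded so only failing paths are printed.
        if ! bash -e -c '. "$1"' _ "$script" >/dev/null 2>&1; then
            echo "$script"
        fi
    done
}

# Usage:
# find_failing_scripts /etc/profile.d
```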

EDIT: This is the offending script (/etc/profile.d/00-modulepath.sh):

[ -z "$MODULEPATH" ] && [ "$(readlink /etc/alternatives/modules.sh)" = "/usr/share/lmod/lmod/init/profile" -o -f /etc/profile.d/z00_lmod.sh ] && export MODULEPATH=/etc/modulefiles:/usr/share/modulefiles
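One plausible reason this line breaks non-interactive job shells (an assumption; the thread does not confirm the mechanism): when MODULEPATH is already set, or the lmod check fails, the `&&` chain ends with exit status 1. That becomes the return status of sourcing the file, and a shell running with `set -e` treats a failing `.`/`source` command as fatal. The demonstration below uses a simplified copy of the pattern (the readlink/lmod test is dropped) and an `if`-based rewrite that returns 0 when nothing is exported; the file names and the `try_source` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show that sourcing a file whose last line is a failing `&&` chain
# aborts a `set -e` shell, while an equivalent `if` statement does not.

tmp="$(mktemp -d)"

# Simplified version of the 00-modulepath.sh pattern: export only when MODULEPATH is empty.
cat > "$tmp/chain.sh" <<'EOF'
[ -z "$MODULEPATH" ] && export MODULEPATH=/etc/modulefiles:/usr/share/modulefiles
EOF

# Same logic written with `if`: the statement returns 0 when the condition is false.
cat > "$tmp/guarded.sh" <<'EOF'
if [ -z "$MODULEPATH" ]; then
    export MODULEPATH=/etc/modulefiles:/usr/share/modulefiles
fi
EOF

# try_source (hypothetical helper): source FILE in a fresh `bash -e` shell with
# MODULEPATH pre-set, as it may be in an inherited batch-job environment,
# and report whether the shell got past the `.` command.
try_source() {
    if MODULEPATH=/already/set bash -e -c '. "$1"; echo reached' _ "$1" >/dev/null 2>&1; then
        echo "continued"
    else
        echo "aborted"
    fi
}

for f in "$tmp/chain.sh" "$tmp/guarded.sh"; do
    echo "$(basename "$f"): $(try_source "$f")"
done
# Prints:
# chain.sh: aborted
# guarded.sh: continued
```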

Many thanks for your help, Ido

jfy133 commented 3 years ago

Glad to hear!
