nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.75k stars 629 forks source link

Unexpected behaviour when NXF_HOME contains a space character #4360

Closed SamStudio8 closed 1 year ago

SamStudio8 commented 1 year ago

Bug report

Expected behavior

NXF_HOME can contain a space character and nextflow will run without error.

Actual behavior

Steps to reproduce the problem

export NXF_HOME='/tmp/my _nextflow _home'
export CAPSULE_LOG=none
rm -rf '/tmp/my _nextflow _home' # clean slate
echo "Trying to run nextflow with NXF_HOME='${NXF_HOME}' (init)"
nextflow
echo "Trying to run nextflow with NXF_HOME='${NXF_HOME}' (from classpath)"
nextflow | head -n1
echo "Found some unexpected directories..."
find /tmp/my _nextflow _home

Program output

Trying to run nextflow with NXF_HOME='/tmp/my _nextflow _home' (init)
Error: Could not find or load main class _nextflow
Caused by: java.lang.ClassNotFoundException: _nextflow
Trying to run nextflow with NXF_HOME='/tmp/my _nextflow _home' (from classpath)
Usage: nextflow [options] COMMAND [arg...]
Found some unexpected directories...
/tmp/my
_nextflow
_home
_home/tmp
_home/tmp/ver

Environment

Repros on Mac

and Ubuntu:

Additional context

When Nextflow is run for the first time, it attempts to generate a file containing the full Java command required to launch Nextflow with its options and the classpath for dependencies installed by Capsule. This is achieved by running the capsule and capturing the classpath information and writing it to the LAUNCH_FILE path.

If NXF_HOME has spaces, so too will the generated classpath. The result of the process substitution used to collect the options and classpath by calling JAVA_CMD with JAVA_OPTS on the NXF_JAR is word split into the cli array; breaking spaced components into separate elements. The Java cmd_base is found with a regex on cli, and everything else ends up in cmd_tail (still split by words). The cmd_tail is eventually appended to the launcher array.

The launcher array is used by launch_nextflow to generate the cmdline string that will be exec'd to actually run Nextflow for the first time. Due to previous problems with bash double quotes leading to "no such variable" errors, the launch_nextflow function strips double quotations from elements of launcher, breaking the escaping of classpath elements with spaces when they are concatenated into the cmdline string. This leads to -classpath containing the argument _nextflow and causing the ClassNotFoundException.

On the second and subsequent invocations of Nextflow, the launcher array is instead populated by the contents of the previously generated LAUNCH_FILE (if it exists). The crucial difference in this case is that when the contents of the LAUNCH_FILE are read into the launcher array, the quoted elements are not broken on words. The surrounding quotes of each element are still removed by launch_nextflow, but this does not matter because both the addition of the element to cmdline and the call to exec are correctly quoted later.

Additionally, the superfluous directories are created when mkdir is called on the unquoted result of dirname to set up the JAVA_KEY dir.

Patch for inspiration

As a means to expedite a fix, the following patch offers some inspiration to fix the above behaviour by:

It is worth noting that although the %q conversion operator appears to be well supported, it is not portable according to POSIX. In practice, I have not found this to be a problem, but I don't like it. Hopefully someone with more sh-fu than me has a better idea.

--- nextflow  2023-09-28 15:42:01
+++ nextflow  2023-09-28 15:48:06
@@ -317,7 +317,7 @@
   if [[ ! $JAVA_VER =~ ^(11|12|13|14|15|16|17|18|19|20) ]]; then
       echo_yellow "NOTE: Nextflow is not tested with Java $JAVA_VER -- It's recommended the use of version 11 up to 20\n"
   fi
-  mkdir -p $(dirname "$JAVA_KEY")
+  mkdir -p "$(dirname "$JAVA_KEY")"
   [[ -f $JAVA_VER ]] && echo $JAVA_VER > "$JAVA_KEY"
 fi

@@ -402,7 +402,7 @@
     cmd_pattern='"([^"]*)"(.*)'
     [[ "${cli[@]}" =~ $cmd_pattern ]]
     cmd_base=(${BASH_REMATCH[1]})
-    cmd_tail=(${BASH_REMATCH[2]})
+    declare -a cmd_tail="(${BASH_REMATCH[2]})"

     if [[ "$JAVA_VER" =~ ^(9|10|11|12|13|14|15|16|17|18|19|20) ]]; then
       launcher="${cmd_base[@]}"
@@ -434,7 +434,7 @@
     if mkdir -p "${NXF_LAUNCHER}" 2>/dev/null; then
         STR=''
         for x in "${launcher[@]}"; do
-        [[ "$x" != "\"-Duser.dir=$PWD\"" ]] && STR+="$x "
+        [[ "$x" != "\"-Duser.dir=$PWD\"" ]] && STR+=$(printf "%q " "$x")
         done
         printf "$STR">"$LAUNCH_FILE"
     else
pditommaso commented 1 year ago

Space characters in the home path

pditommaso commented 1 year ago

(sorry for closing I was too depressed 😆)

SamStudio8 commented 1 year ago

😹 I should have perhaps added that the context behind this is that WSL creates home dirs somewhere impractical for the uninitiated user. We encountered this problem when trying to set a more sensible NXF_HOME for our friends on WSL whose IT departments have carelessly provisioned their usernames with spaces in them. As much as spaces in a file path makes me feel queasy, it would be great to patch this to better support WSL users.

pditommaso commented 1 year ago

Anyhow, thanks for the excellent issue report 💪

bentsherman commented 1 year ago