I think we should do #4 in the short term, but in the longer term, I'd like us to explicitly accept spaces and split arguments in some fashion in a reliable cross-platform way.
I'm thinking a bit about why we need relexec to exist. One way to describe relexec is that it's a tool like env in the common use of #!/usr/bin/env python3 etc., in that it sits at a well-known path and abstracts over not knowing the exact path to your real interpreter in advance. It just has different semantics about how it finds the absolute path; env looks at $PATH but relexec looks relative to the script.
Arguably, if env had supported relative paths, there would be no need for relexec to exist. You could just have done #!/usr/bin/env --relative python3. But that wouldn't work on Linux even then, because the kernel passes everything after the interpreter path as a single argument, and env doesn't split it on spaces. (On other kernels like xnu, that would actually work.)
Also, it's sometimes useful to be able to pass environment variables to processes - think e.g. PYTHONUNBUFFERED or PYTHONIOENCODING (which has no command-line equivalent). We also have at least one internal use case where we need to set environment variables to place build dependencies on PYTHONPATH. But #!/usr/bin/env cannot do this, at least on Linux, despite that kind of being env's whole purpose.
And, of course, you cannot pass arguments to the command in question, e.g. python -u.
I think we should do something like the following:
1. Read in the shebang from the file to work around any kernel splitting on spaces, truncation, etc., for the reasons described in https://github.com/twosigma/relexec/issues/4#issuecomment-852100297. If the shebang is longer than 4096 bytes (i.e. if there is no newline in the first 4096 bytes), error.
2. Word-split it with POSIX sh quoting rules, ignoring the variable expansion rules, namely:
   - A backslash escapes the next character
   - A single quote quotes everything until the next single quote
   - A double quote quotes everything until the next double quote, except that a backslash escapes the next double quote or backslash
   - If there is no character following a backslash or no closing quote for an opening quote, error
   - Split on whitespace
3. For each token,
   - If it is the exact string --, discard it, then ignore this rule and the next two for all future tokens
   - If it starts with a -, treat it as an option to relexec (of which there are none currently defined and so this produces an error)
   - If it contains an =, treat it as an environment variable assignment
   - Otherwise add it to argv
4. Locate argv[0] relative to the script, and execve(relative_argv0, argv)
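To make the word-splitting and token rules above concrete, here is a rough Python sketch (relexec itself would not be written this way; this is only to pin down the semantics). The names split_words, classify_tokens and ShebangError are made up for illustration. One assumption that the list does not spell out: once the first command word lands in argv, the remaining tokens are passed through untouched, as env does. Without that, python3 -u would trip the "starts with a -" rule.

```python
class ShebangError(ValueError):
    """Malformed shebang arguments (hypothetical name, for illustration)."""


def split_words(text):
    """Split text into words using only the quoting rules listed above."""
    words, current = [], []
    in_word = False  # True once a word has started, even an empty one like ''
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c == "\\":
            if i + 1 >= n:
                raise ShebangError("no character after backslash")
            current.append(text[i + 1])
            in_word = True
            i += 2
        elif c == "'":
            end = text.find("'", i + 1)
            if end == -1:
                raise ShebangError("unterminated single quote")
            current.append(text[i + 1:end])
            in_word = True
            i = end + 1
        elif c == '"':
            in_word = True
            i += 1
            while True:
                if i >= n:
                    raise ShebangError("unterminated double quote")
                c = text[i]
                if c == '"':
                    i += 1
                    break
                # Inside double quotes, a backslash only escapes " and \
                if c == "\\" and i + 1 < n and text[i + 1] in '"\\':
                    current.append(text[i + 1])
                    i += 2
                else:
                    current.append(c)
                    i += 1
        elif c.isspace():
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        else:
            current.append(c)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words


def classify_tokens(words):
    """Return (env, argv) following the per-token rules above."""
    env, argv = {}, []
    special = True  # whether the --, option and NAME=value rules still apply
    for word in words:
        if not special:
            argv.append(word)
        elif word == "--":
            special = False
        elif word.startswith("-"):
            raise ShebangError("unknown relexec option: " + word)
        elif "=" in word:
            name, _, value = word.partition("=")
            env[name] = value
        else:
            argv.append(word)
            special = False  # assumption: the command's own args follow


    return env, argv


print(classify_tokens(split_words("PYTHONIOENCODING=ebcdic python3")))
# ({'PYTHONIOENCODING': 'ebcdic'}, ['python3'])
print(classify_tokens(split_words("python3 -u")))
# ({}, ['python3', '-u'])
```

With this, --something python raises ShebangError, which matches the idea of reserving option syntax for later.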
That lets us subsume the behavior of env and also gives us an affordance for future extensions (e.g., expanding ~, expanding variables, adding wrappers that are not found relatively, etc.) via command-line options.
In particular, at this point, #!/usr/lib/relexec PYTHONIOENCODING=ebcdic python3 and #!/usr/lib/relexec python3 -u will both work, and #!/usr/lib/relexec --something python is defined as an error, so we can redefine it later without breaking compatibility.
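For completeness, the first and last steps of the proposal (reading the shebang back out of the file, and locating argv[0] next to the script) might look roughly like the following. Again a Python sketch rather than a real implementation: read_shebang_args and resolve_relative are hypothetical names, "relative to the script" is assumed to mean relative to the script's directory, and no symlink resolution is attempted. The 4096-byte rule is applied literally, so a file with no newline at all is also an error here.

```python
import os

MAX_SHEBANG = 4096  # limit taken from the proposal above


def read_shebang_args(script_path):
    """Return the text after the interpreter path on the script's #! line."""
    with open(script_path, "rb") as f:
        head = f.read(MAX_SHEBANG)
    newline = head.find(b"\n")
    if newline == -1:
        raise ValueError("no newline in the first 4096 bytes")
    line = head[:newline]
    if not line.startswith(b"#!"):
        raise ValueError("file does not start with #!")
    # Drop the interpreter path (e.g. /usr/lib/relexec); keep the rest verbatim
    parts = line[2:].lstrip().split(None, 1)
    return os.fsdecode(parts[1]) if len(parts) > 1 else ""


def resolve_relative(script_path, argv0):
    """Locate argv0 relative to the directory containing the script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, argv0)
```

The string returned by read_shebang_args is what would then be word-split and classified, and the result of resolve_relative is what would be handed to execve together with argv and the updated environment.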