`path/PATH`-style aliasing doesn't work through the environment when defined outside of `initial.es`

wryun / es-shell

es: a shell with higher-order functions

http://wryun.github.io/es-shell/

`path/PATH`-style aliasing doesn't work through the environment when defined outside of `initial.es` #127

Open jpco opened 1 week ago

jpco commented 1 week ago

Repro:

; cat ./pair-test.es
#!/usr/local/bin/es

# Set up the foo/FOO pair in the manner of path/PATH.

set-foo = @ {local (set-FOO = ()) FOO = <={%flatten : $*}; result $*}
set-FOO = @ {local (set-foo = ()) foo = <={%fsplit  : $*}; result $*}

noexport = $noexport foo

# Test the foo/FOO pair.

foo = 'one two' three
var foo; var FOO

# Test passing it down the environment.

es -c 'echo in subshell; var foo; var FOO'
; ./pair-test.es 
foo = 'one two' three
FOO = 'one two:three'
in subshell
foo = 
FOO = 'one two:three'

I think this is discussed somewhere in the mailing list... aha, here: http://wryun.github.io/es-shell/mail-archive/msg00837.html

The fact that foo is in noexport is to imitate path, which is due to this: http://wryun.github.io/es-shell/mail-archive/msg00763.html

It's not immediately clear to me why the path/PATH and home/HOME pairs in initial.es work fine but a novel pair doesn't.

Incidentally, for these kinds of pairs, it also doesn't make sense for noexport itself to be in noexport: a child shell will "forget" that foo is noexport, so in a grandchild shell foo will be imported from the environment, which can cause the very problems that motivated making path noexport.

Motivation: exploring cdpath/CDPATH for #123 as well as ls-colors/LS_COLORS for my own use.

memreflect commented 1 week ago

> It's not immediately clear to me why the path/PATH and home/HOME pairs in initial.es work fine but a novel pair doesn't.

es basically executes initial.es, which defines set-home/HOME and set-path/PATH, then the environment gets imported (there's more to it, but that's the gist). With your es -c command, foo is not imported, FOO is imported (there is no set-FOO yet), and set-FOO and set-foo are imported at some point after FOO, meaning foo is never set.

Your script correctly mirrors what es itself does, but if you want foo to be imported via FOO, you are pretty much looking for the following:

es -c 'echo in subshell; FOO = $FOO; var foo; var FOO'

Alternatively, as i said, each new es executes initial.es, so you could split the file and use . foo.es to set foo each time to get the same behavior:

es -c 'echo in subshell; . foo.es; var foo; var FOO'

If that doesn't solve your issue, more context regarding why you're using es -c string inside the file instead of something like . filename will be needed.

jpco commented 1 week ago

> es basically executes initial.es, which defines set-home/HOME and set-path/PATH, then the environment gets imported (there's more to it, but that's the gist).

Oh right, of course, the initial.es stuff doesn't use the environment in the same way. That explains it :)

> If that doesn't solve your issue, more context regarding why you're using es -c string inside the file instead of something like . filename will be needed.

Well, this is only a repro script; how I'm actually encountering this is

1. log in with agetty(8)/login(1)
2. these start a login -es, which runs .esrc, which defines cdpath/CDPATH/etc.
3. .esrc, having set up the environment, execs a window manager
4. window manager starts new terminals which start their own es processes

By the time these inferior es processes run, $cdpath has been lost, and $noexport no longer even contains cdpath.

There's certainly the CDPATH = $CDPATH workaround, but I find this rather obnoxious, and it's not clear to me that this workaround was ever expected or intended to be necessary.

The behavior I describe here is also unreliable -- if we change foo/FOO to zoo/ZOO and set noexport = $noexport ZOO instead of noexport = $noexport zoo, then the "right" behavior happens, since set-zoo and set-ZOO are sorted by es before zoo in the environment and are therefore imported first. It definitely seems like the Wrong Thing to have settor invocation at startup depend on which letter the variable starts with!
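
The letter-dependence described above can be reproduced with a toy model. This is a minimal sketch in Python, not es's actual import code: it assumes the environment is imported in plain bytewise-sorted order and that importing NAME calls set-NAME only if set-NAME has already been imported. The function name `settors_fired` is made up for illustration.

```python
def settors_fired(exported):
    """Return the variables whose settor runs during a sorted import.

    Toy model (assumption): names are imported in bytewise-sorted order,
    and importing NAME calls set-NAME only if set-NAME was imported earlier.
    """
    imported = set()
    fired = []
    for name in sorted(exported):
        if not name.startswith("set-") and "set-" + name in imported:
            fired.append(name)
        imported.add(name)
    return fired


# foo is in noexport, so only FOO and the two settors reach the child.
foo_env = ["FOO", "set-foo", "set-FOO"]
# ZOO is in noexport instead, so zoo and the two settors reach the child.
zoo_env = ["zoo", "set-zoo", "set-ZOO"]

print(settors_fired(foo_env))  # [] : FOO sorts before set-FOO
print(settors_fired(zoo_env))  # ['zoo'] : set-zoo sorts before zoo
```

Under these assumptions the only difference between the two cases is where the exported variable's name falls relative to the `set-` prefix in sort order.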

memreflect commented 1 week ago

> There's certainly the CDPATH = $CDPATH workaround, but I find this rather obnoxious, and it's not clear to me that this workaround was ever expected or intended to be necessary.

I agree. It seems like your only options are to export cdpath or leave it unset because if es is not run as a login shell, whether by actually logging in or by using es -l, then .esrc won't be executed, your variables won't be set properly, etc.

POSIX sh uses an environment variable named ENV when an interactive shell is invoked. Whether it's .esrc or .profile, an interactive login shell would need to set that variable and use . $ENV to fully initialize things. That said, i don't particularly like that idea for es. There's no reason why an interactive login shell should have to manually load a file meant for interactive use just because it's a login shell; it should be done automatically.

Instead, maybe %interactive-loop should do something like if {!~ $fn-init ()} {init} prior to starting its actual work, and you would just need an init function? I honestly don't know how things were done back in the day, but this seems like a reasonable option to me, and putting it in %interactive-loop means you don't need to worry about where to place the variable lookup and execution in main().

> The behavior I describe here is also unreliable -- if we change foo/FOO to zoo/ZOO and set noexport = $noexport ZOO instead of noexport = $noexport zoo, then the "right" behavior happens, since set-zoo and set-ZOO are sorted by es before zoo in the environment and are therefore imported first. It definitely seems like the Wrong Thing to have settor invocation at startup depend on which letter the variable starts with!

Yes, that's definitely the Wrong Thing.

jpco commented 1 week ago

I see a couple potential options.

We could imitate what dump.c does and load variables in the order (fn-*, set-*, others). This is only a partial fix due to the potential cases of set-fn-* or set-set-* or the like, but it's fairly easy to implement and it's the same as how initial.es works.

The other option is to do what's suggested in http://wryun.github.io/es-shell/mail-archive/msg00837.html:

> I believe that during initialization, all variables should be initialized from the environment without trying to call settor functions. Only afterwards should settors be looked for and called, for each variable that was initialized from the environment in the first place.

This is more "complete", allowing things like set-fn-* and set-set-* to work, but it's also a bit trickier to implement, it's not quite symmetric with initial.es, and there's still some degree of sensitivity to the order in which settors are called. I'm not convinced it's worthwhile to pursue this option.
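
The difference between the two options can be sketched with a toy model. This is Python, not es's import code, and it rests on stated assumptions: a settor fires for NAME only if set-NAME is already known when NAME is processed, and the variable names in `env` (including `fn-bar` and `set-fn-bar`) are hypothetical.

```python
def dump_order(names):
    """Option 1 (sketch): order names as fn-*, then set-*, then everything else."""
    def rank(name):
        if name.startswith("fn-"):
            return 0
        if name.startswith("set-"):
            return 1
        return 2
    return sorted(names, key=lambda n: (rank(n), n))


def settors_fired_in(order):
    """Importing NAME calls set-NAME only if set-NAME was imported earlier."""
    imported, fired = set(), []
    for name in order:
        if "set-" + name in imported:
            fired.append(name)
        imported.add(name)
    return fired


def settors_fired_deferred(names):
    """Option 2 (sketch): import everything first, then look for settors."""
    imported = set(names)
    return [n for n in sorted(names) if "set-" + n in imported]


env = ["FOO", "set-FOO", "set-foo", "set-fn-bar", "fn-bar"]
print(settors_fired_in(dump_order(env)))  # ['FOO']
print(settors_fired_deferred(env))        # ['FOO', 'fn-bar']
```

In this model the (fn-*, set-*, others) ordering fixes the FOO case but still misses set-fn-bar, because fn-bar is imported before its settor; the deferred approach catches both, at the cost of having to choose an order for the settor calls themselves.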

memreflect commented 1 week ago

> We could imitate what dump.c does and load variables in the order (fn-*, set-*, others). This is only a partial fix due to the potential cases of set-fn-* or set-set-* or the like, but it's fairly easy to implement and it's the same as how initial.es works.

That's certainly an option, but as you say, it could be potentially error-prone with set-fn-* and set-set-*, leaving you with the same issue all over again.

> The other option is to do what's suggested in http://wryun.github.io/es-shell/mail-archive/msg00837.html:

> > I believe that during initialization, all variables should be initialized from the environment without trying to call settor functions. Only afterwards should settors be looked for and called, for each variable that was initialized from the environment in the first place.

> This is more "complete", allowing things like set-fn-* and set-set-* to work, but it's also a bit trickier to implement, it's not quite symmetric with initial.es, and there's still some degree of sensitivity to the order in which settors are called. I'm not convinced it's worthwhile to pursue this option.

Yes, this certainly has problems as well.

I think my %init hook idea might be close to what you're looking for:

; cat ~/.esrc
fn %init {
  . ~/set-foo.es
}

# Execute es without the above %init executing, e.g. `noinit es -c '...'`
fn noinit {
  local (fn-%init = {}) {$*}
}

let (bloop = $fn-%batch-loop
     iloop = $fn-%interactive-loop) {

  fn %interactive-loop {
    # avoid infinite recursion
    local (fn-%batch-loop = $bloop
           fn-%interactive-loop = $iloop) {
      %init
    }
    $iloop
  }

  fn %batch-loop {
    # avoid infinite recursion
    local (fn-%batch-loop = $bloop
           fn-%interactive-loop = $iloop) {
      %init
    }
    $bloop
  }
}   # let (bloop; iloop)

I'm guessing what you're actually seeking is a way to create shared variables that aren't necessarily exported, yet are still usable by all subshells, not just the login shell. That means that after the initialization of foo, you would need to have a way to update it to its current value, which can be obtained from FOO if you don't define foo unconditionally or perhaps even by executing an external file that contains those shared variables similar to how fish's universal variables work. In other words, you'd be able to control the initialization order from a single hook, which is probably what you want.

I'm not saying %init itself is the complete solution to your problem, but it has the potential to be made into a complete solution without modifying the shell at all.

jpco commented 1 week ago

Well, here's how I see it right now: There's this established convention of "es var/UNIX var" that has been used for path, cdpath, and other variables. The way it has been used historically is that the two are kept in sync via settor functions, and the "es" var of the pair is not exported, with the expectation that the UNIX var's settor function will maintain it on startup. Keeping the es var unexported isn't really a goal in itself, it just helps disambiguate which one has "precedence" for inferior shell startup. BUT, the startup behavior this relies on is flaky and mostly broken, so these variable pairs, other than the ones defined in initial.es, can't typically use the pattern successfully.

So based on this, what I really want are these two things:

1. Consistent, non-flaky, well-defined semantics for the situation of "settor functions on startup".
2. A universally (in terms of variable pairs defined in both initial.es and .esrc) functional way to do the "es var/UNIX var" pattern.

What you're proposing with %init here works for 2 -- the pattern then would be to define the settor functions and add something like

let (i = $fn-%init)
fn %init {
  $i
  VAR = $VAR
}

However, it doesn't quite resolve 1 for me, so I'd still want to either get one of the better-working startup options I mention in https://github.com/wryun/es-shell/issues/127#issuecomment-2403850404, OR alternatively just stop running settor functions on startup at all. That latter choice is certainly not flaky, and it could be argued to be "correct" on the basis that on startup you're just inheriting the environment and not changing any values.

The %init hook certainly "feels" more es-ish than relying on these settor function behaviors, given es' design preference for implementing shell behavior via script in a few hook functions, rather than a lot of magic shell behaviors. HOWEVER, my personal preference is to maintain backwards compatibility, and an %init hook feels pretty dang "infrastructural" to me (especially if it's in front of %batch-loop), so I'd want it to be pretty rare to want to actually edit.

ACTUALLY (forgive my thinking-as-I-type)... in theory we could get a pretty nice compromise of all of the above:

1. Do not (in C) call settor functions when importing the environment at all
2. Have an %init hook function called on startup (have to define exactly when it runs in the startup sequence -- before running .esrc, or after? For what I'm proposing here, it might have to be before)
3. Define %init in initial.es to scan through the set of variables with $&var (with maybe some filtering) and, for any $var that has a settor function, run $var = $$var.

This covers the set-fn-* and set-set-* cases better, since all variables have been imported by the time %init runs (there's still some order-dependency, though I think that's basically unavoidable). It also enables the es-var/UNIX-var pattern to stay in use in its current form, and has the usual benefits of hook functions, while (generally) avoiding requiring users to edit %init very often since its default behavior covers most cases.

Heck, to resolve the question of "before or after .esrc?", we could even make it so that .esrc invocation is itself part of %init (we'd have to signal to %init whether the shell is a login shell to make that work). That sounds like a baby step toward the %main I propose in #79 :)

Is that too wacky? I like the idea a lot, though I also like #79, which is definitely wacky.

memreflect commented 6 days ago

> the "es" var of the pair is not exported, with the expectation that the UNIX var's settor function will maintain it on startup. Keeping the es var unexported isn't really a goal in itself, it just helps disambiguate which one has "precedence" for inferior shell startup. BUT, the startup behavior this relies on is flaky and mostly broken, so these variable pairs, other than the ones defined in initial.es, can't typically use the pattern successfully.

The idea of mutually-dependent variables via settor functions is certainly a problem and seemingly new ground as no other shell encourages such a dependency to my knowledge. That said, ast-ksh provides "discipline functions" with multiple verbs like set and get that are similar in concept, so with ast-ksh in mind, why not introduce gettor functions as well? More specifically, they could be used exclusively with es variable names to provide "pseudo-variables" (variables that exist solely as functions of other variables and are never actually defined):

get-path = { %fsplit : $PATH }
set-path = @ { local (set-PATH = ()) PATH = <={%flatten : $*}; result () }

Pros:

- Natural es variable semantics are retained from parent shell to child shell.
- There is no need to add the es variable name to the noexport list so long as the settor function's result is always an empty list.
- A settor function for the UNIX variable name might not be needed at all. This would eliminate the mutual dependency issue caused by having only settor functions (meaning condition 1 would be eliminated) while also fulfilling condition 2.
- Simpler to implement than special handling for the imported environment (e.g. not executing settors, delaying execution of settors, etc.), whether to do things before or after .esrc, or some ad-hoc system relying upon an %init hook.

Cons:

- UNIX variables cannot be pseudo-variables if you want es variables to remain unexported, so while you can get and set PATH indirectly by getting and setting path, you cannot get and set path indirectly by getting and setting PATH and still have PATH be the one that gets exported (not in the case of this initial design anyway).
- It's easy to forget to return an empty list at the end of a settor, resulting in a regular variable instead of a pseudo-variable. This isn't a huge issue since path = /bin /sbin could set both path and PATH, with get-path relying upon the new value of PATH, but it means path could be exported unintentionally, so adding to noexport would still be a good safeguard in case of accidental exposure.
- How gettor functions interact with $&var, %var, and var needs to be determined. $&var might not even consider gettor functions, but %var (and consequently var) might check for gettor functions, similar to how $&cd and cd differ; this variant of %var could either be installed out-of-the-box or be distributed as one of the "canonical extensions" proposed in #123.
- If both a gettor function and an actual variable are defined, is the variable preferred before the gettor function is used? Either way, the semantics should be the same as for %var lookup if it considers gettor functions.

jpco commented 6 days ago

Ah yes, "getter functions" are reasonably easy to hack into the shell and would provide a way to do these pairs. I've played with them before.

I can't say the idea of putting them in the shell appeals to me much, though. As mentioned when this came up back in the day, es already has a mechanism for dynamically generating a value -- the function. Getter functions are somewhat redundant with both settor functions (when do I modify this value at set time and when do I modify it at reference time? are there cases where I'd want to do both?) and functions (when do I use a variable with a getter function vs. a normal function?) and they create confusing, subtle semantics for variables (when is the value of a variable not the value of that variable, and how do I reason about that?)

I wouldn't want to add getter functions to es without really going through the entire shell and determining how they would be best integrated with existing shell semantics. Unless we were able to prove their worth that way, if we were to go so far as to eschew the settor-based pair for path/PATH I would honestly rather just have a path function defined as %fsplit : $PATH and replace references to $path with <=path.

And then, even after using getter functions for $path, or replacing $path with <=path, you'd STILL be liable to have problems with settors on startup. Consider:

; set-history = @ {echo I am doing something special with the history file; $&sethistory $*}
; history = $history
I am doing something special with the history file
; ./es
; # no call to set-history at startup!

So this isn't even a problem with es/UNIX var pairs; it's just generally a problem with settors (I should have led with that, but I didn't realize it at first). And this isn't all that far-fetched of an example; with readline the history file is automatically created, so it's easy to imagine that someone would want to avoid that by setting in their .esrc

; set-history = @ f {if {~ $#f 1 && access $f} {$&sethistory $f} {result ()}}

which wouldn't work as desired.

So I think that no matter what we wanted with path/PATH, we should make settor-calls-on-startup work better -- and if we did that, then the path/PATH case fixes itself without any additional change.

memreflect commented 5 days ago

After a lot of reflection, i think you were right about not calling imported settors. This behavior is more consistent, is easy to implement, and still allows .esrc to override things on login since it is executed after the environment has been imported. es cannot control the order in which environment variables are imported (not all shells sort the environment), so this feels like the best solution to me.

This unfortunately doesn't resolve the issue you initially raised regarding the inability of foo/FOO to mimic the behavior of path/PATH, but the %init hook previously discussed could accomplish this. I'm thinking of %init as a user hook to be executed at shell startup (whether it's a login shell, a non-login interactive shell, or a script interpreter), not something for es itself to use, so executing a default %init implementation to run delayed settors and/or .esrc isn't a good idea to me.

Speaking of delayed invocation of imported settors, it's too easy to call things in the wrong order because there is no way to know that one settor might call a function, but that function might call another settor, which calls yet another settor. Even discounting the ordering issue, it's still not a good idea:

; history = $home/.histfile
; let (old = $set-history) {
    local (set-history = @ {echo SET-HISTORY $*; $old $*}) {
      es -c 'echo $history'
    }
  }
SET-HISTORY /home/user/.histfile
/home/user/.histfile

I'd be wondering why the new settor was called when history was never explicitly set within the context of the new settor and .esrc was not executed (the shell started in that context wasn't a login shell, i.e. no -l flag). It feels like confusing behavior that could lead to needless debugging. Imported settors should not be executed at all unless actual es code after the environment has been imported causes them to be executed.

jpco commented 2 days ago

> es cannot control the order in which environment variables are imported (not all shells sort the environment)

Well, this is true, but I disagree a bit with the implication. Es can't control the environment it's given, but it's under no obligation to respect the environment's ordering, so if the problem with settors on startup could be reduced to "order X works but the environment isn't guaranteed to follow it", then we could just enforce that order at import time and be golden.

> Speaking of delayed invocation of imported settors, it's too easy to call things in the wrong order because there is no way to know that one settor might call a function, but that function might call another settor, which calls yet another settor.

Yup, this is a problem. Sadly, it's not even all that far-fetched, given invoking a binary with a relative name calls %pathsearch, which uses $path, which is set to its non-default value only if set-PATH has been invoked. (The fact that path defaults to /usr/ucb /usr/bin /bin from initial.es means things still mostly work in this case, but it's sketchy).

It's interesting to me how little this has been a practical issue for folks in the past. I suppose it's because settor functions are really only used for path/PATH pairs or for primitives like $&setsignals. (I imagine it might also be because es sorts capitalized words ahead of lowercase.)

As far as your core suggestion here (no settors + %init), it makes some sense to me. It's the only option we've discussed that has no ordering problems or flakiness at all. However, I'll reiterate my desire from earlier that I would want something that works the same between initial.es settors and .esrc settors, and unease with the backwards-incompatibility of it. I'm not against the idea, but I'm not sure I have a confident enough idea of it to implement it at the moment.

In the meantime I have a PR for the more simple/backwards-compatible fix (delayed settors at startup) done. It's simpler than I initially thought. I'll send that out. I don't think doing the simple fix blocks the more thorough, backwards-incompatible change in the future. (Maybe at that point this issue should be converted to a discussion?)