Open bar-g opened 9 months ago
Something like `local entail state_var` might also work. Outside of osh it would just declare an additional local variable called `entail`, without any further semantics.
local no_strict_vars my_current_nesting_depth
Or, possibly even leaving room for compatible fall-back handling by shellcheck or similar tools: local _entail_myvar myvar
So, mutation in a non-local context would be OK if the same variable name also exists with some specific allowance string added?
Hm I'm confused by what the problem is here
YSH has var, const, setvar, and setglobal
It doesn't have dynamic scope -- in procs, shopt --unset dynamic_scope is automatically applied
Ah, I see you're deep in ysh at the moment ;-). I was mainly thinking compatible osh mode (not ysh).
If I'm not mistaken, it should be possible to program safely with dynamic scope. In practice, by always using local vars in functions by default, and being mindful of exceptions (where not to shadow dynamic vars with a local one, but to actually use a local variable from a calling function). Global vars are still discouraged. Using/allowing some limited dynamic scopes of local vars can actually avoid using global vars.
I think these programming safeguard principles could be nicely checked for shell-compatible osh scripts by a corresponding strict_vars option.
[To further the limited scope (and external usage), at least with some more deliberate (visual) code formatting rules, it could also be considered to only allow dynamically shared vars (i.e. those declared with something like 'local entail_myvar myvar') to be accessed by nested functions, or sub-functions whose name starts with the parent function's name and two underscores (or just the two underscores as an abbreviation).]
Because dynamic scope avoids having to pass a lot of values around, and should be safe within a controlled scope of sub-functions, it would seem to make sense to me to support its safe usage in shell scripting. At least in osh.
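A minimal bash sketch of that principle (the function and variable names here are hypothetical): a helper whose name carries the parent's name plus two underscores mutates the parent's local via dynamic scope, so no global variable is needed.

```shell
# Hypothetical sketch: limited dynamic scope instead of a global.
# 'depth' is local to walk(), yet visible to its helpers via dynamic
# scope; the walk__* naming convention marks the intended sharing.
walk() {
  local depth=0   # local, but dynamically visible to callees
  walk__step
  walk__step
  echo "final depth: $depth"
}

walk__step() {
  depth=$((depth + 1))   # intentionally mutates the caller's local
}

walk   # prints: final depth: 2
```

No global `depth` exists before or after the call; the variable lives only for the duration of `walk`.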
In ysh dynamic scope is already off completely as general default, but there are still merits of dynamic scope as an opt-in, e.g. seen if reading through: https://langdev.stackexchange.com/questions/253/what-are-the-pros-and-cons-of-dynamic-scoping
That's why I mentioned entailvar x = 'value'
as a limited dynamic scope opt-in idea for ysh procs and funcs.
I found https://www.oilshell.org/release/latest/doc/variables.html#appendix-a-more-on-shell-vs-ysh somehow confusing.
A minor thing is that it's assigning capitalized values to globals instead of having globals with capitalized names.
But why is there this implicit `setvar newglobal = 'G'  # create new global`, and not something like `global G = 'value'` to only explicitly create a new global (later mutated by setglobal)?
EDIT: or newglobal G = 'value'
or globalvar G = 'value'
It's for osh (and other shells), to avoid having to manually implement output reference vars for each and every function in a "module" just to be safe.
Instead, only output variables for passing values between "modules" would have to be implemented (complying with strict_vars in osh), while keeping each module simple, working with independent local vars. Well, actually even local within each function by default, but allowing the mindful usage of local dynamic scopes within modules, while staying safe when one module calls another.
I think the only remaining point of possible variable naming conflicts would then be the name of the referenced output variable, but using module names in these should be easy enough without being a burden here, to avoid conflicts. And maybe an actual conflict of these selected variables could also be simple enough to detect, e.g. if the referenced variable stays unchanged from its initial "non-value", or remains uninitialized.
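As a sketch of such an out-variable between "modules" (the module and function names here are hypothetical), bash's `printf -v` can write into a variable whose name the caller passes in, so neither dynamic scope nor a fixed global is involved:

```shell
# Hypothetical out-variable sketch: the caller names the result
# variable; the function itself only uses locals.
mymodule_get_version() {
  local out_var=$1         # name of the caller's result variable
  local version='1.2.3'    # computed locally
  printf -v "$out_var" '%s' "$version"   # write through the reference
}

mymodule_get_version result
echo "$result"   # prints: 1.2.3
```

The only remaining naming question is the out-variable's own name, which, as noted above, module prefixes keep manageable.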
Oh, strict_vars would also be a good intermediate step to migrate to procs/funcs.
Maybe even easier:
Requiring explicit allowance for inherited variables, e.g. a corresponding formal "allowance-variable" declaration like local _inherit_<myvar> within each function that should be allowed to use a non-local <myvar>. (Instead of marking <myvar> only once at the entailing function, as pictured above.)
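Such a declaration is inert in today's shells, but it already runs and documents intent; a hypothetical strict_vars check could then require it before allowing the non-local write. A minimal sketch (function names made up for illustration):

```shell
# The _inherit_* declaration is an inert marker in today's shells;
# a hypothetical strict_vars would require it before 'state' may be
# written through dynamic scope.
outer() {
  local state=start
  inner
  echo "$state"
}

inner() {
  local _inherit_state=   # marker: this function mutates inherited 'state'
  state=updated           # dynamic-scope write, explicitly "allowed"
}

outer   # prints: updated
```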
Technically, it may be correct that shopt --set ysh:upgrade itself does not break much, yet typical shell code will not work when switching shell function definitions to procs, because of the disabled dynamic scope.
It's that breakage that strict_vars would help manage, in preparatory steps that keep the final changes necessary to use procs minimal.
I've updated the problem/solution explanation in the description.
If I'm not mistaken, it should be possible to program safely with dynamic scope. In practice, by always using local vars in functions by default, and being mindful of exceptions (where not to shadow dynamic vars with a local one, but to actually use a local variable from a calling function). Global vars are still discouraged. Using/allowing some limited dynamic scopes of local vars can actually avoid using global vars.
I think these programming safeguard principles could be nicely checked for shell-compatible osh scripts by a corresponding strict_vars option.
[To further the limited scope (and external usage), at least with some more deliberate (visual) code formatting rules, it could also be considered to only allow dynamically shared vars (i.e. those declared with something like 'local entail_myvar myvar') to be accessed by nested functions, or sub-functions whose name starts with the parent function's name and two underscores (or just the two underscores as an abbreviation).]
Because dynamic scope avoids having to pass a lot of values around, and should be safe within a controlled scope of sub-functions, it would seem to make sense to me to support its safe usage in shell scripting. At least in osh.
I don't really agree with these principles
https://github.com/oilshell/oil/wiki/Language-Design-Principles
I would say any kind of proposal like this should be accompanied with an actual shell program (or 2 or 3 ...) that has a problem, and an example of how the proposed feature would make it better / clearer / more familiar to users
I'll see if I can condense it into principles of problems and solutions. Examples, I guess, would then be useful to show where the conceived principles may not hold (yet).
Concerning the side-topic of ysh implicit global assignments:
Not knowing Python, it seems to use a `global` keyword against this, but still with pitfalls ("global variables problems") according to follow-ups at: https://www.reddit.com/r/learnpython/comments/ubvzeo/comment/i671eiy/
Using dynamic scope with all functions always declaring variables as local by default is similar to static (lexical) scope, but provides a better alternative than globals to optionally share some variables, i.e. as scoped variables with limited access, which can only be accessed by those subfunctions that are executed by the function that declares the dynamic scope variable.
External code that executes a function that creates a local dynamic scope can not mess with the dynamic scope variables.
And functions can safely protect all outer globals, and thus the dynamic scope of their calling functions, simply by always explicitly declaring all variables they use as local. This is not the default with conventional shells, but could be enforced by a strict_vars option.
As long as a function is only calling functions within its own file or module, and no external libs, all variable access stays completely in the authors own control.
And somewhat stricter interpreter errors should allow the author to always keep the code written explicitly enough to be aware of, and stay on top of, all (only intentionally creatable) variable dependencies.
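Both the protection and the pitfall it prevents can be shown in a few lines of plain bash (function names hypothetical):

```shell
# With dynamic scope, a callee that forgets 'local' silently writes
# into its caller's variable; declaring it local keeps the write private.
f() {
  local x=caller
  g          # disciplined callee: no effect on x
  h          # sloppy callee: clobbers x through dynamic scope
  echo "$x"
}
g() { local x=private; x=changed; }   # shadows; caller unaffected
h() { x=clobbered; }                  # no 'local': mutates caller's x

f   # prints: clobbered
```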
The strict_vars requirements that are placed out for discussion are:

- local var=$var
- To deviate from this, in order to allow intentional shared use of a variable within a limited dynamic scope: local _entailed_var var? (simple enough to check for the interpreter), or local _inherit_var.

From what I gathered, the requirements should lead to safe code, and improve the ability to reason about the code, in order to arrive at safer, more modular code.
Some code examples of problems and explanation of the "locals only" approach can be found at https://www.binaryphile.com/bash/2018/09/01/approach-bash-like-a-developer-part-20-scoping
Picturing a viable migration path from arbitrary "messy" shell code to strict_vars, and then very well possibly to ysh at the end...
After enabling strict_vars:
I'm not sure whether it's better for strict_vars to emit errors or just warnings, to ease adjusting code step by step or in chunks.
In the principled osh vs. ysh comparison, strict_vars would be in between: maintaining compatibility with other shells while fostering improvement of the code structure towards safety and compatibility with ysh principles. Whereas switching a shell function to a proc effectively breaks both existing shell code and shell compatibility quite hard in the general case (due to disabling the shell's default dynamic_scope).
The basic encapsulation comes by simply using lines like local var=$var (guided by some helpful error messages explaining the basic strict_vars requirement) at all places where "in-variables" (as opposed to "out-variables") are enough.
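The local var=$var "in-variable" pattern in a small runnable form (names made up for illustration): the function snapshots the inherited value into a local, so its own mutations stay private.

```shell
# "In-variable" pattern: copy the inherited value into a local so
# later mutations cannot leak back out through dynamic scope.
count=3
report() {
  local count=$count    # snapshot the outer value
  count=$((count + 1))  # mutation stays inside report()
  echo "inside: $count"
}
report                  # prints: inside: 4
echo "outside: $count"  # prints: outside: 3
```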
Let's think of a generic example, maybe some of the shell scripts used to test osh compatibility.
When turning a shell function into a proc, there will be all sorts of breakages requiring changing the code into ysh; just one example is "POSIX shell arithmetic isn't allowed (parse_sh_arith)". Whereas strict_vars (in osh) would instead allow the user to focus on and fix the important code structure first, more precisely its variable usage, before having to deal with many other (minor) issues. (Thus improving existing code and migrating in a more structured way.)
Let me try to explain why I'm not sure whether having strict errors or just warnings for strict_vars would be better.
Assume enabling strict_vars for some shell function. Running it may in general depend on many dynamic scope vars, but would it always be desired or necessary to fulfill the strict requirements for each and every variable? And thus emit hard errors? Especially if calling (external?) code that can not be migrated just yet. It may be needed (possible?) to cover and work well with some subset of functions or variables called by these functions, and others staying unchanged. (Or simply emit only warnings to stay on the less breaking side?)
As far as I can see, the strict_vars requirement check might merely be something like: is each <var> a local variable, or does a corresponding local variable exist with the same name prepended by _inherit_<var>; else error or warning. Explicitly checking for a corresponding _entailed_<var> at all places that call the function may still be a useful addition, to restrict dynamic scope variable use only to where it is really explicitly called for.
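As a toy illustration only (a line-based heuristic over a function body, not a parser; the _inherit_ convention and the diagnostic wording are assumptions from this proposal), such a check could look roughly like this:

```shell
# Toy sketch of the proposed check: an assignment inside a function
# body is OK if the name was declared 'local', or if a matching
# 'local _inherit_<name>' marker exists in the same body.
check_body() {
  local body=$1 locals name line
  # collect names declared local (including _inherit_* markers)
  locals=$(printf '%s\n' "$body" |
    sed -n 's/^[[:space:]]*local[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' |
    tr '\n' ' ')
  # flag plain assignments whose name is neither local nor marked
  printf '%s\n' "$body" |
    grep -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' |
    while IFS= read -r line; do
      name=${line%%=*}
      name=${name#"${name%%[![:space:]]*}"}     # strip leading spaces
      case " $locals " in
        *" $name "*|*" _inherit_$name "*) ;;    # declared or entailed: OK
        *) echo "warning: assignment to non-local '$name'" ;;
      esac
    done
}

# demo body: only 'y' is neither local nor marked, so only it is flagged
demo='  local x=1
  x=2
  y=3
  local _inherit_z
  z=4'
check_body "$demo"   # prints: warning: assignment to non-local 'y'
```

A real implementation would of course live inside the interpreter, where scopes are known exactly; this only shows how small the rule itself is.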
It's true there will be breakages when turning shell functions into procs
But I think adding more complexity isn't going to help
I still would like to see an actual example of a program that has a problem, and where the upgrade has a problem
And how the solution would make the upgrade better
I don't see that here
I don't think most users will upgrade OSH to YSH, although I think some critical ones will
The people who upgrade will tend to be the "experts" who already have some shell knowledge.
The people who upgrade will tend to be the "experts" who already have some shell knowledge.
The strict options have much more to them, I think. Upgrading of larger code bases is only a part of it. I think I can answer this from a "non-expert" point of view.
Ordinary shell just holds so many pitfalls it's better to avoid it (yes, even for those not knowing python or other). But for some things "shell" just still is the "best" or only available tool, and using shellcheck + "strict osh" is the only viable way (I found) to steer clear of the pitfalls as best as possible.
However, I learned it's not enough, since the dynamic scope cannot yet be dealt with by an osh strict option, and it's still too easy for a non-expert to hit that pitfall. Only afterwards did I notice it fixed in ysh (but the hard way, not yet allowing the ease of limited dynamic scope where it makes sense in a shell).
But I think adding more complexity isn't going to help
I'm not sure where you see the complexity here. (Do you see the proposed variable checks as being complex?) For the user, I think the created awareness and stricter variable usage bring a great deal of reduced complexity and complication: if one uses strict_all, one is made aware as soon as one takes a bad step. Or, if one is able to switch on one strict option after another, one can learn of and be pointed to deal with one topic at a time.
I think it's far preferable, and recommendable, to take first steps with shell using strict osh (and shellcheck) rather than any other shell. And those were my practical examples: file copying and date adjusting functions. Even if I ended up re-copying the whole thing using another method.
So you're right that I'm currently not "ysh:upgrade"-ing anything for real myself. Rather, I'm trying things out to learn whether new features make it worth using ysh, and I wasn't even really aware of dynamic vs. static scope before.
But strict_vars is not only about easing upgrades through better, more precise error messages; it's also about the error-preventing (strict) messages and the understanding/learning experience for non-expert users.
(Nonetheless, after shell code fulfills the strict_vars requirement (and non-experts were pointed to and understood how to deal with non-local variables), I think using and learning procs does become much easier. The then-obsolete local variable definitions within procs wouldn't even have to be hard errors, and could be just warnings.)
Here is the example from the page that I linked, and with oils you can see that after adapting the syntax to ysh the output result changes!
Ideally strict_vars (as part of strict_all and ysh:upgrade) could bring up this issue or pitfall before and even after changing the syntax.
```sh
shopt -s ysh:upgrade
shopt -s redefine_proc_func

echo "Unmodified shell code with ysh:upgrade:"
outer_function () {
  local lvar
  lvar=outer
  inner_function
  echo $lvar
}
inner_function () {
  lvar=inner
}
outer_function

#############
echo "Adapted code with ysh:upgrade:"
proc outer_function () {
  #local lvar
  var lvar
  #lvar=outer
  setvar lvar = "outer"
  inner_function
  echo $lvar
}
proc inner_function () {
  #lvar=inner
  var lvar
  setvar lvar = "inner"
}
outer_function

############
echo "Adapted code with ysh:all:"
shopt -s ysh:all
shopt -s redefine_proc_func
proc outer_function () {
  #local lvar
  var lvar
  #lvar=outer
  setvar lvar = "outer"
  inner_function
  echo $lvar
}
proc inner_function () {
  #lvar=inner
  var lvar
  setvar lvar = "inner"
}
outer_function
```
The output of the example script above is:

```
Unmodified shell code with ysh:upgrade:
inner
Adapted code with ysh:upgrade:
outer
Adapted code with ysh:all:
outer
```
Interpretation: it's a given that the first output, i.e. "inner", is the correct one, because shell code uses dynamic scope by default.
Still, the third output is technically also correct, because ysh code does not use dynamic scope by default.
So we are seeing how a working shell script is effectively broken when adapting the syntax, without the user ever getting an error or notice about it.
I think the solution lies in ysh:upgrade providing error messages (or warnings) instead of (or along with) the first output, i.e. by having the ysh:upgrade option group include --set strict_vars.
Some example output draft:

```
Unmodified shell code with ysh:upgrade:
[strict_vars error]
inner_function () {
lvar=inner
^ assigning to a non-local variable (and lvar is not a designated 'entailed_*' dynamic scope variable)
To avoid depending on dynamic scope (it's disabled in ysh) while maintaining the code's logic:
  posix: return the value in stdout, if the function is fine to be called non-interactively in a command substitution subshell
  posix: take a passed reference, e.g. as 'out_var="$1"', and return the value with 'echo "$return_value" | read -r "$out_var"'
  bash:  define a nameref 'local -n out_var="$1"' and return the value with 'out_var="$return_value"'
```
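For the bash nameref variant of that hint, here is a runnable sketch (requires bash 4.3+ for `local -n`; function names taken from the example earlier in the thread) that keeps the original script's logic without dynamic scope:

```shell
# Nameref "out-variable" rewrite of the thread's example: the callee
# writes through a reference whose name the caller passes in, instead
# of relying on dynamic scope (bash 4.3+ only).
inner_function() {
  local -n out_var=$1        # nameref to the caller's variable
  local return_value=inner
  out_var=$return_value      # writes the caller's lvar
}

outer_function() {
  local lvar=outer
  inner_function lvar        # pass the *name*, not the value
  echo "$lvar"
}

outer_function   # prints: inner  (same result as the dynamic-scope original)
```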
However, strict_vars would also be very helpful to foster safe (and only explicitly dynamic scope) shell variable usage, and thus better interoperability and reasoning with the stricter shell code in general. (According to the link above.)
This is about the "upgrade path from bash to a better language and runtime", and in the FAQ you're calling out for a smooth upgrade (emphasis from there). :-)
Forgot to mention: I think the most efficient idea to "mark" dynamic scope variables as explicitly 'entailed' to called functions could be to simply check (enforce) a convention, like the variable name starting with 'entailed_*'. (It's explicit, and visible everywhere the variable is accessed.)
On the other hand, maybe it's OK to not allow any direct access to dynamic scope variables at all with strict_vars? So that there will always be an error or warning as long as shell code won't work the same way with dynamic_scope disabled?
I think we are following our principles and using different syntax for different meanings
So the breakage doesn't happen automatically when you ysh:upgrade
It only comes when you actually change the source code, which seems reasonable.
My feeling is that many shell programmers don't explicitly rely on dynamic scope. Only the real experts know about it and use it.
But I think we may be missing some docs that talk about this
Could be here:
https://www.oilshell.org/release/0.19.0/doc/ysh-vs-shell.html
and here:
https://www.oilshell.org/release/0.19.0/doc/upgrade-breakage.html -- hm
My feeling is that many shell programmers don't explicitly rely on dynamic scope. Only the real experts know about it and use it.
Yes, could be, as explicit use is an expert method to avoid hitting the pitfalls. (Enforcing explicit use would be a nice by-product of the strict_vars idea.)
Unfortunately, I think a vast number of shell scripters rely implicitly on dynamic scope, especially non-pros, without knowing it. (And that's where many problems arise, and where the notion of "pitfalls" comes from.) They think (I did) something like: all variables are simply always available in a script and its called functions, and subfunctions "just work" without bothering with passing and returning values. (Not much programming background, just working with their collections of shell commands to execute.) So ysh breaks their scripts, and how should they know that some remark about dynamic scope in the docs is relevant to the breaking of a shell script that worked perfectly all the time before?
Another, possibly simpler?, idea to implement error messages when the general logic of existing shell code is getting broken by migrating to ysh syntax: Could it be based just on a modification to the shopt dynamic_scope?
Could a disabled dynamic_scope emit error messages, e.g. for shell variable assignments outside of procs and top-level, and their use? Then ysh:upgrade could already unset the dynamic_scope globally and shell functions would produce errors, where they rely on it.
Though, I don't know if a separate strict_vars option with a couple of dynamic checks couldn't be easier to implement and maintain, while also providing the benefit of allowing for strict "compat" shell code.
It [breakage] only comes when you actually change the source code, which seems reasonable.
With the creditable focus on fixing shell error handling and pitfalls, and this being implemented and working so well already for other areas, I think it would still also seem reasonable to expect, and be really great, if some errors would point to where shopt -u dynamic_scope, or a "middle" shopt -s strict_vars, does "break", or would otherwise silently change the working logic, when migrating existing scripts.
(Users already have several strict_* and other migration shopts emit failures as the backbone for adapting their scripts, just as you have test-driven development errors as the backbone for modifying osh and ysh itself.)
s/strict_vars/strict_scope ?
Thought about strict_vars, or rather strict_scope?, as a compatible osh option, to ensure conflict-safe variable usage.

This strict option serves two purposes:

1. Enforcing safely isolated dynamically scoped shell code. This prevents the pitfalls of sloppy variable usage under the default dynamic scope behavior, by throwing an error on all variable assignments that would implicitly create new globals (or overwrite existing ones) in functions (i.e. all non-local variables that have not been explicitly "entailed" to called functions).
2. [From a practical attempt to migrate a script to using procs.] Supporting the user in migrating shell code in successive, controlled, intermediate steps into statically scoped ysh procs/funcs. For example, by enabling strict_vars for parts of existing code, which first makes variable usage explicitly written out, then weeding out unsafe global and unnecessary dynamic scope variable usages by changing the use of these variables into passing args in, and strings back using stdout, or read -r $var <<- to set referenced variables. All the while refactoring running code in small (error message controlled) steps until a shell function is ready to be changed into a proc (i.e. when using only local vars plus out-vars). Without strict_vars, changing a function into a proc means not being able to run the code anymore until every aspect of the code has been refactored blindly, without any interpreter error messages helping with the required refactoring of existing code all over the place.

The pictured method is simply to require the user to always explicitly define all variables, as local or (discouraged) global (better just local at a higher level before mutating is allowed), together with some explicit annotation or definitions to allow for entailing/inheriting a variable in called functions (which inherit the variable, and more importantly will then be allowed to mutate the variable without throwing an error).
Ideas to (compatibly) allow mutating ("entailed"?) variables in called shell functions that inherit the variable upon execution could be special comments.
"Semantic comment" ideas for the declaring (entailing) function:
or
local state_var #strict_var entail
[EDIT: Idea added from below:]
Alternatively, an idea for declaring in all inheriting functions: a formal "allowance-variable" declaration like local _inherit_<myvar> within each function that should then be allowed to use a non-local <myvar>.

And for ysh:

- [Also see YSH comment below] Maybe disallow that implicit globals creation as well? https://www.oilshell.org/release/latest/doc/variables.html#appendix-a-more-on-shell-vs-ysh (setvar newglobal = 'G')
- Support explicit definitions like entailvar x = 'value'? To opt in to the shell simplicity of (local!) dynamic scope variables? (An alternative to nesting procs/funcs?)