bar-g commented 9 months ago

Successor issue (but lacking the overview/tables):

https://github.com/oilshell/oil/issues/1831

Discussion: https://oilshell.zulipchat.com/#narrow/stream/433024-low-priority/topic/value.2EPlace.20feedback.20.2F.20integrate.20rebinding.20and.20mutability

Basic Problem:

The same setvar b = a assignment syntax does different things, depending on the type of the variable.

Immutable (atomic) variables like Bool, Int, Str show value-copy-behavior when assigned (new variable can be modified independently).
Mutable[container] variables show pointer-assignment-behavior when assigned (changes made to new variable are actually applied to the original value and affect all variables pointing to this value [container]).

Unlike with "atom" data types, when mutable[containers] are passed to procs/funcs and those assign new values to their local variables, the changes are applied to the outside environment, without any noticeable syntax difference within the proc/func. So the changes are not confined: they alter the original outside values.
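Since the thread compares this behavior with Python, here is a minimal Python sketch of the two assignment behaviors described above. It is only an analogue, assuming Python's `int`/`dict` stand in for YSH atoms and mutable containers; it is not YSH code.

```python
# Python analogue of the two assignment behaviors (not YSH code).

# "Atoms": int is immutable, so the new name can be changed independently.
a = 42
b = a
b += 1                 # rebinds b to a new int; a is untouched

# Mutable containers: both names refer to one and the same dict.
d1 = {"k": "val"}
d2 = d1
d2["k"] = "changed"    # mutates the shared dict

print(a, b)            # 42 43
print(d1["k"])         # changed
print(d1 is d2)        # True
```

The assignment syntax is identical in both halves; only the type of the right-hand side decides whether the two names stay independent.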

Discussed:

New option `strict_type_interaction`

Table of Behavior (Overview)

| Places with Recognizable Mutating/Aliasing Behavior | Examples |
|---|---|
| expressions: `->` is mutating call (chain) | `call var->myFunc('update')` |
| procs/funcs: that don't take any args may use a `->`-suffix in their name to denote mutating | `if (error) { reset-app-> }` |
| at definition-site: specific variable behavior prefixes that require types | `proc apply-update-to (; x, ->y )` |
| within proc/func: (strict) only the `set->` keyword is allowed to mutate an outer ->passed_var (Place or mutable), and returns error if var was not required as `->` or `?` in proc/func definition | `set-> passed_var[key] = 'value'` (implicit `->` on the passed_var) (`setplace` for local places) |
| at call-site: variable behavior prefixes (as required by definition-site) | `... \| read --lines (->x) ; apply-update-to ( x, ->y )` |

List of Variable-Passing Behavior-Prefixes (implicit type-restrictions)

| Assignment/dereferencing behavior to request or indicate | Variable with Prefix | Implicitly restricts data-type to |
|---|---|---|
| read-only (default) | `var` (no prefix) | (no restrictions) |
| value-copy / none | `:var` | (no restrictions, deepcopy for mutables) |
| pointer-assignment / dereference (pointing/mutating) | `->var` | Place (auto-created if not Place) |
| pointer-assignment / dereference without type-changes (insignificant special explicit case, could be interpreter auto-optimization of `->`) | `->:var` | mutable containers (not rebindable) |
| inconsistent / type-dependent (special library/metaprogramming case to work with dynamic types) [previous behavior] | `?var` | (no restrictions) |

List of Apparent Assignment Behaviors

| Behavior | Assignment Operator |
|---|---|
| value-copy-assignment (the most common, and default behavior) | `=` (atoms only) |
| value-copy-assignment (special case for mutables) | `=:` (mutables only) |
| pointer-assignment (mostly only initial List/Dict definitions, possibly tree-node-travel/filter-collections) | `->` |
| inconsistent / type-dependent (special case to work with dynamic types, the legacy default) | `=*` |

Short Documentation of Affected Syntax

At the proc/func definition-site:

Typed parameter prefixes (for access behavior / type specification):
(no prefix) just read-only passing of atoms and mutables (with assign/copy-behavior),
             on `setvar` write access error: advice about the following prefixes:
`:` "assign/copy-behavior" standard passing for atoms, deepcopy() for mutables,
             (usual for atoms, slow and rarely used for mutables)
`->` "pointer/reference"
    accept the catch-all "place", proc/func will alter any passed value type outside of it
             (even allows re-binds changing the type itself)
`->:` "pointer/reference without type-changes" 
    accept only mutables (lists/dicts) directly, i.e. all mutable[containers], forbid atoms
              (disallows changing of referenced type, only mutating the contained data is possible)
`?` "accept any" (just as it was before), inconsistent behavior that depends on type

Type annotations:
Make value.Place transparent to type specifications, otherwise have, e.g.
(->var1 Dict<-) to mean that var1 needs to be a "Dict pointed to", in this case by a value.Place.

At the proc/func call-site:

Typed argument prefixes (for variable preparation/manipulation/syntax-checks):
(no prefix) shows proc/func is only accepting atoms (separate-copy behavior) 
`->` "pointer/reference" (mutating-original behavior) create or accept place
`->:` required to "make it obvious" that proc/func required a pointer/reference whose type can 
                  not change (mutating[container])
                  (mainly no-op, but parser? could check if proc/func actually still requires it,
                    possibly even auto-update source accordingly?).
`?` required to "make it obvious" that proc/func allows any variable type as is
                   (mainly no-op, but parser? could check if proc/func actually still allows it,
                     possibly even auto-update source accordingly?).

Specifying defaults in the proc/func definition:

# Assigning default values
param=default # (simple read-only assignment) (for atoms *and* mutables)
              # solves the "mutable defaults" wart in python (without breaking the
              # workaround to act on specified null defaults)

param=:default # ensure (re)assign/copy-at-call-time behavior (for atoms *and* mutables)
              # no "mutable defaults" wart due to copy behavior

param->default # (value.Place/pointer) always create place (even if mutable containers could do without)
              # no "mutable defaults" wart due to apparent aliasing/pointing behavior

# not needed? param->:default   # allows only mutable container (directly, no place)
# not needed: param=*default    # not supporting this avoids the unexpected mutable default pitfall
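For reference, the "mutable defaults" wart in Python mentioned in the comments above looks roughly like this. The function names are made up for illustration; the second function shows the usual `None` workaround that the proposal says it would not break.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time, and then reused.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Usual workaround: use None as a sentinel and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")
print(second)              # ['a', 'b']
print(first is second)     # True

print(append_fresh("a"))   # ['a']
print(append_fresh("b"))   # ['b']
```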

Assignment keywords

set-> passed_var =   # only way to mutate an outside container or place (left accepts only passed mutable or place)
                     # * `->`-prefix on the left hand side is always implicit
                     # * support 'set-> foo[...] =' index usage for write access even if value.Place
                     # * support '$[foo[...]]' index usage for read access even for value.Place
setplace local_var = # 'setplace' is an alias of set-> for in-func/in-proc and top-level assignments
                     # to "local" (not outside) places (implicit `->`-prefix on the left hand side)
                     # members of local lists/dicts must be set normally with `setvar mydict[key] = `

setvar local_var =         # to assign only immutable (atoms) to the variable, on error: advise about the following:
setvar local_var =:        # syntax sugar for deepcopy()
setvar local_var ->        # to assign a mutable container or place (depends on right hand side)
setvar local_var =*        # to assign without type restriction (as python/before, inconsistent behavior to be expected)

Let `set->`, `setplace`, `setvar`, `var`, and `const` all exit with error status > 0, if `=` is
used for places and mutables, and mention in the error message to use `->` or `=*` instead.

discouraged/not needed:

# no:
call assigned_local_var->setValue('immutable-value-type')  # passed_var place can't be read directly 
                                                           # like passed mutable container
# yes
set-> assigned_local_var = 'any-value-type'  # rebind, same syntax for place or mutable ('setplace' for within-scope)
set-> var[key] = 'value'                     # mutate, no rebind
                                             # note: implicit '->' on the left, to avoid redundancy,
                                             # allows assigning atoms, as well as mutable[containers] with `->`,
                                             # a right hand side prefixed with `->`/`->:` takes a place/mutable[container], otherwise creates one

Explanation of the idea in coherent words

A shopt strict_type_interaction enables slight syntax additions with type requirements to reflect different behavior.

Basics

In the expression language -> already means mutation (call var->myFunc()), but what to do for command language or function calls in expression syntax that don't pass a variable? Allow, warn, or require procs/funcs to be named with a ->-suffix? (call reset-app->()), to have an indication that it will mutate a variable when called?

Variant 1: "Value-copy-behavior, by-default."

Require special assignment operators only for the less-used cases, i.e. indicating pointer-setting-behavior (->) for mutables and value.Place, and indicating behavior that varies depending on variable type (=*). Meanwhile, the classical = only accepts "atom" types, thus indicating value-copy-behavior.
In proc/func definitions and calls, require the params/args to have special variable prefixes to request and indicate pointer-behavior with outside mutation, i.e. indicating to require a dereferenceable value.Place (->var) whose referenced type itself can also be changed by the proc/func, or (rare) for choosing an explicitly fixed-type mutable[container] whose type can't be changed (->:var). (The : is more like assignment, and already used for the mutable type List literal :|...|, which could allow for a shorthand syntax sugar ->:|one two|.) And also require indicating inconsistent behavior that varies depending on the variable type (?var).
New assignment keyword for de-referenced values, i.e. for use within procs/funcs indicating to change an "outside" value of type value.Place or mutable[container] (set-> local_varname =), and also a "local" alias for it to use on the top-level, or if accessing only a temporarily used locally created place (setplace local_var =).

Variant 2: "Pointer-assignment-behavior, by default."

This could possibly use a =: operator to require value-copy assignments, and a prefix of :... in proc/func definitions and calls, besides having the inconsistent operator =* and variable prefix ?....

Variant 3: "Only optional, specific syntax and behavior."

This would keep the inconsistent behavior of the = assignment operator, but add =: for value-copy and -> for pointer-assignment.

Originally posted by @bar-g in https://github.com/oilshell/oil/issues/1791#issuecomment-1893356505

Minimal example showing that the original variable is being changed:

setvar mydict = {}

echo passed variable before:
= mydict
echo

#proc a(; input ) {
func a( input ) {
  echo "input:"
  = input
  echo

  setvar input['a_added'] = 'something'
}

#a (mydict)
var dump
setvar dump = a(mydict)

echo returned:
= dump
echo

echo passed variable after:
= mydict
echo

For better or worse, mutating the func's local "input" variable within the func actually also mutates the original global dict! (As seen after the function has run.)

This behavior, and the "same-as-for-var-mutation" syntax, actually seems to be exactly what was wished for "out" variables in https://github.com/oilshell/oil/issues/1789 , but I really expected that general passing of variables to procs or funcs works through call-by-value (i.e. on a copy), not call-by-reference, i.e. that it does not mutate the original variables that were passed, at least by default.
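The same effect can be reproduced in Python, which the YSH containers are modeled on. The sketch below is an analogue of the example above, not a translation of it; `a_confined` is a hypothetical variant showing what the expected "works on a copy" behavior would look like if written out by hand.

```python
import copy


def a(input_dict):
    # Mutates the caller's dict: the parameter is another name for the same object.
    input_dict["a_added"] = "something"


def a_confined(input_dict):
    # Works on a private deep copy, so the caller's dict stays untouched.
    local = copy.deepcopy(input_dict)
    local["a_added"] = "something"
    return local


mydict = {}
a(mydict)
print(mydict)            # {'a_added': 'something'}

other = {}
result = a_confined(other)
print(other)             # {}
print(result)            # {'a_added': 'something'}
```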

bar-g commented 9 months ago

A proposed idea to fix this "same syntax is used for wildly different semantic" problem in python: https://github.com/oilshell/oil/issues/1796

bar-g commented 9 months ago

Not sure what to do here, it now seems to me it's working as intended, but it's a very surprising not-consistent, not-obvious behavior.

Maybe we can discuss the ideas at https://github.com/oilshell/oil/issues/1796 and I can close here with a sensible conclusion.

andychu commented 9 months ago

Yes so this is a conflict between

being consistent with Python and JS -- they have mutable containers List and Dict
expectations of shell users, where there are no mutable containers

I think @Melkor333 brought this issue up as well

My answer then is that if you stick to the "proc subset" of YSH, then you don't have this issue

var myarray = :|one two three|

myproc @myarray  # splicing copies the arguments

But if you do

myproc (myarray)

then you are passing a reference to a mutable container, and then you have this potentially unexpected behavior.
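A rough Python analogue of the difference between splicing and passing the container, assuming `*args` unpacking plays the role of `@myarray`. The function names are invented for illustration.

```python
def myproc_words(*words):
    # Receives the elements as separate arguments, collected into a new tuple.
    words = list(words)
    words.append("extra")      # only the local list changes
    return words


def myproc_ref(items):
    # Receives a reference to the caller's list.
    items.append("extra")      # the caller sees this


myarray = ["one", "two", "three"]

myproc_words(*myarray)
print(myarray)   # ['one', 'two', 'three']

myproc_ref(myarray)
print(myarray)   # ['one', 'two', 'three', 'extra']
```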

There is a difference between

"atoms"
mutable containers - List and Dict

As of Oils 0.20.0, we will actually print these differently! List and Dict get an address, to show you it's a container

ysh ysh-0.19.0$ = []
(List 0x7fed5548e8c0)   []   # <= NOTE ADDRESS HERE

ysh ysh-0.19.0$ = 42
(Int)   42

The issue is rebinding vs. mutating

value.Place lets you rebind an Int or a List

But to mutate a List, you don't need to use value.Place. You just pass the list itself, and then you mutate it with

call mylist->append('foo')

setvar mydict.key = 'zzz'

etc.

Possibly we could allow &mylist as a no-op? Not sure

bar-g commented 9 months ago

> Possibly we could allow &mylist as a no-op? Not sure

That sounds like a good part of a solution to make the different behaviour of rebinding/mutating vs. copying when passing or assigning a variable apparent! Let's see if I get you right:

Being able to specify at the definition-site what behavior a proc/func expects seems like a good idea to me.

So, for example in

proc blank-var (&x)

the & would mean to require that passed arguments will always be mutated "in-place".

it makes it obvious for every reader of the code that alteration of x will affect the outer variable

And as a consequence of specifying the & in the definition, all calls of this proc would be required to also specify the & sigil:

var mydict
blank-var (&mydict)

it makes it obvious for every reader of the code that alteration of mydict is to be expected
it does not specify the actual type passed (here, no type has been specified anywhere, so blanking can depend on type etc., e.g. remove contained entries or overwrite with empty string)
if e.g. a Dict "container" is passed, the & operator on the call-site is a no-op, but it creates a value.Place for immutable "atoms"

andychu commented 9 months ago

I thought about it a little more, I think any no-op should be separate from &x to avoid confusion

&x is for rebinding / re-assigning the name x
&&x or +x could be to "let you know" we're going to mutate this container

I was thinking +x because it's already a no-op on integers. Although it's confusing because +'55' may already do something

(Hm actually I just noticed +x doesn't crash! Need to fix it)

Anyway you could imagine something like

var mydict
clear-dict (mydict)
clear-dict (+mydict)  # same thing

clear-dict (&&mydict)  # another syntax

rebind-name (&mydict)  # this is different

andychu commented 9 months ago

Hm now that I think about it, +x breaks a "language design principle"

https://github.com/oilshell/oil/wiki/Language-Design-Principles

Even though I never use +x in an arithmetic context, it is valid in JavaScript and C as type conversion to integer

So we probably shouldn't use that

It's possible we may have %x available, or &&x kinda makes sense
*x would break the principle since it looks close to Python splat
~x is overloading an operator, probably not
^x is similar to a quotation, probably not

I find && a bit noisy but maybe it's OK

Oh actually !x is possible!! In Ruby and Lisp that sometimes means "mutate", so it could be useful


clear-dict (x)
clear-dict (!x)

However in Lisp it's part of the name

clear-dict! (x)

So actually that could be another thing -- we could allow ! in names

var x = clearDict!(x)
clear-dict! (x)

bar-g commented 9 months ago

> I think any no-op should be separate from &x to avoid confusion
> &x is for rebinding / re-assigning the name x
> &&x or +x could be to "let you know" we're going to mutate this container

I'm not sure why or where you think what could be confused as what. Maybe I don't understand yet what the rebind-name proc actually does in the example.

rebind-name (&mydict) # this is different

Do you mean mydict is put in a place? I noticed and you wrote something like value.Place can hold Dict, but is it actually necessary to allow mutables in value.Place? If there is no need, couldn't value.Place only allow immutable "atoms" so that & would always either mean mutable container or place. (And & be a noop in rebind-name (&mydict), if mydict actually is of type Dict.)

1) I'm not sure yet if I completely understood you, but if we consider the topic of assignments, e.g.

setout &foo_value =  # to mutate container or place ('&...' required), setvalue could be an alias of it for in-func/proc and top-level usage
setvar foo =         # to assign immutable (atom) to the variable
setvar foo =&        # to assign a mutable container or place
setvar foo =*        # to explicitly assign whatever type, atom, mutable container or place

Then maybe that "additional rebind/re-assignment?" of the dict in rebind-name (&mydict) could be rebind-name (=&mydict)?

bar-g commented 9 months ago

> setvar foo =& # to assign a mutable container or place

I notice setvar foo =& &myinteger, here we would kind of get your double &, to get the same as what &myinteger would do implicitly when specified in a proc call?

andychu commented 9 months ago

> I noticed and you wrote something like value.Place can hold Dict, but is it actually necessary to allow mutables in value.Place? If there is no need, couldn't value.Place only allow immutable "atoms" so that & would always either mean mutable container or place.

One way to explain the difference

var mydict = {}
rebind-name-to-a-different-object (&mydict)

$ = mydict
(Str)  'this can be a string now'

versus

var mydict
mutate-dict-in-place (mydict)   # note no &

$ = mydict
(Dict)  {'key': 'this MUST be a Dict, CANNOT be a string'}

So it doesn't really make sense to say that &x is a Dict or whatever, it's a name that can refer to ANY type

x is a Dict, but &x is a Place -- NOT a place for a dict, it's a place for any value

And of course we actually use this

json read (&x)   # x can be Dict if the message is {}, List if it's [], etc.

we do NOT know ahead of time what the type of x is -- and &x a Place, full stop
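To make the rebinding-versus-mutating distinction concrete, here is a small Python model. The `Place` class below is a hypothetical stand-in (a scope plus a name), not how value.Place is actually implemented in Oils; the plain dict `scope` is likewise an assumption standing in for the variable scope.

```python
class Place:
    """Hypothetical stand-in for value.Place: a name inside a scope, not a value."""

    def __init__(self, scope, name):
        self.scope = scope   # a dict acting as the variable scope
        self.name = name

    def set_value(self, value):
        # Rebinds the name; the new value may be of any type.
        self.scope[self.name] = value


def rebind_name(place):
    place.set_value("this can be a string now")


def mutate_dict_in_place(d):
    d["key"] = "this MUST be a Dict"


scope = {"mydict": {}}
original = scope["mydict"]

mutate_dict_in_place(scope["mydict"])
print(scope["mydict"])          # {'key': 'this MUST be a Dict'}

rebind_name(Place(scope, "mydict"))
print(scope["mydict"])          # this can be a string now
print(original)                 # {'key': 'this MUST be a Dict'}
```

Mutation needs only the container itself, while rebinding needs to know where the name lives, which is why the place carries no type of its own.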

Melkor333 commented 9 months ago

I personally think it's best to hide the pointer concept as much as possible and therefore I don't like & that much. It also doesn't feel very wrong for me that dicts/arrays are passed by reference.

But both are probably because of my Python background. So I don't think my input is very valuable here 😅

andychu commented 9 months ago

Yeah I don't think value.Place is common in most code, but we kinda need it for read (&x) and json read (&x)

Though note you can just omit it, and you use the default _reply magic variable

I'm leaning pretty hard toward just borrowing Ruby and Lisp, and allowing

mutate-dict! (x)
call mutateDict!(x)

in the name. It's relatively well known, and easy to implement

It's just a convention, there's no enforcement, like in Ruby/Lisp

It doesn't introduce anything that's not in another language

bar-g commented 9 months ago

First, thank you very much for your patience, I think I got it, now that I see the "json read (&x)" can not know the type it needs to set.

The mutate-dict! (x), json! read (x)? or json read! (x)? syntax, would it only work for one param?, all?, or just a return value?

By now I also understand all this isn't that new and surprising with a python background. (For me, with the current state of affairs, it was quite "shocking", though: I was actually convinced I had hit very fundamental behavior inconsistency bugs in ysh before starting to figure this out, while almost everything had been working so nicely with osh before. :-). Personally, I think it took me considerably more than a week, so it would be quite a bummer if everyone migrating from shell has to go through this.

I think what called for this trouble were the hidden and surprising pointers/dereferencing/mutables, and I hope there can be some good defaults to let a consistent syntax grow up naturally, in a simple and understandable way.

So here is my updated attempt to bring together all the loose ends that I could get a hold of:

[The overview is now maintained within the issue's description.]

[EDIT: thoughts concerning finding the pointer assignment operator]

I can also see some similarity to the 1>&2 file descriptor assignments in shell "set 1 to where 2 is set".

The added > here resembles a bit more of the pointer/referencing meaning.

So re-using that, `setvar foo =& ...` could be compared to:

setvar foo >& bar
setvar foo =>& bar   # a full arrow for visual understanding

# but that's cryptic and => alone is very similar to the other operators, so we're back to `=->`

setvar foo =-> bar

bar-g commented 9 months ago

Hm, going through my compilation of affected things again, it seems !x could actually very well work in place of &x also.

bar-g commented 9 months ago

I think only setout may allow to effortlessly set an outer variable to be a new place (compared to ->setValue()).

[overcome stuff removed]

So that would be: [Corrected] setout &passed_var =& &other_var or setout !passed_var =! !other_var (if =! is not too similar to !=, and not too bad as assignment operator)

setout !passed_var =-> !other_var (pretty solid in uniqueness, intuitiveness, and looks)

[Overall, ! seems to make a good impression as prefix, even in the most complicated case of creating a new outside place for some variable, best if combined with a distinct more "pointer-like" assignment operator.]

bar-g commented 9 months ago

Ok, I produced an overview again, and put it up in the issue description.

If you waited, I think it has settled now (three variants for the default).

(Have a look at the new overview maintained in the description: https://github.com/oilshell/oil/issues/1793)

Note on Assignment Operators: The connotation of ! as dereference!, follow!, or mutating! fits very well when it's used as a prefix for !some_var.

However, the connotations don't fit that well when ! is used in a "pointer assignment" operator (seldom used), and would only make it easy to confuse it with prefixes that may still follow on the right hand side in special cases.

And most importantly: Using =! as "pointer assignment" operator would falsely associate it with places, which is wrong, because it is also used and needed to make new and existing variables point to mutable containers.

So, I settled (back again) for =-> as generic "pointer assignment".

bar-g commented 9 months ago

The only places I currently see where variables would have to be required to be prefixed in order to implement a 100% consistent variable interaction behavior seem to be:

The arguments passed at proc/func call-sites (customizable by proc/func definitions).
~~On the left hand side of setout/setvalue assignments.~~ (see two messages down)

I think there may be no need to require prefixes on the right hand side of assignments, because there, the behavior would already be apparent by the =-> assignment operator. Actually, adding a !-prefix on the right hand side can properly mean to create a place (or no-op if type is already a place).

bar-g commented 9 months ago

Well, as a place can take all types, & (or !) could be the universal and only commonly used one.

I guess the : (or whatever prefix for mutable[containers]) would only have to be used in order to explicitly disable complete variable re-assignments and to disable changing the type from within a proc/func.

bar-g commented 9 months ago

[Requiring indicating prefixes] On the left hand side of setout/setvalue assignments.

That might be relaxable, if it's also supported to read name[indexes] from Place type, as if they were mutables, transparently (if they refer to mutables). (https://github.com/oilshell/oil/issues/1794)

bar-g commented 9 months ago

Hm, since there is one universal place-prefix that will usually be used, and it's not nice if the seldom needed "no-rebind" variant is completely different...

I now adapted the overview to use the "mutable container" identifier only in addition, i.e. behind the usual place identifier: &:out or (if switching away from the current choice) !:out

The overview seems to gravitate towards pretty consistent, nice and just few unobtrusive syntax requirements!

bar-g commented 9 months ago

Found a further nice simplification: using the !-... prefix for the "without type-changes" case (i.e. mutables only), with - being just a rare special case "modifier" to !. (Issue overview is reworked and includes a table to check.)

What do you think about making the Place type non-nestable and transparent for "index[notation]", just as it already is the case with mutable containers? (https://github.com/oilshell/oil/issues/1794)

That seems to be quite a requirement for smooth and consistent Variable<->Type<->Behavior interaction (usability) that's based on a universal Place, serving as the default "fully-featured" reference type (allowing for re-bind/ type changes), even for mutable containers by-default.

bar-g commented 9 months ago

It's really a mess with no solution in python, other than putting all the burden of the inconsistencies on the users (https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference), but ysh has already implemented the data-type part (universal Place type), so can fix this for good.

(https://github.com/oilshell/oil/wiki/Language-Design-Principles) if our syntax looks like JavaScript or Python, it should behave like JavaScript or Python, unless we're fixing a wart.

Isn't this a wart on the nose of languages, if they omit a small amount of syntax exactness which would maintain fully consistent behavior? (And thus obsolete a lot of head scratching and justification "theory" in learning and teaching.)

What do you think about making the Place type non-nestable, and transparent for "index[notation]" so that it works just as with mutable containers directly? (https://github.com/oilshell/oil/issues/1794)

bar-g commented 9 months ago

Examples of new, fully consistent assignment behavior

var a =-> { key: 'value' }
setvar b =-> a            # ok, done two clearly indicated pointer assignments
setvar b.key = 'changed'  # all regular mutation continues to be done with simple syntax
echo $[a.key]             # shows 'changed'  (same value read through pointer variable 'a', as indicated)

func setStart( !dict ) { setout dict.start = 'set' } # no ambiguity (not even if re-assigning dict)
call setStart( !a )                                  # clearly mutating 'a'

var value = 1.01          # floats  are "immutable atoms", but syntax stays 100% consistent:
proc plus1(; !num ) { setout num +=  1 }
plus1 ( !value )
echo $value               # shows 2.01, outer value referenced and changed as indicated

andychu commented 9 months ago

Hm =-> is pretty weird, no language I know of has that. It also can make the interpreter less efficient to have to test which operators are used for what.

I agree that shell users are going to be confused a bit by the new mutable containers, but

Python and JS both agree on this behavior
the rule is fairly simple -- Dict and List are the mutable containers. That's pretty much it.

Basically you have to learn this new rule to get additional power ... BTW people sometimes call it aliasing -- two names that refer to the same value. Shell doesn't really have that idea.

i.e. There is one dict here, not two. a and b are names for the same value; in other words they are aliases.

var a = {k: 'val'}
var b = a

I think that we will allow mutate-dict! because Ruby and Lisp have it

I think that mutate-dict! (!d) is possible, with !d as a no-op, though it's slightly redundant. It seems like you want to put the ! in one place or another, not both places

bar-g commented 9 months ago

Hi, thanks for checking this out,

> =-> is pretty weird, no language I know of has that. It also can make the interpreter less efficient to have to test which operators are used for what.

Isn't there already a check now whether to create an immutable copy or just a pointer/alias?

Hm, but I'd say all those q&a pages about the mutable type stuff (with many actually confused answers) and separate gotcha pages actually show what is a weird language shortcoming of not having something like strict_typeinteract, by default.

(Again, I think python et al. may not be able to fix this wart right away, but ysh can, thanks to the universal Place type.)

Couldn't it be a static parsing check, at least for distinguishing "immutable atom" vs. universal Place !... in assignments definitions and calls? I think the dynamic combination of not-rebindable/reassignable, i.e. combining !-... and =* behavior will be much less needed, only for some specific corner cases, if at all. (So, maybe for this case it's enough to print a warning instead of implementing dynamic checks only for this case.)

> I think that mutate-dict! (!d) is possible, with !d as a no-op, though it's slightly redundant. It seems like you want to put the ! in one place or another, not both places

Hm, maybe like this: If the name ends with ...! then all defined typed params (and their defaults?) are considered as defined with !... (Place), and all args passed in calls are implicitly converted into type Place? (So it's not necessary to mention the ! on individual args in calls or defined params, but they would not do any harm.)

That might also become possible in a straightforward way, if places are not nestable, i.e. ! being a no-op on the Place type itself, and if the container[index] syntax works transparently on Places as well, just as it works on referencing mutable type "alias" variables.

bar-g commented 9 months ago

Oh, there is also a new case of assigning places:

var a  =  place   # 'a' is independent new place, initially pointing to the same
var a =-> place   # 'a' points to same place

Do you think it would be ok to pick only the one that makes sense, i.e. to only allow the second one, so that there is never any possibility of double indirection?

So, a check may be necessary in any case.

bar-g commented 9 months ago

Hm, there are more consequences of strict_typeinteract.

Actually, I think only having the consistent difference in syntax is what would also allow:

var a  =  dict  # 'a' is an independent new dict (copy of dict)
var a =-> dict  # 'a' points to same dict

So, the first line's syntax could actually do an implicit copy (efficient dupe) if given a mutable container.

And in retrospect, I realize that from looking at the current proc/func signatures and code, one can not tell at all how they behave. This is contrary to the impression that the proc/func guide https://www.oilshell.org/release/0.19.0/doc/proc-func.html gave me when reading it beforehand.

Currently, it's not possible to reason about procs/funcs just by looking at their signature. One needs to know the type of each variable, to really know if they can have outside effects or not. And also the code within procs/funcs doesn't tell it all, it's just the same setvar everywhere.

bar-g commented 9 months ago

Hm, after some days of distance from https://github.com/oilshell/oil/issues/1793#issuecomment-1907380297, can you maybe recognize in parts a reasoning bias you may have encountered yourself at some time, when bringing up some shell pitfall and possible fixes? I mean, after one gets really used to something, it is to some degree understandable to underestimate the problem and to tend to see problems in the solution instead.

For example, a parser, here having to parse two different operators (one more) and the execution to check for the operator after having had to check for mutable type anyway, is that a real efficiency issue?

Or, ysh being a more powerful language: what does that have to do with a consistent syntax? Its power would not be reduced at all with consistent and apparent syntax. The same power may become easier to reason about, though, and present and express itself much more naturally based on syntax differences, instead of being solely based on background knowledge. (And as a nice side-effect, the syntax can even solve the "mutable defaults" pitfall.)

For experienced python users, what would be new? Creating pointer assignments with setvar and others is a rare thing, it does clean up the initial declarations, and a helpful error message is there to help, before "pointing to new mutables", i.e. var mydict =-> {} becomes a natural thing. Assigning defaults for proc/func variables can work as expected (var=mydict), and if needed as (var=->mydict).
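For reference, the "mutable defaults" pitfall mentioned above looks like this in Python. The function names are made up for illustration; the behavior itself is standard Python.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same list.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Common workaround: create a new list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")   # same list object as `first`
```

A default written as `var=mydict` with value-copy-behavior would correspond to `append_fresh`, while `var=->mydict` would correspond to the shared list in `append_shared`.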

bar-g commented 9 months ago

Actually, only a consistent syntax may ultimately allow for a more powerful language, e.g.:

var a =-> {k: 'val'}  # new dict
var b =-> a           # alias, pointing to same dict
var c =   a           # separate new dict (internally duplicated datastructure, no copy/deepcopy pitfalls)

andychu commented 9 months ago

I thought about it and brainstormed on Zulip, and I think we can do something similar to what you're saying, basically reuse -> in 3 places to connote "mutation" or "aliasing"

We already have

call mylist->append(42)  # mutation

And I decided against ! because it's a different symbol than ->. It's weird to have 2 symbols for the same thing -- it looks a little noisy and perl-ish.

So I think then we could have

var a = {k: 'val'}
var b -> a  # alias, pretty much what you wrote

setvar b.other = 42   # mutation visible through BOTH a and b

however you can also write it like this

var a -> {k: 'val'}  # this is the "first" pointer, not an alias
var b -> a

-> is exactly like =, but it checks if the RHS is a List or Dict -- a mutable container

And then I think instead of myFunc!(mutated) or myFunc(!mutated), we can simply use the same symbol as an optional prefix operator

call myFunc(->mutated)  # creating an "alias" by passing a pointer

clear-dict (->mutated)

Again -> will check if the value is a List or Dict, so that ->[1,2,3] is legal, but ->42 and ->"mystr" are runtime errors
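The runtime check described here can be modeled roughly as follows in Python. The `alias` helper is hypothetical and only imitates the idea; it is not an existing YSH or Python API.

```python
def alias(value):
    """Hypothetical stand-in for the proposed `->` check.

    Accepts only mutable containers and returns the same object (no copy).
    """
    if not isinstance(value, (list, dict)):
        raise TypeError(f"-> expects a List or Dict, got {type(value).__name__}")
    return value


a = alias({"k": "val"})   # like `var a -> {k: 'val'}`
b = alias(a)              # like `var b -> a`
b["other"] = 42           # mutation visible through both a and b
```

Calling `alias(42)` or `alias("mystr")` raises `TypeError`, which mirrors the runtime errors for `->42` and `->"mystr"`.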

This is not foolproof or a static check, but I think it's a nice way of making code that cares about mutation and aliasing look different

But I'll also note that I expect this to be fairly rare in YSH code, except for library code and frameworks that use metaprogramming

Most YSH code will be simpler transformations on JSON and so forth. Copying files around, and that sort of thing.

When you use JSON, you're creating a copy, so there is no mutation or aliasing.

This is a very "advanced" feature that won't appear in most code

There are a bunch of priorities before this, but I think using -> consistently everywhere makes sense, and is pretty similar to what you proposed.

bar-g commented 9 months ago

we can do something similar to what you're saying, basically reuse -> in 3 places to connote "mutation" or "aliasing"

Oh, yes reusing -> is an even better idea!

I noticed that the overview table that I announced here actually was not in the description (anymore), it seems it got deleted when I later updated something in the description from another stale browser window, I'm sorry.

I've now re-added updated tables, and also re-worked all short descriptions that follow the overview with your idea to re-use ->. That gives a great overall impression.

This is not foolproof or a static check, but I think it's a nice way of making code that cares about mutation and aliasing look different

But I'll also note that I expect this to be fairly rare in YSH code, except for library code and frameworks that use metaprogramming

Hm, I assume most day-to-day assignments are made with atoms (value-copy-behavior), and the amount of aliasing/pointing assignments is quite low (i.e. mostly only the initial declarations, since thereafter mutables get passed along and, well, mutated in place).

And the thing that is really only needed very rarely, or as you say:

fairly rare[ly] in YSH code, except for library code and frameworks that use metaprogramming

is rather assignments that need to take (allow) all types, i.e. the current default = assignments.

With a generally quite moderate use of aliasing/pointer assignments and a large majority of value-copy-behaviour, I had been thinking to require atoms for =, and mutable or Place, for ->, and have =* to allow all types for use in library code or other rare cases.

So the burden of requiring atoms for = assignments and thus having a "foolproof or a static" consistent solution may actually not be that heavy at all.

So,

You wrote to allow var a = {} and not require var a -> {}. But how would you warrant this? Not requiring it would not ensure clearly consistent code, I think.

I understand that it should always be possible to add a -> prefix operator to a function call (once a Place can be used transparently like the type it refers to), however:

Would you leave the prefix operator optional in myFunc(->mutated) even if the definition explicitly required it, e.g. as in func myFunc(->target)? Not requiring it would not ensure clearly understandable code, I think.

Melkor333 commented 9 months ago

Sorry for my late input. It was a bit too much, too fast iteration and I didn't really have much of an opinion anyway.

But I really like the reuse of ->! I honestly wouldn't even mind that much if it was required (and otherwise a copy would happen) but I think the way it works right now even though "inconsistent" is much closer to how it's gonna be used most of the time anyway so I think it's OK to be optional.

andychu commented 9 months ago

Thanks for the feedback @Melkor333 ! That's useful

@bar-g

Having -> in signatures is an interesting idea, probably a good one. So we can reuse -> in 4 places and not 3
The reason to allow var x = [] in addition to var x -> [] is simply that MOST usages of lists should not involve any aliasing. You're just creating an argv array and using it in one place, etc.

So I think most usages of lists can "pretend" that they are values. If you don't have an alias, then a List behaves just like an Int or Str. Most users will think of it that way ... again the aliasing is an "advanced" feature

Most shell code is pretty straightforward -- like imagine 50,000 lines of shell code to build a distro -- how to download, build, and test. You basically would never use aliases for List or Dict there.

The other thing I want to say is that -> would still be a dynamic check, and dynamic checks like const / readonly are fundamentally limited

There is also the notion of static type annotations like this

func f(x List[Int]) {
  return (x)
}

We can parse that but we don't do anything with it yet, and may not ever. But it would interact with

func f(->x) { ... }

and

func f(->x List[Int]) { ... }

andychu commented 9 months ago

I guess one guideline is

we support writing -> in four places
But you should actually avoid using anything that could benefit from ->, because it's a pretty "advanced" usage of shell!

The presence of -> should make you think twice!

Just like in Python, most Dict and List usages are like "values", and you don't worry about aliasing.

But aliasing is extremely useful sometimes, like walking a tree and accumulating values in the tree. That's even popular in Lisp
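A small Python sketch of that tree-walking case, where the accumulator is deliberately shared with the caller. The function name and the sample tree are made up for illustration.

```python
def collect_leaves(node, out):
    """Append every leaf value under `node` to `out`, mutating the caller's list."""
    if isinstance(node, dict):
        for child in node.values():
            collect_leaves(child, out)
    elif isinstance(node, list):
        for child in node:
            collect_leaves(child, out)
    else:
        out.append(node)


tree = {"a": 1, "b": {"c": 2, "d": [3, 4]}}
leaves = []
collect_leaves(tree, leaves)   # `leaves` is filled in place
```

Every recursive call appends to the same list, which is exactly the aliasing that a `->` at the call site would make visible.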

bar-g commented 9 months ago

Hi, sorry, the iteration is due to me being new to Python things, so I had to -- and, thanks to things I learned here, also could -- re-work the idea, while keeping an eye on things that are problematic in general and new for shell users.

we support writing -> in four places But you should actually avoid using anything that could benefit from ->, because it's a pretty "advanced" usage of shell!

The presence of -> should make you think twice!

Sure, but wouldn't there only be a -> present in an assignment for sure, and wouldn't one only notice that one is creating an alias and not a copy, if -> is required for (pointer/alias) assignments of mutables?

The reason to allow var x = [] in addition to var x -> [] is simply that MOST usages of lists should not involve any aliasing. You're just creating an argv array and using it in one place, etc.

So I think most usages of lists can "pretend" that they are values. If you don't have an alias, then a List behaves just like an Int or Str. Most users will think of it that way ...

Really, no, it should not be hidden that every List or Dict variable only aliases/points-to the List or Dict, even the one used for the definition. It's not good to hide that at all, as List and Dict variables behave very differently! That was the original reason that led to filing several issues here.

The main thing: They are mutated outside of procs/funcs, and that is currently absolutely not obvious, because of a language syntax that "pretends" to do the same, but really does something differently in the background.

When var x = [] fails and hints to use var x -> [] instead, it's immediately clear that x is not the ~~variable~~List itself, but pointing/aliasing the List. By seeing this, one may already perfectly deduce and understand that procs/funcs may mutate a passed list or dict. That is why I'd say the -> should be required for assignments: it'll make definitions clear, and it prevents unintentional aliasing where one may wrongly expect value-copy-behavior.

However, in the proc/func definitions like func myFunc(->participants), the -> serves to

re-confirm and show the external mutation, e.g. instead of just accepting and passing just a value from a list or dict myFunc(participant)
require -> to be present in proc/func calls

So, in the definitions, the presence of -> may only be desired if the proc/func is actually mutating anything. Could there be a "static" check determining if there is a setout participants call within the proc/func?

Note: The idea is to require a separate setout keyword for all outside mutables, to make it obvious to mutate external values, even when only assigning atoms to list or dict members. Because there is no need to use the thin arrow in these kind of assignments: setout participants[key] = 'value'.

However, at the proc/func call sites, I think the -> should always be required if present in the definition (i.e. required by it). May this be a "static" check?

In case the passed argument variable already is a Place, the -> prefix at the call site would do nothing, otherwise it would create a Place.
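The "do nothing if it already is a Place, otherwise create one" rule can be sketched in Python as below. The `Place` class here is a simplified, hypothetical model that wraps a value; it is an assumption for illustration and does not reproduce how value.Place refers to a variable in YSH.

```python
class Place:
    """Hypothetical model of a rebindable reference (not the real value.Place)."""

    def __init__(self, value):
        self.value = value


def to_place(arg):
    # Already a Place: pass it through unchanged; otherwise wrap it.
    return arg if isinstance(arg, Place) else Place(arg)


def rebind(target, new_value):
    # Rebinding through the Place is visible to everyone holding it.
    target.value = new_value


p = to_place([1, 2])
same = to_place(p)        # no second wrapper is created
rebind(same, "string")    # `p` now refers to the string as well
```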

bar-g commented 9 months ago

A part of the idea that seems to have gotten lost a bit in the mangling edit:

Using -> in proc/func definitions/calls would always mean using a Place (adding one if needed) for consistent behavior across all types. And passing a plain, non-rebindable mutable would become a rarely needed special case (->:).

Melkor333 commented 9 months ago

I think at this point it's mostly differing opinions

The status quo is inconsistent but convenient and the question is if we weigh consistency more than convenience. A compromise is making it optionally consistent - which I honestly don't like that much, it makes code from 2 people look different.

IMO both options make sense and -> seems consistent enough (and only 1 char more) that I'm personally fine with enforcing it. But it's still a bit unusual - I don't think -> is used anywhere else for assignment? I'm also not really aware of all the consequences of such a breaking change...

bar-g commented 9 months ago

[optionally consistent] makes code from 2 people look different.

Hm, yes. So maybe rather a plain strict_type_interaction? option, allowing to experiment with it and making it a default option in ysh if it works out as expected.

I could imagine that it's actually more convenient to have it enabled, because then the language is unobtrusively showing the difference, exactly in the relatively few situations when things actually behave differently than usual (i.e. aliasing/pointing way instead of the usual default copy-value-behavior).

bar-g commented 9 months ago

To assess it better, what would be examples of a significant impact, of really losing convenience?

Some initial List/Dict definitions? Proc/func definitions or calls?

bar-g commented 9 months ago

I've augmented and restructured the first table in the description (now 5 places of recognizable mutating behavior) and noticed a slight inconsistency:

"Within procs/funcs" the setout place = 'value' does not show the common ->, it's implicit on the left hand side and I think that makes it simpler to use and think about.

But what would you think about the idea to already use the devised proc/func naming convention for shortening the setout keyword itself?

For example:

set-> place.key =    'value'          # (mutation)
set-> place     =    'value'          # (rebind to string)
set-> place     -> [ 'one', 'two' ]   # (rebind to list)

bar-g commented 9 months ago

the way it works right now even though "inconsistent" is much closer to how it's gonna be used most of the time anyway so I think it's OK to be optional.

If by that you mean that external mutables are still set from within procs/funcs with an inconsistent "covert"
setvar mutable[key] = "value", I don't think that would need to be true.

Because that would then be an error, hinting to use the shorter, external:
set-> mutable[key] = "value".

[EDIT:] Quite rare, local places:
setplace place[key] = "value".

While locally declared mutable vars continue to work with, as always:

setvar mutable[key] = "value"  # mutate
setvar mutable      = 'string' # re-bind

andychu commented 8 months ago

The details will have to be something we work out when we do it

I think we should use -> more, but it will have to wait awhile, since there are lots of other things in YSH to do like the flag parsing, unit testing, module system, fixing C++ bugs, etc.

I will open a new bug with a rough idea -- thanks for brainstorming

andychu commented 8 months ago

Closing in favor of #1831

oils-for-unix / oils

Interaction (and syntax) differences between: `->` (any type in value.Place) -vs.- atoms. Further: `?`(any) and `->:`("no-rebind" mutable). #1793

Discussed:

New option `strict_type_interaction`

Table of Behavior (Overview)

Short Documentation of Affected Syntax

Explanation of the idea in coherent words

[The overview is now maintained within the issue's description.]

Examples of new, fully consistent assignment behavior


New option `strict_type_interaction`