symbols are not idempotent

janet-lang / janet

A dynamic language and bytecode vm

https://janet-lang.org

MIT License

3.55k stars 229 forks source link

symbols are not idempotent #1192

Closed primo-ppcg closed 1 year ago

primo-ppcg commented 1 year ago

Shouldn't they be?

In the "Prepare for 1.27.0" commit, idempotent? was refactored, with :symbol being added to the list of non-atomic-types:

https://github.com/janet-lang/janet/commit/01aab666676c5f93529ba8db5d97b9db1f4c499a#diff-e120880268b4f0f04177470180f50ee0d2c7ac13cb83bb778b6d81efda1cbbcc

I think this may have been inadvertent. Throughout the core library, this function is used to determine if a gensym is necessary to prevent a form from being evaluated twice.

As a result, we have more complicated expansions than necessary:

(def a 1)
(def b 2)
(macex ~(or a b))
# ->
(do (def _00000r a) (if _00000r _00000r b))

where

(if a a b)

would clearly be sufficient.

ml-2 commented 1 year ago

Symbols are not idempotent by the actual definition of the word. This can be important in macros if the symbol is redefined, either because it is a var or because it enters a new scope. Here is an example that gives different results if idempotent? returns true for symbols:

(var step 1)
(loop [x :range [0 10 step]]
  (++ step)
  (prin x))
(print)

Currently, this prints 0123456789. If I reimplement loop so that symbols are considered idempotent, it prints 0259.

But you do seem to be correct that the gensym in or is unnecessary for symbols, since as far as I can tell there's no way for the user to modify the variable in between the two uses.

bakpakin commented 1 year ago

So yes, the issue is fundamentally that symbols are only idempotent in certain contexts (we know that the symbol is not redefined or mutated between all evaluations of the symbol)

We probably could have a separate (possibly internal) macro with such a constraint and use it there.

primo-ppcg commented 1 year ago

Currently, this prints 0123456789. If I reimplement loop so that symbols are considered idempotent, it prints 0259.

I've confirmed that this was the behavior in 1.26.0, that is, until just a few months ago. One or the other is not obviously more correct to me.

as far as I can tell there's no way for the user to modify the variable in between the two uses.

A macro with side effects, perhaps:

(var a nil)
(defmacro toucha [] (set a 1))
(pp (and a (toucha)))
# ->
1

Although, that is true whether symbols are idempotent or not.

ml-2 commented 1 year ago

Consider this modified version of my previous snippet, where a do is added around step:

(var step 1)
(loop [x :range [0 10 (do step)]]
  (++ step)
  (prin x))
(print)

This will always print 0123456789, even in Janet 1.26. But it's more consistent and predictable if adding a do around step does not change what the code does, so I think the current behavior is better.

Regardless of that, though, if idempotent? goes back to returning true for symbols, it should also probably be renamed, since symbols don't fit the definition of the dictionary word idempotent (my opinion is that the fact that just a few months ago symbols were considered idempotent was a bug).

primo-ppcg commented 1 year ago

idempotent? is used to answer several different questions throughout the core library.

As it relates to symbols specifically:

Q: Can this appear multiple times in an expansion without potentially causing side-effects? (which includes and, or, case, match, each and any other place where these are used) A: Yes.
Q: Does this represent an atomic binding target? (which includes if-let, when-let, if-with, when-with and any other place where these are used) A: Yes.
Q: Is this known not to change throughout subsequent iterations of a user provided body? (which very specifically applies to the step argument of a range iterator, and nowhere else as far as I have been able to determine) A: No.

For case 2., symbol? could be used instead (in fact I've done so as a work-around). For case 3., number? could be used instead if it's really a concern.

For case 1., which is by far the most common usage of idempotent?, there is no alternative, other than to define a separate function.

Futhermore, the definition of the function is as given in the documentation, not more or less. idempotent? Check if x is a value that evaluates to itself when compiled.

(setdyn *pretty-format* "%m")
(pp (disasm (fn [] 'a)))
# ->
{ :arity 0
  :bytecode @[(ldc 0 0) (ret 0)]
  :constants @[a]
  :defs @[]
  :environments @[]
  :max-arity 0
  :min-arity 0
  :slotcount 1
  :source "sym.janet"
  :sourcemap @[(2 14) (2 14)]
  :structarg false
  :vararg false}

And finally, I'm not convinced that he behavior of 1.26.0 regarding the :range iterator is wrong or undesirable. If the step argument is modified within the body, it almost certainly would have been done with intent. For example:

(var b 1)
(seq [a :range [1 200 b]]
  (set b (- a b)))
# ->
@[0 1 1 2 3 5 8 13 21 34 55 89]

If we're really caught up on the name, because Webster or whoever disagrees with the definition given in the documentation, we could define a second function, and then just never use the original anywhere. I don't see any practical value in that, though.

sogaiu commented 1 year ago

Thanks for the detailed explanation.

I did some digging into the past...

IIUC, there used to be an ancestor(?) function named atomic? and this was changed here.

Although I do find the behavior slightly surprising given the current name, the intent of "when compiled" seems to have been there for quite some time -- in fact it seems to have been in the very first docstring.

I think it's often difficult to come up with an appropriate short name for something that adequately captures nuance and if one has reflected on naming frequently enough this becomes something one is likely to eventually accept (though people might differ in opinion concerning how much effort / energy should be put into naming) :)

For internal use it's probably clearer how something ought to behave here, but perhaps it's less clear how a public function should behave (in this case there was a history of acting in a particular fashion). So I am in favor of bakpakin's idea of:

We probably could have a separate (possibly internal) macro with such a constraint and use it there.

In any case, I'm not sure it was used that much in folks' code -- at least a search across the Janet code I've collected locally doesn't suggest it was a very popular function.

primo-ppcg commented 1 year ago

So I am in favor of bakpakin's idea of:

We probably could have a separate (possibly internal) macro with such a constraint and use it there.

We could go back to the original definition of atomic?:

diff --git a/src/boot/boot.janet b/src/boot/boot.janet
@@ -129,7 +129,7 @@
 # For macros, we define an imcomplete odd? function that will be overriden.
 (defn odd? [x] (= 1 (mod x 2)))

-(def- non-atomic-types
+(def- non-idempotent-types
   {:array true
    :tuple true
    :table true
@@ -140,4 +140,15 @@
 (defn idempotent?
   "Check if x is a value that evaluates to itself when compiled."
   [x]
+  (not (in non-idempotent-types (type x))))
+
+(def- non-atomic-types
+  {:array true
+   :tuple true
+   :table true
+   :struct true})
+
+(defn- atomic?
+  "Check if x is an atomic type."
+  [x]
   (not (in non-atomic-types (type x))))

... and then use that everywhere. Perhaps the docstring for idempotent? could be updated as well.

ml-2 commented 1 year ago

Q: Is this known not to change throughout subsequent iterations of a user provided body? (which very specifically applies to the step argument of a range iterator, and nowhere else as far as I have been able to determine) A: No.

And finally, I'm not convinced that he behavior of 1.26.0 regarding the :range iterator is wrong or undesirable.

This applies in other places too. That was just the first practical example I found. It applies in each:

(var my-tuple [1 2 3])
(var i 0)
(each val my-tuple
  (prin i)
  (++ i)
  (set my-tuple []))
# Prints `012` in current Janet, prints `0` in Janet 1.26

It applies in case:

(var a 0)
(case a
  (do (++ a) 0) :a
  1 :b)
# Returns :a in current Janet, would return :b in Janet 1.26

And importantly, it will also apply in any user macro that imitates the standard library and uses idempotent?.

I understand that my examples are contrived and edge cases, but they're just examples, what I could come up with in 5 minutes of looking at the code. And edge cases are important because occasionally you'll hit them by accident and get a hard-to-find bug.

Futhermore, the definition of the function is as given in the documentation, not more or less. idempotent? Check if x is a value that evaluates to itself when compiled.

Yes, and symbols do not evaluate to themselves when compiled.

(def x 5)
(= ((compile 'x)) 'x)
# false

Q: Can this appear multiple times in an expansion without potentially causing side-effects? (which includes and, or, case, match, each and any other place where these are used) A: Yes.

I don't agree that that's what use 1 is for. Use 1 is for knowing if you can do an optimization without changing the semantics. Removing the extra def in or, loop, each, etc, is an optimization. For example, if I changed this line in for-var-template in boot.janet:

    (def st (if (idempotent? step) step (gensym)))

to this:

    (def st (gensym))

With the current implementation of idempotent?, no user of the macro would be affected by this change. Their code would maybe be slightly slower, but that's it. However, if idempotent? went back to being like it was in Janet 1.26, then the change would be noticeable in user code in the examples I gave above.

Implementation details and optimizations like this should not be visible in user code when it's trivially avoidable, so symbols should not be considered idempotent?.

Q: Does this represent an atomic binding target? (which includes if-let, when-let, if-with, when-with and any other place where these are used) A: Yes.

For use case 2, that sounds like a totally separate use case, and trying to make the same function handle both is a bad idea. Given that this is the use case that has nothing to do with idempotence, this is the one that should have a new function with a different name.

So I agree with the proposal of adding a function called atomic? for that use case, but I think it should only be used in the if-let family of macros (those which are use case 2).

primo-ppcg commented 1 year ago

It applies in each:

Yes, and this is also very useful. Appending to an array as you iterate it is a very common paradigm for implementing a breadth-first search. As an example, a Sudoku solver:

(def s @[`
3__5____4
49___8___
___4915__
__1____8_
___1___47
543___9_6
__6_4____
839275___
_7_61__59`])

(each x s
  (when-let [i (index-of 95 x)]
    (def row (seq [j :range [0 9]] (in x (+ (- i (mod i 10)) j))))
    (def col (seq [j :range [0 9]] (in x (+ (mod i 10) (* j 10)))))
    (def box (seq [j :range [0 9]]
      (in x (+ (- i (mod i 30) (mod (mod i 10) 3)) (mod i 10) (mod (* j 10) 29)))))
    (def used (distinct [;row ;col ;box]))
    (def chars (filter (fn [c] (= nil (index-of c used))) "123456789"))
    (each c chars
      (def y (buffer x))
      (set (y i) c)
      (array/push s y))))

(print (last s))
# ->
318527694
495368172
627491538
761934285
982156347
543782916
156849723
839275461
274613859

I understand that my examples are contrived and edge cases

That I will agree with. Not to mention that what is happening entirely intuitive and useful. It is my opinion that the user can be trusted to manage their own variable references in a sensible way. The only reason to not allow it is because WeSaySo™, and by doing so have deliberately slowed your code besides. You're welcome.

Their code would maybe be slightly slower, but that's it.

The difference is in fact much bigger than you realize. That's precisely why I raised the issue in the first place.

Using this profiling tool:

(def a 1)
(def b 2)
(timeit-loop [:repeat 500_000_000] (or a b))

master     6.026s, 0.01205µs/body
no-symbol  4.167s, 0.008335µs/body

That's 44% faster.

ml-2 commented 1 year ago

The use cases you're talking about are not hard to do in other ways, so I don't think this overrides the general idea that optimizations should not change semantics. In any case the example you just gave involved mutating an array, not mutating the binding itself, so it is unaffected by the subject of discussion.

Yes, I am in favor of optimizations. The point is that they shouldn't change semantics. You can do the optimization without changing semantics in almost every case. In the case of or, we manually determined that even though symbols are not idempotent, they are safe to reuse in this specific case, so we could manually check if it's a symbol without changing idempotent? and breaking other macros. I would be in favor of adding such a check.

ml-2 commented 1 year ago

That aside, I apologize for my rudeness. I prefer to be dispassionate in debate, and everything I say should be taken to just be my opinion. If you still disagree with me now that I have laid out my arguments, then that's fine and you can just do what you determine is best.

primo-ppcg commented 1 year ago

I'm also just expressing opinions. I'm not the one who makes decisions around here 😉

Really though, I don't see the benefit. To deliberately slow every each, every range iterator, every match, every case, because someone, somewhere, at some point it time might try to modify a variable - doesn't make much sense to me. And more likely than not, the aforementioned hypothetical person will have done so deliberately, and be mildly annoyed that it doesn't work. Perhaps I'm just too much of a pragmatist.

It's a fair point, though, that there is a reasonable expectation for functions in the public API to be correct in their implementation. That the adding of :symbol to the list of non-idempotent-types - which, I now agree was probably correct - slowed many dependent functions and macros I still believe was inadvertent. All of the so affected, of course, pre-date the 1.27 release.

ml-2 commented 1 year ago

I think that how much it slows down depends on how much work is being done inside the expanded code. Other than the or example, the slow down will usually be much smaller than 44%. I just tried it on an each with an empty body iterating on an empty tuple, and the difference was only around 2% the first time I ran it.

(def b [])

# Current Janet
(timeit-loop [:repeat 500_000_000] (each a b))
# Elapsed time: 11.206s, 0.02241µs/body

# Modified version of each where `idempotent?` returns true for symbols
(timeit-loop [:repeat 500_000_000] (my-each a b))
# Elapsed time: 10.991s, 0.02198µs/body

If I make the tuple non-empty and add a body that does a bit of work, the difference becomes even smaller.

(def b [1])

(timeit-loop [:repeat 100_000_000] (each a b (math/pow 3.7 2.5)))
# Elapsed time: 6.091s, 0.06091µs/body

(timeit-loop [:repeat 100_000_000] (my-each a b (math/pow 3.7 2.5)))
# Elapsed time: 6.062s, 0.06062µs/body

And in fact, performance showing itself to be difficult to get right, here is an example where the optimization is actually slower:

(var b [1])

(timeit-loop [:repeat 100_000_000] (each a b (math/pow 3.7 2.5)))
# Elapsed time: 6.300s, 0.063µs/body

(timeit-loop [:repeat 100_000_000] (my-each a b (math/pow 3.7 2.5)))
# Elapsed time: 7.508s, 0.07508µs/body

From what I can tell, the reason it's slower is because b is a var in this example, and accessing mutable bindings seems to be slower than accessing immutable bindings.

I think part of our disagreement lies in weighing the tradeoffs differently. I made my peace with lisp being slow a long time ago, but I have a strong preference for simplicity, so I prefer avoiding special cases even if it costs a bit of speed.

Possibly an even better way to get this speed back might be to add a compiler pass that detects unnecessary defs and removes them. In theory this pass could be smart enough to remove the def only if the value is an immutable binding, which would be even better than what macros can do on their own, since they can't know if a binding is mutable or not. But I suppose this suggestion is a cold comfort, since it would be significantly harder to implement and I'm not exactly volunteering... (though this could be an interesting thing to implement, so I'm also not not volunteering)

In the interests of reproducibility, this was Janet 1.28, and here is the implementation I used for my-each:

(defn- each-template
  [binding inx kind body]
  (with-syms [k]
    (def ds (if (or (idempotent? inx) (symbol? inx)) inx (gensym)))
    ~(do
       ,;(if (= ds inx) [] [~(def ,ds ,inx)])
       (var ,k (,next ,ds nil))
       (while (,not= nil ,k)
         (def ,binding
           ,(case kind
              :each ~(,in ,ds ,k)
              :keys k
              :pairs ~[,k (,in ,ds ,k)]))
         ,;body
         (set ,k (,next ,ds ,k))))))

(defmacro my-each
  "Loop over each value in `ds`. Returns nil."
  [x ds & body]
  (each-template x ds :each body))

primo-ppcg commented 1 year ago

Possibly an even better way to get this speed back might be to add a compiler pass that detects unnecessary defs and removes them.

Great progress has already been made towards this, just a few weeks ago: #1163

Testing methodology is also slightly different, I recompiled the current master with :symbol removed from the list of non-atomic-types. But I also found that a var binding benefits from being aliased. I wonder if this could somehow be exposed (and then we could be in complete agreement that var bindings are not idempotent). I suspect, however, that idempotent? would need to be rewritten in C. Maybe I'll give it a go.

I made my peace with lisp being slow a long time ago

I used Lisp specifically because it was fast (SBCL).