Vaguery opened this issue 9 years ago
Another thing I realized: since `keep-number-reasonable` is used in the generic `adder` and `divider` macros, anybody who tried to add (for instance) a `Rational` type and use those constrained macros would arbitrarily lose values smaller than the `min-number-magnitude`. In other words, the perfectly reasonable `Rational` value `1/1000000000000` would become `0.0`.
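A quick REPL sketch of what the generic branch would do to a small ratio. This assumes the default `max-number-magnitude` of `1000000000000` quoted in this thread; the `min-number-magnitude` default of `1.0E-10` is my assumption, and the globals are re-bound locally just for illustration:

```clojure
;; Sketch only: standalone re-binding of the Clojush globals for illustration.
(def max-number-magnitude 1000000000000)
(def min-number-magnitude 1.0E-10)          ; assumed default

(integer? 1/1000000000000)                  ;=> false, so the non-integer branch runs
(< 1/1000000000000 min-number-magnitude)    ;=> true: the ratio counts as "too small"
;; keep-number-reasonable therefore returns 0.0, and the Rational is gone.
```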
> Whenever the value of `max-number-magnitude` is the default value of `1000000000000`, then the result of `(float_mult 10000000.0 10000000.0)` (or any larger `float` result) will be dropped down to the `integer` value `1000000000000`.
I don't see this. If the result is a float then wouldn't `keep-number-reasonable` return `(* 1.0 max-number-magnitude)`, which would be a float?
> But what do you want it to do, really? Besides "not raise exceptions"?
I think that's actually the whole purpose.
> I expect the origin of the truncation kludge is something about the way Clojure handles numeric data type arithmetic...
I think its origin was mainly in the fact that without it, random variation could (and often would) produce single numbers that exhaust all of memory, e.g. from a loop that performs repeated exponentiation. We have to make the environment safe from such run-killing excesses, which is also why we limit the size of values on the `code` and `exec` stacks (which could otherwise grow exponentially).
That explains the `max-number-magnitude`. I think that `min-number-magnitude` was added when we noticed a lot of runs crashing because of floating-point underflow errors.
> Another thing I realized: since `keep-number-reasonable` is used in the generic `adder` and `divider` macros, anybody who tried to add (for instance) a `Rational` type and use those constrained macros would arbitrarily lose values smaller than the `min-number-magnitude`. In other words, the perfectly reasonable `Rational` value `1/1000000000000` would become `0.0`.
Maybe that's a fine value, but if the denominator is allowed to grow exponentially then we'd be in trouble again, possibly consuming all memory and crashing the run because one program did something foolish. Conceivably, though, we'd want different limits for rationals, in which case we should ditch the generic approach and break out everything by numeric type.
> Whenever the value of `max-number-magnitude` is the default value of `1000000000000`, then the result of `(float_mult 10000000.0 10000000.0)` (or any larger `float` result) will be dropped down to the `integer` value `1000000000000`.
> I don't see this. If the result is a float then wouldn't `keep-number-reasonable` return `(* 1.0 max-number-magnitude)` which would be a float?
No, you're looking at the underflow. If the result of `float` multiplication is very large, the result of `keep-number-reasonable` is always an integer.
So (not asking for a history, just proposing) why not simply check values pushed to the stacks?
In other words, why not state explicitly that any `integer` must be between `-1000000000000` and `1000000000000` (which is in some sense true now), and that every `float` must be between the equivalent limits (which is not true, except approximately), and also must not underflow.
(`float` values now simply disappear (becoming `integer`s) when they overflow, while `integer`s are rounded down to the hard limit.)
Then we can simply check each numeric value before pushing it to the stack, and change or discard it as appropriate.
That separates the gate-keeping responsibility from the instructions (which are doing it wrong now), and expresses your goals better. As things are now, they are tied up inextricably with one another.
Alternately, just use exception-handling as it's intended, and capture over- and underflows and do the truncation when it occurs. So `(expt 10000 10000)` (if that comes up) can be intercepted and replaced on the fly.
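Something like this, for instance (a minimal sketch relying on Clojure's checked `long` arithmetic, which throws on overflow; `clamped-int-mult` is a hypothetical name, and `max-number-magnitude` stands in for the Clojush global):

```clojure
(def max-number-magnitude 1000000000000)

(defn clamped-int-mult
  "Multiply two longs; clamp to the limit only when overflow actually occurs."
  [a b]
  (try
    (* a b)                        ; checked math: throws ArithmeticException on long overflow
    (catch ArithmeticException _
      (if (= (neg? a) (neg? b))    ; same signs => the true product was positive
        max-number-magnitude
        (- max-number-magnitude)))))

(clamped-int-mult Long/MAX_VALUE 2) ;=> 1000000000000 (clamped on overflow)
```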
> No, you're looking at the underflow. If the result of `float` multiplication is very large, the result of `keep-number-reasonable` is always an integer.
I still don't see it.
Here's the full definition:
```clojure
(defn keep-number-reasonable
  "Returns a version of n that obeys limit parameters."
  [n]
  (cond
    (integer? n)
    (cond
      (> n max-number-magnitude) max-number-magnitude
      (< n (- max-number-magnitude)) (- max-number-magnitude)
      :else n)
    :else
    (cond
      (> n max-number-magnitude) (* 1.0 max-number-magnitude)
      (< n (- max-number-magnitude)) (* 1.0 (- max-number-magnitude))
      (and (< n min-number-magnitude) (> n (- min-number-magnitude))) 0.0
      :else n)))
```
If `n` is a float then we're in the `:else` clause. And then if `(> n max-number-magnitude)` then we return `(* 1.0 max-number-magnitude)`, which will be a float, right?
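A quick check at the REPL, assuming the definition above and the default globals, seems to bear this out:

```clojure
(class (keep-number-reasonable (* 10000000.0 10000000.0)))
;;=> java.lang.Double   ; the clamped value is 1.0E12, still a float
```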
> So (not asking for a history, just proposing) why not simply check values pushed to the stacks?
Isn't that exactly what we do? When a numeric instruction is about to push a result to the stack, it runs it through `keep-number-reasonable`, which checks it and handles it if necessary. I don't see what you are suggesting that's different.
> Alternately, just use exception-handling as it's intended, and capture over- and underflows and do the truncation when it occurs.
Perhaps that's a good alternative implementation, but I'm not sure. I liked and understood exceptions ("conditions") in Common Lisp, but when I moved to Clojure I found them confusing and never got them to do what I wanted (I think because they're handled in a different context than I expected). So I've avoided them. If there's a way to get the functionality of `keep-number-reasonable` with exceptions, and if it would really be better in some way, then maybe we should do that.
@lspector you're right about the `float` thing; my mistake. I was aghast at how many places it was called and got flummoxed trying to think through how somebody (me) could possibly add a new numeric type to the language without kicking the wall.
On that note: :smile:
> I don't see what you are suggesting that's different.
All I'm saying is that instead of having fifty different calls to a single function called `keep-number-reasonable` scattered all over the codebase, inside macros that are themselves called indirectly by instruction-creating definitions, and which is responsible for keeping every damned number reasonable no matter what type it is… you should have a single function with clear and explicit responsibility for making numbers the right size when they are pushed to the stack, and have one function for each type of number.
I'm saying that if you want to have integers fall within a particular range, then there should be one call to the function that forces them into that range, and it should be right there where every `integer` is pushed onto the `:integer` stack. No other places. Not inside `vector_of_integers_add` or something. One place.
Similarly, there should not be a `cond` for handling two types; there should be a separate function for each type.
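Something like this, say (a sketch; the names `keep-integer-reasonable` and `keep-float-reasonable` are hypothetical, and the two globals are assumed to be in scope):

```clojure
;; One clamping function per numeric type, instead of one cond for all types.
(defn keep-integer-reasonable
  "Clamp an integer into [-max-number-magnitude, +max-number-magnitude]."
  [n]
  (-> n
      (max (- max-number-magnitude))
      (min max-number-magnitude)))

(defn keep-float-reasonable
  "Clamp a float's magnitude, and round values in the underflow zone to 0.0."
  [n]
  (cond
    (> n max-number-magnitude)     (* 1.0 max-number-magnitude)
    (< n (- max-number-magnitude)) (* 1.0 (- max-number-magnitude))
    (< (- min-number-magnitude) n min-number-magnitude) 0.0
    :else n))
```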
My heartfelt and carefully-considered frustration comes, to be honest, from trying to imagine how to make simple, small changes that could be incorporated into your working codebase. And failing. Because it's so tied together in knots, there's almost no single method or object that I can approach without potentially affecting a whole slew of globals and at least one thing tucked away in `util.clj` that actually affects a whole slew of other untested stuff that might also break.
To avoid introducing dangerous regressions (whatever that means when there are no automated tests), the entire codebase either has to be tested first as it is (in other words, without making any changes to improve or change it), or started over from scratch.
I wish somebody from your side had the resources to actually pair with me. Preferably you, since you are the final gatekeeper, and seem to have the most to gain from understanding the strengths of automated testing and the extraordinary dangers this untested and hard-to-extend codebase poses to your students and research program. :(
> should have a single function with clear and explicit responsibility for making numbers the right size when they are pushed to the stack,
I thought we were sort of doing this by having `keep-number-reasonable`, which any instruction should call on numbers as the instruction pushes them onto the stack. But now I see that we could instead put this into `push-item`. This would indeed clean up the code for instructions. I guess a minor cost would be that `push-item` would have to have a switch based on the stack it is pushing to, but that seems like it should be fine.
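Roughly like this, perhaps (a sketch, not Clojush's actual `push-item`; it assumes the hypothetical per-type helpers sketched above and a state map of stacks):

```clojure
;; Sketch: do all gate-keeping at push time, dispatching on the target stack.
(defn sanitize-for-stack
  [stack item]
  (case stack
    :integer (keep-integer-reasonable item)
    :float   (keep-float-reasonable item)
    item))                                  ; other stacks pass through unchanged

(defn push-item
  "Push item onto the named stack in state, clamping numerics on the way in."
  [item stack state]
  (update state stack conj (sanitize-for-stack stack item)))
```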
> I wish somebody from your side had the resources to actually pair with me. Preferably you, since you are the final gatekeeper, and seem to have the most to gain from understanding the strengths of automated testing and the extraordinary dangers this untested and hard-to-extend codebase poses to your students and research program. :(
I would like this too, but it will be unusually tough for me in the next couple of weeks at least, since I've just taken on an administrative position, covering for a colleague who is out on maternity leave... But I will bring this up at our lab meeting tomorrow and see what we can do!
In the meantime I've kicked the wall too many times, and started this project. I'm not a good enough programmer to write any interpreter without tests, and definitely am not the man to fix this born-legacy code.
As I move forward (much faster now), what I'll do is stop opening unresolvable issues here that involve architecture changes nobody can make, and start asking specifically what it is each de facto "feature" does. So for example in this case, the code defines the Push `integer` type as being an integer between the bounds set by `max-number-magnitude`... and not existing outside that range. Is that the intention?
> started this project
Cool.
> the code defines the Push `integer` type as being an integer between the bounds set by `max-number-magnitude`... and not existing outside that range. Is that the intention?
I guess. My real intention was more like "integers, but don't let them get big enough to crash the world." But in practice, yes.
I'm definitely late to this party (had a crazy week and saw a long thread). But I like @Vaguery's idea of checking the legitimacy of things at pushing-to-stack time instead of in-instruction time. If we decided to do this, we could do all size checking at push time, including making sure things like strings and vectors and code don't get too big. This would definitely clean up the code base a lot.
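For strings, say, the same push-time gate could clamp lengths (a sketch; `max-string-length` and its value here are my assumptions, though Clojush has size limits along these lines):

```clojure
(def max-string-length 500)   ; hypothetical limit, for illustration only

(defn sanitize-string
  "Clamp an oversized string at push time instead of inside each instruction."
  [s]
  (if (> (count s) max-string-length)
    (subs s 0 max-string-length)
    s))

(count (sanitize-string (apply str (repeat 1000 "x")))) ;=> 500
```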
I'm a bit sad that @Vaguery is quitting on the idea of separating the Clojush interpreter from the GP. It would definitely be a very useful thing to have done. But, I agree that it's a mess of wires as it currently stands, and is not simple.
Dude, I'm totally not quitting the idea of separating the interpreter. I'm just not able to heal the current interpreter.
Oh yay! I misinterpreted you then. Are you aiming to have feature parity with the current interpreter?
Absolutely. Except with complete coverage with automated tests, and without 729 duplicates of "are there two integers on the `:integer` stack?" written in sixteen different dialects :).
The first thing is to make it clean, and make it extensible. It will work in fundamentally different (and more stable) ways, internally, but it will not behave in any detectably different way. Except where I have found bugs in the Clojush codebase.
That said, I am still going to implement this weird-ass (no offense) hard precision limit on integers and floats as a "quirks mode" behavior, for backwards-compatibility.
OK, this really should get addressed at some point before I am able to say that the new interpreter works "the same" as the old one.
At the moment (and for the foreseeable future, I suppose) the behavior of Clojush is:

- integers (by Clojure) are pushed to the `:integer` stack; this includes `BigInt` literals like `8N` or `888888888888888888888888N`
- `integer` arithmetic instructions are unboxed, and therefore automatically promote types to arbitrary-precision `BigInteger` values when needed (and silently); in other words, the math is performed idiomatically and correctly as far as Clojure is concerned (at least since version 1.3), and will never produce an overflow error AFAIK
- each `integer` instruction then applies `keep-number-reasonable`, which truncates integer and `BigInt` values to the `max-number-magnitude` global, which defaults to 1 000 000 000 000

Summary: a "Clojush `integer`" is created whenever a Clojure literal would normally be produced in a calculation, and is always truncated to the range `[-max-number-magnitude, +max-number-magnitude]`.
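A REPL sketch of that integer behavior, assuming the defaults quoted above:

```clojure
(keep-number-reasonable 888888888888888888888888N) ;=> 1000000000000 (BigInt clamped)
(keep-number-reasonable (*' 2000000 2000000))      ;=> 1000000000000 (true product: 4000000000000)
(keep-number-reasonable 42)                        ;=> 42
```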
- "floats" (by Clojure's `float?` checker) are pushed to the `:float` stack; this does not include `BigDecimal` literals like `8.0M` or `888888888888888888888888.888888M`. Note this can get pretty quirky at the boundaries between Clojure, Push and Java: `(class 9.0)` is `java.lang.Double`, not `java.lang.Float`, though Java defines both a `Float` (single-precision) and a `Double` (double-precision) type. In other words, when we write the literal `1.234` (regardless of its length), Push calls that a `float`, Clojure calls it a `double` but stores it as a `float` if it needs less than 32 bits and as a `double` if it needs more than 32 bits, but even though `BigDecimal` is also a floating-point number, Clojure returns `false` for `(float? 1.234M)`…
- `float` arithmetic instructions are also unboxed, but Clojure still overflows large `double` values to `Infinity` and small `double` values to `0.0` automatically (in other words, it does not, as of Clojure 1.7, auto-promote to `BigDecimal` to avoid overflow or underflow)
- each `float` instruction applies `keep-number-reasonable`, which truncates `float` results to the (typecast) `max-number-magnitude` global, which defaults to 1 000 000 000 000.0; it also rounds potential underflow values to `0.0`

Summary: a "Clojush `float`" is a Clojure auto-promoted floating-point type (by default a `double`), and is always truncated to the range `[-max-number-magnitude, +max-number-magnitude]`.
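And the float behavior (again assuming the defaults, including a `min-number-magnitude` of `1.0E-10`, which is my assumption):

```clojure
(keep-number-reasonable 1.0E20)  ;=> 1.0E12 (clamped, still a double)
(keep-number-reasonable 1.0E-30) ;=> 0.0    (inside the underflow zone)
(keep-number-reasonable 3.14)    ;=> 3.14
```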
The question that I need answered is: moving forward, do you want numeric values to be truncated this way, or do you want me to try to pursue an approach that leaves the behavior of fixed-point and floating-point numerics up to Clojure as much as possible?
We agree that the problem arises because GP often tries to use large values in inappropriate situations. But Clojure's (and Java's) unboxed numerics are capable of quickly calculating arbitrary-precision numbers, and are apparently smart enough to avoid doing so prematurely. The problems of memory consumption @lspector mentioned are things we can perhaps better manage by monitoring and restricting memory consumption, rather than numeric precision.
The troubling thing is that there is no logging of when truncation has occurred, a situation I will fix in the new interpreter if you want to have this behavior preserved.
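Even a thin wrapper would do (a sketch; the name and the use of `println` are just placeholders for whatever logging the new interpreter ends up with):

```clojure
(defn keep-number-reasonable-logged
  "Clamp n as keep-number-reasonable does, but report whenever clamping happened."
  [n]
  (let [clamped (keep-number-reasonable n)]
    (when (not= clamped n)
      (println "truncated" n "->" clamped))  ; swap in a real logger as needed
    clamped))
```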
So I'm trying to write tests for the `Interpreter` "routing" system, which is the cascade of recognizers that send items on the `:exec` stack to the appropriate other stacks. At the moment this is the code that handles that, but the behavior of particular programs and the results of calculations are also filtered through this kind of ad hoc thingie, and as a result there are some non-obvious consequences for numeric values calculated in the course of running a Push program.

So as far as I can see, more or less every function that "returns an `integer`" explicitly calls `keep-number-reasonable`, so the de facto type we call a Push `integer` is a kind of truncated thing. Clojure will happily treat arithmetic (and other) numeric results as either boxed (fixed precision) or unboxed (arbitrary precision) values, and it looks also as if the basics of Clojush arithmetic use arbitrary precision for results, but then inevitably apply `keep-number-reasonable` to that. Similarly, `float` values are protected by `keep-number-reasonable` from both the same overflow and also underflow—anything within some small $\epsilon$ of `0.0` is rounded down to `0.0`.

But it is also true that if a `float` result is larger than `max-number-magnitude`, then `keep-number-reasonable` will convert it to an `integer`. You don't have to look at the tests I'm writing that surfaced this problem, but can see this in the code itself. Whenever the value of `max-number-magnitude` is the default value of `1000000000000`, then the result of `(float_mult 10000000.0 10000000.0)` (or any larger `float` result) will be dropped down to the `integer` value `1000000000000`.

And because the result of `keep-number-reasonable` is used inline within numeric instructions of several types, the `integer` will be present on the `float` stack as a result of `float_mult`. I suspect at least a few more of these are lurking in there for transcendental and exponential functions as well.

So what is `keep-number-reasonable` really supposed to do? Can we be a bit more rigorous about its goal? For instance, can we pick a maximum precision for any Push `integer` value? I expect the origin of the truncation kludge is something about the way Clojure handles numeric data type arithmetic, and that you maybe were getting exceptions when `BigDecimal` results had infinite representations, but pure Java `float` primitives were overflowing and raising exceptions too?

But what do you want it to do, really? Besides "not raise exceptions"? At the moment it's injecting subtle and confusing ad hoc changes to running code. And there are no such checks and truncations on the "routers", so a program that contains `2000000000000000000000` as a value would be perfectly runnable.