thelmuth opened this issue 9 years ago
We do already have code_map, which could be used as a model for others. Also, several exec instructions have related functionality. There are also exec_do*vector_integer and similar instructions for the other vector stacks.
I would be happy to add these. I've done them at least twice in push-forth and duck, and it's a good place for me to get a better sense of adding comprehensive midje tests as I work. :)
The bigger question, of course, is how do you want to handle big swathes of functionality like this? A few instructions at a time, or a suite all together?
Personally, I'd say do it however you like! If you want feedback, a little at a time might be good, but most things "never get done", so if you're going to "get it done", then do it your way :)
So let it be written.
Please be sure, though, that the execution of a single instruction is always well bounded in space and time, since we only check for limits being exceeded between instructions. I'm pretty sure we've been careful to do this -- for the definition of "well bounded" that has mattered for us in practice -- for every instruction currently in Clojush, but I also think (though I'm not sure) that @Vaguery has had a somewhat different philosophy on this (maybe because of different ways to catch things when they run out of control).
My idea right now is to bring over something like the :enumerator type I built in one of the other languages. It is a direct extension of the forms you are used to with the exec_do* and code_do* instructions, and so it's completely reentrant and iterative in the same way.
Briefly: an enumerator contains one immutable sequence (I will start with vectors here) and an associated persistent "pointer state". As with code, there is no need that the sequence be single-typed, and one enumerator may contain a vector_integer and another a vector_string or a string item. When created, the enumerator's pointer is set to 0.
A standard library of instructions would pull the top enumerator from that stack and push the following to exec:

- enumerator_first produces enumerator, with its pointer set to 0
- enumerator_last produces enumerator, with its pointer set to the last index
- enumerator_forward produces enumerator, with its pointer incremented by 1; if the pointer exceeds the length of the sequence, the enumerator is consumed without output
- enumerator_back produces enumerator, with its pointer reduced by 1; if the counter was already 0, the enumerator is consumed without output
- enumerator_next produces enumerator, with its pointer incremented by 1; if the pointer exceeds the length of the sequence, the enumerator is consumed without output
- enumerator_prev produces enumerator, with its counter reduced by 1; if the counter was already 0, the enumerator is consumed without output
- enumerator_history produces enumerator
- enumerator_remaining produces enumerator
- enumerator_apply produces (with a code argument) enumerator, code_quote, the code item, the code item
- enumerator_map produces (with a code argument) enumerator (with the pointer advanced), code_quote, the code item, the code item
- enumerator_moveto produces (with an integer argument) enumerator, with the pointer set to the absolute position indicated by the int modulo the sequence length
- enumerator_move produces (with an integer argument) enumerator, with the pointer moved relative to its current position by a number of steps determined by the int; if the pointer leaves the sequence from either end, the enumerator disappears
- enumerator_pos produces enumerator and an integer
- enumerator_break produces an integer
- [sequence]_to_enumerator creates a new enumerator from the indicated sequence
- enumerator_dup, enumerator_pop, and so forth

Note: in most of the implementations I've worked on where the goal was extending Push, I did not want anything reaching into the associated sequence and changing it. In duck things are more fluid, and as I recall the stack-affecting messages like :dup and :pop can be received by an Enumerator instance as well, so all kinds of madness happens.

Anyway, @lspector, there is no more than one additional copy of the sequence ever, and so I think the computational implications of this approach are in keeping with the Push philosophy at least as much as code_do_times and so forth are.
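To make the shape of the thing concrete, here is a minimal Clojure sketch of the enumerator idea as described above; the record fields, constructor, and step function are illustrative names, not the actual implementation.

;; Illustrative sketch only: an enumerator is an immutable sequence plus a
;; persistent pointer, created with the pointer at 0.
(defrecord Enumerator [collection pointer])

(defn new-enumerator
  "Make an enumerator over coll with its pointer at 0."
  [coll]
  (->Enumerator (vec coll) 0))

(defn step-forward
  "Advance the pointer by 1; return nil (the enumerator is 'consumed without
   output') once the pointer would run past the end of the sequence."
  [{:keys [collection pointer] :as enum}]
  (let [p (inc pointer)]
    (when (< p (count collection))
      (assoc enum :pointer p))))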
I didn't fully digest all of the details, but this looks cool to me. I like the idea of the enumerator stack. It always felt a little scary to me that indices for the exec iteration instructions are kept on the exec and integer stack, where they can be messed around with easily. An enumerator stack that keeps track of that stuff without multipurposing the exec and integer stacks seems like a clean way to avoid that mess. Thumbs up!
Is there a pre-existing idiom (or convenience function) for adding a new type to clojush.globals/push-types? I would prefer to have the responsibility for registering a new type live in the definition of that type, so people who don't want to use it don't end up with extra chaff added to globals.
Nope. The only current way that I'm aware of is to just add it to that list of stack types.
I could use some feedback on the work I've done so far. It's not in any sense "done", mainly because there's a huge amount of repetition in the instruction definitions, but I'd like to know if it makes sense so far to an external reader familiar with Clojush's overarching structure.
You can compare to the more-or-less intact Clojush state using the branch comparison view.
Definitely looking like a good start! A few comments:
Your conditionals are missing an else argument, and will return nil if the collection is empty. They should instead return either state or popped_state, depending on your intention (which I think is the latter based on your Rules of Thumb). I'm actually seeing this in all of your instructions.

Otherwise, it's hard to tell if things look "right" without comments about what they're supposed to do.
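A minimal sketch of the problem being pointed out, with hypothetical stack key and function names rather than the actual instruction bodies:

;; The `when` form has no else branch, so an empty stack makes the whole
;; instruction return nil instead of a push state.
(defn broken-pop [state]
  (when (seq (:enumerator state))
    (assoc state :enumerator (rest (:enumerator state)))))

;; One way to fix it: always return a push state -- the popped one when there
;; is something to pop, otherwise the original state unchanged.
(defn fixed-pop [state]
  (if (seq (:enumerator state))
    (assoc state :enumerator (rest (:enumerator state)))
    state))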
Do you know if there is room inside define-registered for a docstring?
Good catch on the return state. I was just getting around to getting rid of all the repetition that masked that from me.
Do the midje tests, which explicitly say what everything should do in every edge case, help clarify what the code is intended to do? I will also add comments because that's the standard and because the docs are generally pretty bad, but the intention is always best spelled out in the test suite.
Do you know if there are any instructions that don't start by setting the metadata of their requirements, and (fn [state]...? (And I mean when they're "unrolled", too; I see for example the inner function of popper in the example is equivalent.)
Do you know if there is room inside define-registered for a docstring?
I don't know! You could always try and see if anything breaks.
I will also add comments because that's the standard and because the docs are generally pretty bad, but the intention is always best spelled out in the test suite.
I didn't check, but will. Though, that assumes there aren't any typos/bugs in the test suite. And also: English is easier to understand than someone else's code.
Do you know if there are any instructions that don't start by setting the metadata of their requirements, and (fn [state]...?
Not that I know of. But, there's no reason you couldn't use another idiom. Each instruction has to be a function that takes a push state and returns a push state, and needs metadata associated with it that tells what stacks it uses, but you can get there any way you like.
The define-registered macro, as currently written, cannot take a doc string. This has annoyed me and I've resorted to using comments in the past, but it could certainly be fixed. define-registered would have to be written to take an optional doc string and to attach it as meta-data to the instruction name in the appropriate way (which I haven't investigated, but is probably not hard, modulo the usual mental orientation/clarity that's needed when writing macros).
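For illustration, a sketch of the general optional-doc-string pattern; the registry atom and register-instruction! helper are hypothetical stand-ins, not the actual Clojush macro:

;; Hypothetical registry standing in for Clojush's instruction table.
(def registered-instructions (atom {}))

(defn register-instruction! [sym definition]
  (swap! registered-instructions assoc sym definition))

(defmacro define-registered
  "Register an instruction; an optional doc string is attached to the
   instruction's symbol as metadata."
  ([instruction definition]
   `(define-registered ~instruction nil ~definition))
  ([instruction doc definition]
   `(register-instruction! (with-meta '~instruction {:doc ~doc})
                           ~definition)))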
Was just about to open this as an issue :)
I've written a little testing helper function called safe-execute in my midje tests, which will apply execute-instruction to a state and check that the result has push-types as its keys (e.g. when it's nil). I am making an effort to use this in every test call of an instruction, so that every returned value from every edge case is validated. Obviously this is too much work for the interpreter to do in the normal course of running a program (?), but it's really pretty spooky that this sort of thing (blithely ignoring nil arguments) is still lurking in there.
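A rough sketch of what such a helper could look like; the body is guessed from the description above (execute-instruction and push-types are the existing Clojush names mentioned in this thread, everything else is illustrative):

(ns clojush.test.helpers            ; hypothetical test namespace
  (:use [clojush.interpreter]       ; for execute-instruction
        [clojush.globals]))         ; for push-types

(defn safe-execute
  "Run one instruction on a state and insist that the result still looks like
   a push state (a map with every stack in push-types), e.g. not nil."
  [instruction state]
  (let [result (execute-instruction instruction state)]
    (assert (map? result) "instruction did not return a push state")
    (assert (every? #(contains? result %) push-types)
            "returned state is missing one or more stacks")
    result))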
Was that last comment supposed to go in #153?
Will do enumerator_map, enumerator_set and the remaining conversion instructions tomorrow.
A few questions of preference:

Do you want to permit "looping" behavior? I'm guessing this would be a boolean flag, settable by a toggle instruction, which when true would permit the Enumerator to "wrap" in both directions when the pointer moves beyond the currently permitted bounds.

Thinking more about map and apply functionality: My previous implementations were all in interpreters where every result went onto the equivalent of the :exec stack. Here I see many instructions jump straight to pushing results to their eventual stack. I am thinking the enumerator_map instruction should do something like this:

- push enumerator_map onto :exec, unless the Enumerator is done
- push the code item onto :exec (leaving a copy on top of :code)
- push the item at the Enumerator pointer onto :exec
- advance the Enumerator pointer

And enumerator_apply (not my favorite name) would be slightly different in that it would take new items from the :code stack on each step:

- push enumerator_apply onto :exec, unless the Enumerator is done
- push the code item onto :exec (popping it from :code)
- push the item at the Enumerator pointer onto :exec
- advance the Enumerator pointer

My plan is to build (and test):

- enumerator_from_vector_integer
- enumerator_from_vector_boolean
- enumerator_from_vector_float
- enumerator_from_vector_string
- enumerator_from_string
- enumerator_from_code

That last is not intended to be a tree-traversing zipper kind of thing; it will just be treated like a "flat" list of subtrees.

What have I missed?
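A sketch of one enumerator_map step in Clojure, following the push order described in the list above; the state layout and helper name are hypothetical, not the branch's actual code:

;; One "turn" of enumerator_map: if the enumerator is done, just consume it;
;; otherwise :exec ends up with (top first) the current item, the top :code
;; item (a copy stays on :code), and enumerator_map itself as the continuation,
;; and the pointer is advanced.
(defn enumerator-map-step
  [{:keys [enumerator code exec] :as state}]
  (let [{:keys [collection pointer] :as enum} (first enumerator)]
    (if (>= pointer (count collection))
      (assoc state :enumerator (rest enumerator))
      (assoc state
             :enumerator (cons (assoc enum :pointer (inc pointer))
                               (rest enumerator))
             :exec (concat [(nth collection pointer) (first code) 'enumerator_map]
                           exec)))))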
@thelmuth No, it was intended for here, as a response to the bugs you caught!
Do you want to permit "looping" behavior? I'm guessing this would be a boolean flag, settable by a toggle instruction, which when true would permit the Enumerator to "wrap" in both directions when the pointer moves beyond the currently permitted bounds.
This sounds a bit crazy. Good crazy? I don't know. It would be a lot easier to have infinite looping behavior with the looping you're talking about.
re: enumerator_map: I would love to be able to use this instruction (and similar ones) without having to use the :code stack at all. The :code stack is great when you want to use it a lot, but adds a ton of extra complexity (and a larger instruction set) if you otherwise don't need it. Would it be possible for some of these instructions to use the :exec stack instead, or to have 2 versions, one which takes the code from :exec and one from :code, like exec_do*range and code_do*range?
No, I'm sorry, I can't possibly see how anybody could do that. Good grief, man, these aren't computers we're... oh wait.
Yes!
While we're discussing things: as I've mentioned, this isn't my first go on the Enumerator ride. One set of instructions (from push-forth I think) I had sort of set aside because I figured you'd think they were pretty weird. These take an entire stack and create an Enumerator out of that. enumerator_from_all_integers and so forth, I guess they would be here.
Just sayin'.
My favorite is one I suppose would be called enumerator_from_all_enumerators....
While we're discussing things: as I've mentioned, this isn't my first go on the Enumerator ride. One set of instructions (from push-forth I think) I had sort of set aside because I figured you'd think they were pretty weird. These take an entire stack and create an Enumerator out of that. enumerator_from_all_integers and so forth, I guess they would be here.
I'm all for this! While it might be sort of crazy, it seems like it could be very useful at times.
There still should be some discussion of what reduce and filter might be in a Push setting. Realizing that one doesn't want to do them "all at once" (because that's the Push way), I'm not entirely clear what makes sense.
The sort of canonical reducer is injecting + over a list, right? I'm not sure what it means to "reduce" a list by "applying" an arbitrary Push code item—with unknown arity—to even one item, let alone "combining". Even the map I've sketched above feels a bit like making a sandwich by carefully putting a slice of bread between every item....
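For comparison, a minimal sketch of what a one-step-at-a-time reduce could look like over the enumerator structure sketched earlier; the names are illustrative and nothing here is a proposed instruction:

;; Fold one element into the accumulator and advance; returns [acc' enum'],
;; where enum' is nil once the sequence is exhausted (the "done" case).
(defn reduce-step
  [f acc {:keys [collection pointer] :as enum}]
  (let [acc' (f acc (nth collection pointer))
        p    (inc pointer)]
    [acc' (when (< p (count collection)) (assoc enum :pointer p))]))

;; e.g. stepping + over {:collection [8 7 2] :pointer 0} starting from 0
;; takes three calls to reach 17.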
I will postpone "looping" for the time being.
I notice a lot of the instruction names are pretty long. Can I get away with renaming them enum_thingie rather than enumerator_thingie, or is the naming standard to always use the complete stack name? Either way is good for me, just loads of typing. I understand enum is probably an overloaded string for many programmers, and it's not at all the same thing as an enumerator.
Quick judgment call: Should instructions like enumerator_map_code leave the code item in place on that stack after they've exhausted the enumerator, or pop it?
I haven't looked into the enumerators yet, but the general principle is to pop whatever you use. If it's straightforward to dup it first, so that you can easily get the effect of not popping when that's what you want, then I don't see a reason not to follow this general practice. However, that can sometimes get messy for things involving code and exec stack manipulations, and I don't immediately recall if we've already made exceptions for some of those.
That makes sense, and confirms my expectations.
So what it's doing now is leaving a copy of the top :code item until it deletes itself. This permits the same kind of mayhem code_do_... instructions do, since the code item "mapped" can change the state arbitrarily; it also permits the code item to rewind the enumerator, so more mayhem. But it doesn't unnecessarily duplicate the top code item.
The equivalent enumerator_apply instructions will pop the code or exec item when it's "applied".
I notice a lot of the instruction names are pretty long. Can I get away with renaming them enum_thingie rather than enumerator_thingie, or is the naming standard to always use the complete stack name? Either way is good for me, just loads of typing. I understand enum is probably an overloaded string for many programmers, and it's not at all the same thing as an enumerator.
If you go with enum_thingy, then I'd change the stack name to :enum. That would be fine with me, since we already have the abbreviated :exec stack, but I don't particularly care either way.
I don't think we should forbid abbreviation, and Tom's right that we already have exec. I do personally generally prefer to avoid abbreviations by default, but I agree that brevity is desirable. On this specific case I think that the overloading that @Vaguery mentions is real and worrisome. If I saw an enum stack and enum_* instructions (and I definitely agree with @thelmuth that stack names and instruction prefixes should agree), then I would definitely assume that they're for small finite sets of elements with symbolic names --- that is, the enums I've seen elsewhere. I still haven't studied the enumerators being developed here, but I gather that the idea is quite different.
What C calls enum I would call categorical if I were to import my types to handle small symbolic sets....
If you shortened categorical to cat while also shortening enumerator to enum then you would guarantee maximal confusion :smiley:.
Oh no, I can guarantee a much higher level of confusion, just by pointing out that Clojure permits Unicode function names....
clojush.core=> (defn ℤ⊕ [a b] (+ a b))
#'clojush.core/ℤ⊕
clojush.core=> (ℤ⊕ 8 12)
20
Of course we would push that to the ℤ stack, which is clearly the integers....
But then you'd know you're confused. Confusion is maximal when it is unexpected!
$$c \log(c)$$
Precisely, but quantifying the c you care about is always the rub.
OK, I think I have things squared away. In order to get a minimal viable set in folks' hands, I did not implement some tricky and odd things:

- enumerator_from_code
- enumerator_from_[entire stack]

However, the four higher-order functions are present:

- enumerator_map_code
- enumerator_map_exec
- enumerator_apply_code
- enumerator_apply_exec

Still not digging "apply" in those; really they're more like some other combinators I don't know the name of: they "apply" items from the source stack to the elements of the enumerator, consuming them as they progress. See the examples in the tests, which I worked out by hand just to make absolutely sure things were progressing as expected.
I've got very good test coverage on vector_integer, since that was the type I used in all the principal testing. I will spend some time in the next few days making sure there are equivalent and comprehensive tests for the other enumerator types, just in case there are some weird edge cases for strings or something.
Also, I have not yet refactored the "creator" code into a single higher-order definition. Will do that in a second pass after I have filed the rest of the automated tests, using the tests to confirm that the refactoring doesn't break everything.
Have a look at the codebase please; I've been attaching it to @NicMcPhee's fork (accidentally) since the other night when I committed my branch to his fork instead of my own, and never made the time to shift it over. It's not his fault!
(that said, I am just now about to do a run with this stuff turned on, so... soon)
Later: Yeah, no; I'm going to need some explicit help from somebody telling me how to set up pushgp to actually use

- src/clojush/types/enumerators.clj
- src/clojush/instructions/enumerators.clj
- src/clojush/util.clj

What has to happen? The existing setups make me think some things need to be invoked with require, others with use? Very confused.
Trying to catch up quickly on enumerators... just read some of the thread above... looks great.
I think Tom's more recent experience adding types will make it easier for him to give you the definitive list of things that have to happen. But a couple of notes:

- use in Clojure allows the used namespace's symbols to be used unqualified; references have to be qualified when using require. Some people in the Clojure community consider use to be evil but I like and often use it (although I appreciate the nay-sayers' arguments).
- You will have to add the new type to push-types in clojush.globals. Furthermore, you'd better do lein clean (or the equivalent in an IDE) after making this change, because push-states are Java records, which causes weird bad behavior e.g. involving persistent class files. So why do we use Java records when Clojure maps would avoid these and other problems? Because they're slightly faster. They're annoying enough that I've been tempted just to use maps anyway, but this is an inner loop optimization that I guess is actually worth the annoyance.

Later: Yeah, no; I'm going to need some explicit help from somebody telling me how to set up pushgp to actually use
We generally use :use instead of :require. I think what you will need to do is the following:

- Add clojush.instructions.enumerators to the :use in the namespace of clojush.pushgp.pushgp. Note that Clojure has a quicker way of doing this than listing clojush.instructions.X every time, so adding enumerators to our current list should be sufficient: [clojush.instructions boolean code common numbers random-instructions string char vectors tag zip return input-output genome enumerators].
- The type definition is presumably needed by clojush.instructions.enumerators. Try :use-ing it there and see what happens.
- The util file should be available anywhere it needs to be, so you shouldn't have to change anything.

I'll now take a look at the code and let you know if I spot anything else. Let me know how that goes.
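For concreteness, roughly what the namespace change described above would look like (abridged; the real clojush.pushgp.pushgp ns form :uses more namespaces than shown here):

(ns clojush.pushgp.pushgp
  (:use [clojush.instructions boolean code common numbers random-instructions
         string char vectors tag zip return input-output genome enumerators]))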
Some questions:

- Is there a way to get the item at the pointer, like enumerator_next without moving the pointer? Maybe I'm missing it, or maybe there's a reason not to have it. Otherwise, I'm not sure what the utility of enumerator_forward and enumerator_backward are.
- enumerator_map_exec and enumerator_map_code don't actually map things in the traditional (Clojure?) sense of the word -- that is, take one collection and apply a function to each, creating a new collection of those items. What you have is actually extremely close to the iterator instructions that we have for vectors. In other words, it doesn't produce a new collection/vector -- it just lets the code do whatever it wants with each item in the collection. While this can be useful (and indeed, I see it in many solution programs), it doesn't help explicitly with the type of task where you want to map one collection to another -- the program would have to have the mapped code build up the collection from scratch, which is hard work! I'm not sure if there's a good way to make this more like map, but if so, it could definitely be very useful!
- A lot of these instructions just assoc the new pointer value onto the old enumerator. I'm sort of surprised you don't have a helper function for this!

Otherwise, looking really cool! I'm excited to give these a whirl.
@thelmuth

- I never added enumerator_emit (and pos) to my lists, because it was obvious... so I totally forgot about it.
- map is pretty difficult to implement unless one has detailed type information about both items and functions (instructions). That said, I started building the introspection infrastructure into push-forth the last time I paid attention to it, including a Types stack, introspection, type-sensitive collection builders, and so forth. I started that path in push-forth after working with Swift for a while, where every damned thing is so expressive about its finicky little type that it actually kinda grows on you after a bit. And if we did build something like this into Push, if we catalogued the instruction types completely—in and out, like int->bool or enum(string)+(string->int)->vector(int)—then we could probably do some real higher-order functions. But without that level of detailed metadata or a centralized catalog for the instructions themselves, we're limited to just sort of throwing them into a bag and watching them rub against one another, I'm afraid.
The other trick I imagine is that one has to fish or get off the can when it comes to using built-in types in the language in which the interpreter is written. While Push interpreters (even all of mine) tend to use host-language primitive types, I think we'd have to hoist the type system up into Push itself instead of borrowing here and there.
@thelmuth Let's talk more about proper map and other functional instructions. Here are the things I can determine, thinking about the instruction set we have already:
For numeric vector types, the basic arithmetic would be easy to implement as map or reduce functions. The question, in terms of language design, is how to recognize and obtain the particular instructions, without blindly proliferating a new suite of instructions: How would the process described by "map integer_add over this vector, taking an integer as the second argument" find the integer_add?
As I mentioned above, one approach would be to use metadata associated with the instructions themselves to facilitate this: What if a better map instruction looked at the code or exec stack, found an instruction as such, looked for an argument of the right kind to match the types of the instruction... that might work for a lot of them.
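A tiny sketch of the sort of per-instruction type metadata being imagined here; the keys and the signature table are hypothetical (Clojush instructions currently only record which stacks they touch):

;; Illustrative signature catalog; instruction names and signatures are
;; examples only.
(def instruction-signatures
  {'integer_add   {:in [:integer :integer] :out [:integer]}
   'integer_lt    {:in [:integer :integer] :out [:boolean]}
   'string_length {:in [:string]           :out [:integer]}})

;; A smarter map instruction could then ask whether a candidate instruction
;; fits the element type it is iterating over.
(defn applies-to?
  [instruction element-type]
  (some #{element-type} (:in (instruction-signatures instruction))))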
The trick here is the same one that interferes with (or complicates) matrix operations in Push: We need to choose between a few approaches, one of which is to wait for an integer->integer function and a vector_integer accidentally to be next to each other on some (?) stack. The first one seems like it's not especially helpful, and unlikely to be very evolvable. The second one breaks a fundamental paradigm of the instructions we've written so far, which is that all arguments are pulled first and then we do the math. The third one opens a can of worms that makes this more like push-forth, where this sort of thing is built in (because there is only one stack for every type).
One way of thinking about the second and third paths, I suppose, would be to build a kind of "map closure" or "continuation" object out of a map instruction and its first (instruction or code) argument, and then give that the behavior that push-forth uses to shuffle through the stack looking for a match.
Call this mythic instruction supermap_exec for now. Suppose it takes a single argument from the exec stack, which must be an instruction or code item. It pops the next item on exec, and if that item isn't "compatible" it swaps position with it in the exec stack. If the popped item is "mappable", it consumes it and becomes a map_closure item, still on exec. When a map_closure item is executed it pops the next item from the exec stack, and if it's the right type it applies the map function to it (one set of arguments taken from the stacks), and we're done. If it doesn't find a compatible second argument in the next position on exec or wherever, it swaps position with it.
The problem I've seen in push-forth is that these little continuation-form gliders start crawling up the stack and consuming everything in turn. No instruction goes un-executed, but if there's a lot of mismatches there can be a large bolus of continuations drawing up the stack before you're done.
The only tricks are, this feels pretty un-Pushy, and it means there needs to be a lot more metadata associated with instructions (and possibly code items).
Let me walk through an example for clarity:
exec:(supermap 2 4 [8 7 2] integer_mod 8) int:() int_vec:()
exec:(2 supermap 4 [8 7 2] integer_mod 8) int:() int_vec:() ; supermap don't like "2"
exec:(supermap 4 [8 7 2] integer_mod 8) int:(2) int_vec:()
exec:(4 supermap [8 7 2] integer_mod 8) int:(2) int_vec:() ; supermap don't like "4"
exec:(supermap [8 7 2] integer_mod 8) int:(4 2) int_vec:()
exec:([8 7 2] supermap integer_mod 8) int:(4 2) int_vec:() ; supermap don't like "[8 7 2]"
exec:(supermap integer_mod 8) int:(4 2) int_vec:([8 7 2]) ; supermap likes integer_mod!
exec:((map integer_mod SCALAR INT onto INT_VECTOR) 8) int:(4 2) int_vec:([8 7 2]) ; closure formed
exec:(8) int:(2) int_vec:([8%4 7%4 2%4]) ; closure arguments taken from stacks
exec:() int:(8 2) int_vec:([8%4 7%4 2%4]) ; boring old 8
Before reading too deep into your post, I think here's one way we're differing on these thoughts: You seem to want to ensure that the instruction(s) used as the mapping function output the right type and will find the right types. Let me come back to this idea in a second, when you get to [1].
I would personally go down another track entirely. Yes, I would somehow indicate the collection of the output of the map. Not sure exactly how -- there used to be a :types stack in Push that would be helpful here, but maybe there's a better way. You could have "baked-in" input and output types -- instructions like enumerator_map_integer_vector_to_string. Maybe this would result in "too many" enumerator instructions. We currently have ~6 enumerable types which would mean 6^2 = 36 such map instructions, which might cross the "too many" threshold.
Another idea that just came to mind is to somehow use epigenetic markers to denote the input/output types for higher-order functions. This might be overkill/weird, but it could also potentially be a nice way to only have 1 map instruction, but be able to have it define any input/output combinations.
[1] Ok, back to this thought. Personally, I'd lean toward letting evolution "figure it out" regarding the correct output types. You give the map instruction a block of code, and if it doesn't leave the correct output on the right stack, tough noogies. I don't know if the penalty would simply be a failing map instruction, or a shorter output vector, or what. Same with a filter instruction -- if there's no boolean on the :boolean stack, default to false and don't include that item. It seems like this way would avoid a lot of the complications that you're getting into. Now that I've read more, maybe this suggestion is like your (1) in your list?
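A one-function sketch of the "default to false" filter rule described above; the state layout is the usual keyword-keyed push state, and the function name is made up for illustration:

;; After the filtering code has run on one element, decide whether to keep it:
;; use the top of :boolean if there is one, otherwise default to false (drop).
(defn keep-element?
  [state]
  (if (seq (:boolean state))
    (first (:boolean state))
    false))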
Another problem I'm seeing with your suggestion is that you'd (well, I'd) often want to map a block of instructions, not just a single instruction. Your push-forthy ideas seem like they'd be even crazier to do with "more than one instruction functions".
Another crazy thought: what if we had a :function stack? The functions would be blocks of code with input and output types, and would execute in their own scope so as not to disrupt the rest of the program. AHA! This actually sounds like something we already have, believe it or not -- the :environments stack, which is something @lspector and I were working on year(s) ago, with the purpose of being able to execute a block of code without disrupting other stack contents -- a local scope. We were doing it for other reasons (I think it was geometric semantic crossover?), but this could totally be co-opted for this purpose. I don't have time to think about the ramifications right now, but this could be a pretty elegant solution.
Ugh, I should be working on my defense talk rather than this, but this is so much more interesting. In fact, I should probably put off all of this stuff until after my defense. But, I don't want to put you on hold if now is when you can work on it. Anyway, if I don't respond for a while, that's why. :crying_cat_face:
Dude. DEFENSE.
I just was chatting with @NicMcPhee and I realize you might think I am considering this current branch for submission. This enumerator type was simply intended as a design spike for a more robust suite of types and instructions that actually do what we discover they should do. It's not going to be submitted for a Pull Request, and all the code will be thrown away! It's just a way of sketching in code, and eliciting points of "resistance" from the codebase and from you folks.
I quite like the idea of a proper higher-order function, along the lines Tom and I are sketching above. But that would be something I'd do in a test-driven way, and using a lot better refactoring techniques. Please recall this is the first Clojure (or Lisp or Java) code I've ever written.
Never use the first code!
It would be cool to have higher-order function instructions like map, reduce, and filter in Clojush. Maybe they'd only be defined on the array-like types (string, vector_integer, vector_boolean, etc.), or maybe they'd somehow be defined on other stacks as well.
The details are vague, and there are some non-trivial things to figure out, but if anyone wants to take this on it would be cool!