lspector / Clojush

The Push programming language and the PushGP genetic programming system implemented in Clojure.
http://hampshire.edu/lspector/push.html
Eclipse Public License 1.0

`code_rand` fails with a warning during execution #125

Open Vaguery opened 9 years ago

Vaguery commented 9 years ago

Over at the ongoing experiment I've got some random code generation and the interpreter nominally working, still as a preliminary for anything interesting. In the course of making and running some random Push programs, I've noticed that the code_rand instruction barfs when executed.

It apparently wants @global-atom-generators to be non-empty if it's going to work. I can see why, since it seems to expect a collection of random-code generators rather than the empty (atom {}) it is defined with.

That said... nobody in the codebase ever puts anything into @global-atom-generators.

lspector commented 9 years ago

Actually, global-atom-generators does get set when pushgp is called, via reset-globals, which does some clever (evil?) dynamic namespace scraping and symbol manipulation to set this and the other globals. I think Emma Tosch wrote that bit. If you're running programs outside of a call to pushgp then you'll want to do this or something similar in your own client code.

Vaguery commented 9 years ago

Hmm. That'd be a bug, then, since the interpreter can't interpret code_rand instructions without that being part of its setup.

Vaguery commented 9 years ago

So if I undertake to fix it, should I:

  1. change the interpreter's behavior so that it fails silently when code_rand is encountered and @global-atom-generators is empty
  2. change the definition of the instruction so that it simply generates an empty code block (of the specified size?) without complaining
  3. disable the instruction by default

Vaguery commented 9 years ago

Actually, @lspector, it isn't set to any value (except the default empty one) anywhere in the entire repository.

lspector commented 9 years ago

There's no static code that sets it, but when you execute a call to pushgp it will call reset-globals which will in fact set it, even though that's far from obvious. What reset-globals does is to loop through all of the public symbols in clojush.globals that start with "global-" and set each one to whatever's in the corresponding entry in @push-argmap. This will cause it to reset the value of the global-atom-generators atom, even though that's not explicitly mentioned in this code.
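The prefix-driven reset described above can be sketched in a few lines. This is a rough Python illustration of the idea only, not the actual Clojure implementation: the table contents, the `other-setting` entry, and the rule that the argmap key is the global's name with the `global-` prefix removed are all assumptions made for the example.

```python
# Hypothetical stand-ins for the atoms in clojush.globals and for @push-argmap.
globals_table = {
    "global-atom-generators": [],   # the default empty value
    "global-max-points": 100,       # illustrative entry, value is made up
    "other-setting": "untouched",   # no "global-" prefix, so it is skipped
}
push_argmap = {"atom-generators": ["integer_add", "in1", 5], "max-points": 200}


def reset_globals(table, argmap, prefix="global-"):
    """Overwrite every prefixed global from its matching argmap entry."""
    for name in list(table):
        if name.startswith(prefix):
            # Assumption: the argmap key is the global's name minus the prefix.
            key = name[len(prefix):]
            if key in argmap:
                table[name] = argmap[key]
    return table


reset_globals(globals_table, push_argmap)
```

Nothing in the loop names `global-atom-generators` explicitly, which is why a text search of the repository finds no place where it is set.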

lspector commented 9 years ago

As for possible remedies, I definitely don't like silent failure (1). Re: 2, how can it be empty and of a non-zero size? I guess it could make sense for the resulting code to always be just "()". On 3, do you mean that it acts as a no-op? The rationale for the current behavior is that it's almost certainly a mistake -- if you're calling code_rand then you probably meant to have some atom-generators defined, so the idea is to make this as obvious as possible. I think the bigger issue is that there's no built-in way to initialize the system properly without running pushgp. Maybe we can define a way to do that elegantly.
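To make the "fail as obviously as possible" rationale concrete, here is a minimal Python sketch. It is not Clojush code: `EmptySoupError` is a hypothetical error type, and a flat list of atoms stands in for whatever tree-building the real random code generator does.

```python
import random


class EmptySoupError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def code_rand(size, atom_generators, rng=random):
    """Return `size` random atoms, or fail loudly when no generators exist."""
    if not atom_generators:
        # Loud failure: a missing soup is treated as a setup mistake.
        raise EmptySoupError(
            "code_rand needs non-empty atom generators; "
            "initialize them before running a program outside pushgp"
        )
    atoms = []
    for _ in range(size):
        choice = rng.choice(atom_generators)
        # Assumption: a generator may be a literal or a zero-argument function.
        atoms.append(choice() if callable(choice) else choice)
    return atoms
```

The alternative in option 1 would replace the `raise` with a quiet return, which hides the misconfiguration instead of surfacing it.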

Vaguery commented 9 years ago

I think the use case you're imagining presupposes that people deliberately "want" to use this particular instruction out of a growing pile of enigmatic and esoteric ones. I'd be much less surprised if (as in my case) somebody simply throws every instruction at a problem. After all, the premise behind all the active research on "fine-tuning" search operators and overall algorithms, and on identifying "pathologies", is that choosing the right parameters is hard, right?

There is another possibility, a sort of zero-information approach that struck me as I was peeling carrots just now:

;; (pseudocode)

if global-atom-generators is not empty, pass that into random-push-code
else 
- build a list of all the non-parenthesis tokens in the original script, including ERC literals
- build a list of all (abstract) subtrees, including bare leaves, present in the original script
- create a random (abstract) code tree of the required size by sampling the subtree sizes
- fill the nodes of the tree by resampling the token collection

So for instance if the original script was

((true false -398 4.34375 true) (10.875 (in1 false) integer_pop float_add) (char_eq -8.21875 7.53125 136 false) (return_tagspace -5.78125 vector_string_conj -1.84375 false) (float_rand true char_yankdup -447 11.625) (in1 string_substring -2.875 in2 float_sub) (in1 -118 code_rand true 143) (false 54 false -15.5 in2) (boolean_empty in1 false 6.90625 false) (-46 -271 -7.0 true 222))

the list of tokens is:

-1.84375
-118
-15.5
-2.875
-271
-398
-447
-46
-5.78125
-7.0
-8.21875
10.875
11.625
136
143
222
4.34375
54
6.90625
7.53125
boolean_empty
char_eq
char_yankdup
code_rand
false
false
false
false
false
false
false
false
float_add
float_rand
float_sub
in1
in1
in1
in1
in2
in2
integer_pop
return_tagspace
string_substring
true
true
true
true
true
vector_string_conj

and the set of "abstract subtrees" is

_ (x 50)
(_ _ _ _ _) (x 9)
(_ (_ _) _ _)
(_ _)

The call to code_rand without any value in @global-atom-generators with an integer argument of 17 (for instance) might produce an abstract tree of 17 points like:

( _ ( _ _ ) ( _ _ _ _ _ ) _ ( _ _ _ _ _ ) )

which gets filled in with tokens from the original script by resampling.

That way, there's no chance the random code will contain uninterpretable tokens, and it more or less preserves the kind of code present in the script.

Or, as a much simpler alternative: just a flat tree of 17 tokens resampled (with replacement) from the script.
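The pieces of this proposal can be sketched in Python as follows. This is only an illustration of the idea, not Clojush code: all function names are hypothetical, the example script is a shortened, made-up variant of the one above, and the step that assembles a new abstract tree of a requested size is left out. What is shown is token collection, abstract-subtree counting, refilling a given tree shape, and the flat alternative.

```python
import random
from collections import Counter


def parse(script):
    """Parse one parenthesized Push-style form into nested Python lists."""
    stack = [[]]
    for tok in script.replace("(", " ( ").replace(")", " ) ").split():
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0][0]


def leaves(tree):
    """All non-parenthesis tokens, duplicates kept, in script order."""
    if isinstance(tree, list):
        return [tok for child in tree for tok in leaves(child)]
    return [tree]


def shape(tree):
    """Abstract form of a subtree: every leaf becomes '_'."""
    if isinstance(tree, list):
        return "(" + " ".join(shape(child) for child in tree) + ")"
    return "_"


def subtree_shapes(tree):
    """Count the abstract shapes of all proper subtrees, bare leaves included."""
    counts = Counter()

    def walk(node, is_root):
        if not is_root:  # the whole script is not counted as its own subtree
            counts[shape(node)] += 1
        if isinstance(node, list):
            for child in node:
                walk(child, False)

    walk(tree, True)
    return counts


def refill(tree, pool, rng):
    """Keep a tree's structure but redraw every leaf from the token pool."""
    if isinstance(tree, list):
        return [refill(child, pool, rng) for child in tree]
    return rng.choice(pool)


def flat_resample(pool, n, rng):
    """The simpler alternative: n tokens drawn with replacement."""
    return [rng.choice(pool) for _ in range(n)]


script = "((true false -398) (10.875 (in1 false) integer_pop) (in1 code_rand 143))"
tree = parse(script)
tokens = leaves(tree)
shapes = subtree_shapes(tree)
rng = random.Random(0)
refilled = refill(tree, tokens, rng)
flat = flat_resample(tokens, 17, rng)
```

Because every leaf is drawn from `tokens`, neither `refilled` nor `flat` can contain a token that was absent from the script, which is the property the proposal is after.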

lspector commented 9 years ago

By "the original script" you mean "the program that was passed to run-push"? If so, then I think I understand what you're suggesting, but it seems sort of unrelated to the intended function of the instruction, which is to push some random code made out of whatever "soup" the client has set up for the system more generally.

You're suggesting that if no soup was provided then A) consider the program passed to run-push to be the soup, and also B) use a different random code generation algorithm. I guess A makes some sense, although I don't understand why we'd bother with B. And in any event, if you want a shuffle of the program passed to run-push then my instinct would be to provide another instruction for this sort of thing (which would also require storing run-push's argument for this purpose), but not to use an existing instruction for something else.

I think that a more reasonable approach would just be to ensure that there's always a defined soup, by doing the right kind of system initialization even if not in a call to pushgp.
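One way to picture "always a defined soup" is a single initialization step shared by both entry points. The Python sketch below is purely illustrative: `DEFAULT_ARGMAP`, `init_globals`, and the simplified `run_push` and `pushgp` signatures are hypothetical, and the default generators are made-up placeholders rather than anything Clojush ships.

```python
# Hypothetical defaults; the real default soup would be a project decision.
DEFAULT_ARGMAP = {"atom-generators": ["integer_add", "integer_sub", "in1"]}


def init_globals(user_argmap=None):
    """Merge user settings over the defaults and insist on a non-empty soup."""
    argmap = {**DEFAULT_ARGMAP, **(user_argmap or {})}
    if not argmap["atom-generators"]:
        raise ValueError("atom-generators must not be empty")
    return argmap


def run_push(program, user_argmap=None):
    """Standalone interpretation initializes the globals itself."""
    return {"globals": init_globals(user_argmap), "exec": list(program)}


def pushgp(user_argmap=None):
    """A full GP run goes through the same initialization step."""
    return {"globals": init_globals(user_argmap)}
```

With this arrangement, a program run directly through `run_push` sees the same kind of globals as one evaluated inside `pushgp`, so `code_rand` never meets an uninitialized soup.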

Vaguery commented 9 years ago

I agree. Let me see what I can offer in a few days on that.