opencog / asmoses

MOSES Machine Learning: Meta-Optimizing Semantic Evolutionary Search for the AtomSpace (https://github.com/opencog/atomspace)
https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search
Other
38 stars 29 forks source link

Grammar-guided program evolution #103

Open linas opened 1 year ago

linas commented 1 year ago

This is an idea from @robert-haas on a discord channel.

The original moses created combo trees out of the arithmetic ops (plus, minus, times, divide) the boolean ops (and/or/not) and a few others (greater-than, etc...) The allowed ways in which these can be mutated was hard-coded in an adhoc manner. For example, you can only decorate bools with boolean knobs, and contins with contin knobs, etc.

The current as-moses does the same, except not its atomese and not combo. The knob decoration is still adhoc, hard-coded. If as-moses doesn't know about some op, it can't deal with it (for example -- hyperbolic tangent -- you could hack as-moses to add that, but it would be a hack.The next function to come along would be yet another hack.)

The suggestion is to replace the knob-decoration code with a formal specification of a grammar. There would be a grammar, that defines exactly what a "valid program tree" is. New program trees can be created and mutatated, only if they obey the rules of the grammar.

This would allow as-moses to create and mutate trees in any problem domain, and not just arithmetic+bool.

This is interesting, because some of the problem domains are audio and video, and some of the new kinds of ops include lo-pass filters, high-pass filters, chirp filters, squelch filters, etc. It's hard/awkward to try to boil these down to just arithmetic+bools.

robert-haas commented 1 year ago

Hey, here's most of what I've been doing with grammar-guided genetic programming (G3P) so far: https://github.com/robert-haas/alogos

linas commented 1 year ago

Agree that 99% of cpu time is in evaluation, and not in the mutation of the trees. However ...

Writing code to export atomese to your system, and then re-importing the results is going to be far more complex and fragile and buggy, than simply doing the mutation in-house, directly. Rainbow of reasons:

linas commented 1 year ago

Off-topic, but I'm also heavily biased: I spent several years with moses. I learned several key things:

robert-haas commented 1 year ago

That's a very convincing list of reasons. I'm looking forward to see the growth of as-moses into this direction!

Some general insights I gained with alogos so far:

Summed up: Without better empirical data, I'd by default go with an evolutionary algorithm with a "large" population size (not <50, better 100-500, for tricky problems even in the 1000s), a simple diversity preservation mechanism (e.g. not allowing more than n copies of a candidate solution in survivor selection), and as search operators mutation & crossover & tournament selection.

linas commented 1 year ago

Let me reply piece-wise. Grammar. There is not one, but two: In "classical" moses, the first grammar is the collection of the legal ways in which the arithmetic ops (+ - * /) can be combined with constants (0,1, 2), variables (x,y,...) relations (<, =, >) and bools (and,or,not, T, F) and misc funcs (impulse, cosine, exp, log...) So that is the grammar of the program trees.

Then there is the second grammar: this is the (implicit) grammar of the dataset that is be analyzed/understood. That grammar is unknown and needs to be discovered.

I'm interested in the first grammar only so much as it helps me create valid program trees. Otherwise, its not all that interesting. Most of my effort is focused on discovering/discerning the second grammar.

The uniform distribution of mutation points is .. interesting. The original moses had something called "EDA" -- estimated distribution of algorithm -- but it was ripped out cause it was painfully slow. Maybe it should be restored? It attempted to assign some kind of Bayesian(?) probability to which parts of the program tree were "good" in some way. I don't recall. It's in Moshe Looks thesis. I haven't read the rest of your post yet, seems you get into this...

linas commented 1 year ago

Single-solution based metaheuristics (e.g. a hill climber or simulated annealing) seem to perform not as well as population based metaheuristics

FYI, In moses, we generate a handful of starter individuals. Then pick one. Hill climb for a while, until hill climbing slows down/stalls. Then pick another, and repeat. When overall progress stalls, do some cross-over, then hill climb some more. Keep a population of a few hundred of the best ones. Pick the next one(s) to hill-climb or cross-over with a weighted random pick. The weighting has two parameter penalties: one for overall score, and another for overall complexity. So lower scoring individuals are less likely to be picked. And more complex individuals are less likely to be picked.

In almost any real-world application you'll want to perform multiple runs to drastically improve the probability of finding a sufficiently good result. A restart is often better than spending more time in a search that got itself stuck in some corner of the search space (not one local optimum, but some wider basin of bad solutions).

I never-ever saw this. Or rather, I did, early on, and promptly realized the core problem: the selective pressure was way too high. The solution was to enlarge the population, lower the pressure. So, basically, to "automatically run multiple runs". Duhh. Why do it by hand, when you can do it automatically?

linas commented 1 year ago

There are many ideas in literature

The point here is NOT to use this tech to find a super-whiz-bang program that fits large datasets with high quality. That cannot be done with current algos. Not possible. And anyway, the DL/NN codes do this a zillion times better! Give up on the simplistic dream of evolving a complex program that "fits the data".

All I want out of this is a so-so, mediocre, better-than-random-chance description. I have completely different, unrelated algos to take other bigger, deeper steps. So I want to use MOSES/program-evolution only as a kind of information amplifier, to improve classification only a litle bit -- factors of 2x, 4x, 8x, if more, great, but I'm not asking for much. Just mediocre is enough for what I need. Later stages are used to revise to get better results.

Here's an example, it might be the first place I'll use this stuff. I want to use it for for text segmentation (c.f @akolonin). Anton has characterized 3-4 different text segmentation algos: "transition freedom", "conditional probability", "mutual information", and some others. These are just .. algos. They are really pretty simple algos -- not complicated. As program trees, they are shallow. So I want to automate the generation of text segmentation algos. As a starting point, it would include the ones Anton considered. Then I want to mutate, evolve, adjust these automatically, including automatically tuning parameters. The result does not have to be "great segmentation", just "good-enough".

Then, outside of the above steps, there is a distinct step of statistical data gathering, and construction of a lexis, a discovery of a simple grammar. This step, this second step, will be a much better tokenizer than the one that the genetic programming found. However, it needs to be pumped with some initial data that is not totally random, but is at least sort-of OK. So I want to use the genetic programming to find something sort-of OK, so I can feed it into this second step.