opencog / moses

MOSES Machine Learning: Meta-Optimizing Semantic Evolutionary Search. See also AS-MOSES https://github.com/opencog/asmoses; this repository is kept to guarantee backward compatibility.
https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search

Create a subclass of field_set without continuous representation as variable-length sequences of bits. #10

Open ArleyRistar opened 9 years ago

ArleyRistar commented 9 years ago

By recommendation of Nil, I'm opening an issue initially discussed on the Slack moses channel. Summary of the situation: I'm adding particle swarm to the optimization part of moses, and representing continuous knobs as variable-length sequences of bits is causing loss of information, and PS doesn't need this discrete representation. As discussed in the channel, I was thinking of changing this representation just inside particle swarm, but there's the problem of returning the deme to the metapopulation, which would lose information in the same way. So Ben said that this design "doesn't really need to persist".

I was thinking that I could "create a modified version of field_set overriding the contin parts to use 1 packed_t as float and use this modified version only with particle swarm". I still don't know how it would affect other parts like the metapopulation, or whether it can be done easily. Because of that, I'm opening this issue looking for opinions.

Keep in mind that this will use a little more RAM for each instance with contin parts, but with better precision and less processing.

linas commented 9 years ago

Hi Arley,

... and Nil,

I think we need to fix contin_t for "ordinary" moses, not just particle-swarm moses.

How about this idea: we store a full-length float or double in the deme, AS WELL AS the contin_t. The value of the double is used in the actual combo tree, whereas the contin_t is used only during knob-turning, to store the current variance.

Thus, during knob turning, the contin_t is used to generate a random double-precision number, with a PDF centered at zero, and a variance equal to the contin_t value. This random offset is then added to the double-precision number in the deme. (The random offset is stored in the instance, or maybe the sum of the value and the offset is stored .. ). Thus, knob-turning generates a random walk. (During knob turning, the variance is also either made smaller, is kept as-is, or is increased).

The above is how we should do floats in "ordinary", non-particle-swarm MOSES.

For the particle-swarm moses, you may want to do knob turning differently (I don't really understand how), but the above mechanism will at least allow you to store the full-length double-precision number in both the instance and in the deme.
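A minimal sketch of that random-walk idea (hypothetical names and layout, not the actual field_set API), where the instance keeps the full double and the contin_t only plays the role of the variance:

    #include <cmath>
    #include <random>

    // Hypothetical knob state: the value lives in the instance/deme as a plain
    // double; the contin_t only tracks the current variance of the random walk.
    struct contin_knob_state {
        double value;     // full-precision value used in the actual combo tree
        double variance;  // role played by contin_t during knob-turning
    };

    // One knob-turning step: add a random offset drawn from a PDF centered at
    // zero with the current variance, then shrink, keep, or grow the variance.
    inline void turn_knob(contin_knob_state& k, std::mt19937& rng)
    {
        std::normal_distribution<double> offset(0.0, std::sqrt(k.variance));
        k.value += offset(rng);

        std::uniform_int_distribution<int> choice(0, 2);
        switch (choice(rng)) {
            case 0: k.variance *= 0.5; break;   // contract
            case 1: /* keep as-is */   break;
            case 2: k.variance *= 2.0; break;   // expand
        }
    }

Repeated over knob-turning steps this gives the random walk described above, without ever squeezing the value back through the trit encoding.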

bgoertzel commented 9 years ago

yah -- This seems like a reasonable approach for now.. any thoughts from Dr. Geisweiller?

ben


ngeiswei commented 9 years ago

@linas the current state of the code already nearly does that: a contin_spec contains a mean, and any knob turning is centered around that mean. However, the variance is replaced by the expansion combined with the step, so if you tweak the contin trits (Left, Right, Stop) you get some distribution; let's call it a CTD (Contin Trits Distribution). It's not a Gaussian, but maybe it approaches one.

Anyway, I agree that the distribution should not be limited to a CTD. My advice would be to add or upgrade the neighborhood sampling methods to take a distribution as an argument; it could be a Gaussian or otherwise.
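For instance, a parameterized sampler could look roughly like this (hypothetical signature, not the existing neighborhood_sampling API; the candidate is shown already decoded to doubles):

    #include <random>
    #include <vector>

    // Perturb each continuous knob of a candidate with a draw from an
    // arbitrary caller-supplied distribution; anything with operator()(rng)
    // works, a Gaussian being the obvious default.
    template <typename Dist>
    void sample_contin_neighbor(std::vector<double>& contin_knobs,
                                Dist& dist, std::mt19937& rng)
    {
        for (double& v : contin_knobs)
            v += dist(rng);
    }

    // Usage:
    //   std::mt19937 rng(42);
    //   std::normal_distribution<double> gauss(0.0, 1.0);
    //   sample_contin_neighbor(knobs, gauss, rng);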

I don't think you need to change anything else. And here's why: candidates in the deme are represented as instances

typedef std::vector<packed_t> instance;

see instance.h

If you were to represent the contin_spec directly as a float you'd still need to hack the code to en/decode a contin into an instance (see field_set::pack in field_set.h). It's not impossible, but it adds a coding burden to our student. And if you run out of resolution you can always increase the contin_spec depth, so why bother.
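For what it's worth, the raw en/decoding itself is small. A sketch of how a double could be stored in the packed words of an instance (assuming here that packed_t is a 32-bit word, and ignoring how field_set tracks field offsets):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    typedef uint32_t packed_t;               // assumption for this sketch only
    typedef std::vector<packed_t> instance;  // as in instance.h

    // Store a double in two consecutive packed words of an instance.
    inline void pack_double(instance& inst, std::size_t word_idx, double v)
    {
        uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);
        inst[word_idx]     = static_cast<packed_t>(bits & 0xffffffffu);
        inst[word_idx + 1] = static_cast<packed_t>(bits >> 32);
    }

    // Read it back.
    inline double unpack_double(const instance& inst, std::size_t word_idx)
    {
        uint64_t bits = (static_cast<uint64_t>(inst[word_idx + 1]) << 32)
                      | inst[word_idx];
        double v;
        std::memcpy(&v, &bits, sizeof v);
        return v;
    }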

So to sum up, yes ps doesn't need this contin trits representation, but MOSES does. Ultimately I don't think this en/decoding of candidates into instances is good. It has the advantage of saving a bit of RAM at the expense of making the code more rigid and complex. But getting rid of it is probably too much work for this project.

I'll get back with more details about what should be hacked to enable neighborhood sampling with any distribution, not just CTDs.

ngeiswei commented 9 years ago

Hmm, on the other hand converting a float into bytes is trivial, and it would save some CPU time.

Anyway, you'd still need to upgrade the neighborhood sampling to support any distribution; those two issues are really independent. So I suggest:

  1. Upgrade the neighborhood sampling distribution to support any distribution.
  2. If it turns out that manipulating large contins with the current trit representation is a bottleneck, then hack the contin_spec and the en/decoding methods to support a float representation.
ngeiswei commented 9 years ago

@arleyristar The code to hack to support 1. is definitely in

moses/moses/moses/neighborhood_sampling.{h,cc}

You may start to study it until I come back with more information.

ngeiswei commented 9 years ago

At this point I really need to understand exactly what you want to do and how. There are several ways to go about that problem: for instance, the distribution could be passed to the sampling function (sample_from_neighborhood, then generate_contin_neighbor or another new one), or it could live inside the contin_spec.

Also, there is a problem I had not thought about: the mean in the contin_spec remains the same during deme optimization. It cannot be changed, because otherwise the contin values corresponding to the instances will not be correctly reconstructed. So if you really want the mean to change during deme optimization, we need to address that.

ngeiswei commented 9 years ago

OK, I presume the code so far is here

https://github.com/arleyristar/moses/commits/master

Please point me to any relevant docs as well.

ArleyRistar commented 9 years ago

Nil, I'm looking at the neighborhood sampling now; I'll try to solve this first problem.

I'm assuming your penultimate email was meant for me. I'll see how I can do that, but leaving the distribution inside contin_spec is what makes more sense to me for now. As for changing the mean, don't worry: what I said in the channel about changing the mean was just an option I thought of before Ben said that this representation doesn't need to persist. Once problem 2 is solved, PS can use the float representation.

Yes, the code for PS is in there. I haven't made any outside documentation yet; most of it I've been putting as comments inside the code or posting in the channel.


ArleyRistar commented 9 years ago

Of the things I thought about, this seems an acceptable idea for 1; forgive me if I misunderstood what I was supposed to do or if this is a bad idea.

Nil, we could keep neighborhood sampling as it is now and change contin_stepper and contin_spec. Not even get_contin and set_contin from field_set would change, or only by a little (for 1).

Let me explain in detail. For now we have a behavior similar to this: https://code.google.com/p/moses/wiki/ModelingAtomSpaces

For example, with mean = 0, expansion = 2 and S = Stop, L = Left, R = Right:

  S   -> 0
  LS  -> -2 (step = 4)
  RS  -> 2 (step = 4)
  LLS -> -6 (step = 8)
  LRS -> -1 (step = 0.5)
  RLS -> 1 (step = 0.5)
  RRS -> 6 (step = 8)
  ...

It has better precision next to its mean, and it kind of resembles a Gaussian to me.

My idea is to modify contin_stepper left() and right() to have a behavior similar to the link, but between [0,1]. In this case it would start with 0.5 as the mean and halve the step_size each time, starting with 0.25.

Applying this contin_stepper to the actual LRS representation would give us a probability that could be applied to a (cumulative) distribution function. The distribution function would be stored inside contin_spec in place of mean and step_size (depth has to stay, for use in the stepper).

I could write some distributions to be chosen via a parameter (leaving normal as the default); their parameters would be given in the construction of the contin_spec, as mean and expansion (the latter acting as the variance, though it would be better if it were bigger).

Does that make sense, or should I think of another solution?
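A small sketch of that idea (hypothetical code, not the actual contin_stepper): walk the L/R trits as a binary refinement of a quantile in [0,1], then push that quantile through the inverse CDF stored in the contin_spec. A logistic distribution is used here only because its quantile function has a closed form; a Gaussian would need an inverse error function, e.g. from Boost.Math:

    #include <cmath>
    #include <string>

    // Map a trit string such as "LRS" to a quantile in (0,1): start at 0.5
    // and halve the step at each L/R, stopping at 'S'.
    inline double trits_to_quantile(const std::string& trits)
    {
        double q = 0.5, step = 0.25;
        for (char t : trits) {
            if (t == 'S') break;
            q += (t == 'R' ? step : -step);
            step *= 0.5;
        }
        return q;
    }

    // Inverse CDF of a logistic distribution with location 'mean' and scale 's'.
    inline double logistic_quantile(double q, double mean, double s)
    {
        return mean + s * std::log(q / (1.0 - q));
    }

    // Usage: double value = logistic_quantile(trits_to_quantile("LRS"), 0.0, 2.0);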


linas commented 9 years ago

On Tue, Jul 7, 2015 at 8:17 PM, ngeiswei notifications@github.com wrote:

@linas https://github.com/linas the current state of the code already nearly does that,

Arghh... yes but no. I understand perfectly well how contin works in moses, and basically, it sucks. It needs a major revamp. I used it a lot, and eventually gave up, because it behaved so badly. I think I know why it mis-behaves, and the sketch I made was for an approach that I think could significantly improve it.

The contin_t, as currently designed, attempts to use the center-points of the Cantor set, which, in a certain naive sense, does have a uniform distribution over the reals, but it fails for practical machine learning problems. The problem is that the vertexes in the Cantor tree will almost never be close to the actual value you are trying to learn, and it takes too many iterations to get to the vertex that does fit your data. E.g. trying to converge to 1.333 by using 1.0 1.5 1.25 1.375 ... is ineffective and slow. It is far too many steps for MOSES to get the right value. It's even worse if you want to learn more than one float.

I am kind-of guessing that storing full 64-bit floats, and using a random walk will converge faster than dyadic Cantor tree walks. I can't prove that it will converge faster, but contin_t works so poorly that almost anything will be better.
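To make that concrete, a tiny stand-alone illustration (not moses code) of the dyadic refinement described above: each extra trit halves the step, so even an idealized greedy walk needs on the order of eight to ten refinements per variable to pin down a value like 1.333 to three decimals, and the actual knob-turning search is far less direct than this:

    #include <cmath>
    #include <cstdio>

    int main()
    {
        const double target = 1.333;
        double value = 1.0, step = 0.5;   // 1.0, 1.5, 1.25, 1.375, ... as above
        int iters = 0;

        while (std::fabs(value - target) > 0.001) {
            value += (target > value ? step : -step);
            step *= 0.5;
            ++iters;
        }
        std::printf("%d refinements to reach %.6f (within 0.001 of 1.333)\n",
                    iters, value);
        return 0;
    }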

linas commented 9 years ago

p.s. what you call a "contin trit" is called a "Cantor tree" in the general world. http://blogs.scientificamerican.com/critical-opalescence/files/2012/08/f5.jpg The cantor tree unfolds into the "modular group" and there is all sorts of neat stuff there.

ArleyRistar commented 9 years ago

So, from Linas's idea and issue 2 from Nil: contin_spec and contin_stepper won't be used and are almost useless now; it doesn't make sense to maintain them in field_set. get_contin and set_contin have to change to insert the float (contin_t is already a double) into the instance. generate_contin_neighbor has to change to do the random walk. Plus a lot of minor updates.

While this isn't decided, I'll be fixing other things in ps.


ngeiswei commented 9 years ago

@linas, if you update the contin neighborhood sampling you don't have to iterate 1.0 1.5 1.25 1.375 ... to get to 1.333. You just set the contin_spec with enough depth and you can set the value 1.333 right away.

@arleyristar contin_spec is certainly useful, but it could indeed be greatly simplified, both in code and in performance, if the cantor tree representation were replaced by a float representation. Although there is still one area where the cantor tree representation wins: if the resolution of the contin values turns out to be low, it takes much less memory, a few bits instead of a systematic 32 bits. It's probably not worth it, but it's useful to keep in mind. And again, you don't need to change the contin_spec code to allow sampling with other distributions (unless you put the distribution in the contin_spec, of course), as I explain above.

ngeiswei commented 9 years ago

@arleyristar you cannot change the mean, the expansion or the step of a contin_spec once constructed, or at least it wouldn't make much sense, because you need those in order to reconstruct the contin value given the packed bit string. If they change while optimizing the deme, the contin values of the candidates will also change, so the older candidates won't have their contin values correctly reconstituted.

That's why I want to understand exactly how your algorithm is going to work. I'm currently reading

https://en.wikipedia.org/wiki/Particle_swarm_optimization

Let me know if you follow that, or some alteration of it (this will help me understand your code when I start reviewing it).
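For reference, the canonical update from that article (the generic textbook form, which may differ in details from Arley's implementation) is v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x), then x = x + v, with fresh uniform draws r1, r2 per dimension:

    #include <cstddef>
    #include <random>
    #include <vector>

    // One textbook PSO velocity/position update for a single particle.
    void pso_update(std::vector<double>& x, std::vector<double>& v,
                    const std::vector<double>& personal_best,
                    const std::vector<double>& global_best,
                    double w, double c1, double c2, std::mt19937& rng)
    {
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        for (std::size_t i = 0; i < x.size(); ++i) {
            double r1 = uni(rng), r2 = uni(rng);
            v[i] = w * v[i]
                 + c1 * r1 * (personal_best[i] - x[i])
                 + c2 * r2 * (global_best[i] - x[i]);
            x[i] += v[i];
        }
    }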

ngeiswei commented 9 years ago

@arleyristar I'm only a third of the way through reviewing your code, but I think you don't need to change the neighborhood sampling functions; all the sampling code takes place in the particle_swarm struct, and that makes sense given its peculiarities (you don't sample the neighborhood directly, you sample the coefficients to construct the velocities).

So I don't think you need to do 1. or 2. All you need is to increase the depth of the contin_spec. If that becomes a performance bottleneck then you can work toward improving contin_spec, etc. But if the performance overhead is negligible, you don't need to bother (unless you really want to).

Anyway, I'll keep reviewing; I may change my mind once I see the whole thing.

ngeiswei commented 9 years ago

@arleyristar I'm re-reading your first comment from when you created this issue. You got it all. Forget my comment about modifying the neighborhood distribution; I got there because of Linas's initial comment and my lack of understanding of PS. So I stand by my comment right above, which can be synthesized into these questions:

  1. Have you tried to set the depth high enough so that the loss of information isn't an issue?
  2. Does it create a performance bottleneck?

If the answers are yes, then I think what you suggest is the correct hack: replace the cantor tree with a binary float representation. Of course, before we go there we want to do things well: record the performance of a bunch of runs before, to compare with after the code optimization, etc. And I'll probably want to review the contin_spec once more, just in case I see something simple and bright to do. But first things first, answer those 2 questions.

ngeiswei commented 9 years ago

For instance if you could plot

  1. Performance
  2. Run-time
  3. RAM

w.r.t. depth (varying from, say, 5 to 32), for one or more optimization problems on which you know PS is supposed to perform OK, that would definitely help a lot in assessing what should be done next.

Ideally you would even repeat that over several random seeds, say 10, to cancel out noise, get smoother charts.

Once you know the kind of depth you need, you may run that under a profiler like valgrind, so you can get an idea of how much performance is wasted by the cantor tree representation.

ArleyRistar commented 9 years ago

I'll change the code to increase the depth and test with it as Nil said. I'll post the results here, but it can take a while (give me 1 day).


ngeiswei commented 9 years ago

It may take you a few days even, but it's worth it, not just for ps. Thanks!

linas commented 9 years ago

Hi Nil, Arley,

I have to retract/modify my earlier comments as well; they were based on a misunderstanding of sorts. So, what I meant to say was this:

-- I am very excited by the PSO work; I think it will be an excellent way of handling floating-point numbers in moses. My grumbling was less about the contin representation itself than about how the hill-climbing algo treated it: hill-climbing by twiddling one binary bit of a contin at a time was just terribly slow, inefficient, awful.

-- Yes, contins offer a small, compact way to store floats or doubles. I guess it's harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert a double to the contin rep, and back. As long as you save enough digits, it should not be a problem.

-- If contins are replaced in the combo trees, then I strongly recommend using 64-bit floats (i.e. doubles); do NOT mess with 32-bit floats. Messing about with 32-bit floats is not worth the effort. It's a bad idea.

-- Nil, the way I currently understand it is as follows: every combo tree would have one to hundreds of doubles in it. If the metapop has 10K trees in it, then there would be 10K trees/metapop * 200 doubles/tree * 8 bytes/double = 16MBytes of doubles in the metapop: hardly a big amount.

-- During a single PSO step, we pick just one tree, and so PSO only needs to optimize 1 to a few hundred doubles. If you have 1K particles per swarm, that is about 200K doubles per PSO step, which again is not very big, and might even fit in modern-day CPU caches.

So I see no problem at all with storing doubles everywhere. Done right PSO will be awesome for moses. I'm actually excited, now that I think I understand it.

ngeiswei commented 9 years ago

Linas, you wrote,

"-- Yes, contins offer a small, compact way to store floats or doubles. I guess its harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert double to the contin rep, and back. As long as you save enough digits, it should not be a problem."

Yes, they exist: field_set::get_contin and field_set::set_contin, but in fact you can directly en/decode a contin by dereferencing the contin_iterator.

Yeah, you're right, the extra RAM will probably be negligible. I'd still be interested in seeing how increasing the depth would help the quality of the results, and whether this en/decoding is really hurting run-time performance, though. I mean it's nice to know the actual gain of changing a piece of code (although the code simplification is a gain by itself).

linas commented 9 years ago

OK, sure. My knee-jerk reaction is "simplify the code" and remove contin entirely, but I'm tired right now and don't really care. :-) Do the right thing.

ArleyRistar commented 9 years ago

TL;DR update: problems, worse performance; don't expect good results from what Nil asked for. This weekend I'll try to use Valgrind.

Detailed update: I don't have the results that Nil asked for because I'm running into some problems that I'll share with you:

The implementation in the last commit has a "mistake": it almost always initializes the particles with really big values in [double min, double max], and when I get the result (after the conversion) it always has the value of the mean (0 or 1 in this case); and it happens after the update too.

It's clearly wrong, but this way it runs very quickly, like 1000 evals in 3 sec (i7 4500U, predicates m1000). When I changed to start with smaller values between [-30, 32] (the range when depth is 5) I got the results right (in fact the conversion works right with anything a little less than 3E+17; it gives 32, which is the limit; more than that and it returns the mean value).

The thing is that it's taking much longer now (45s), and when I apply a restriction to the values so that a wrong conversion doesn't happen (-2.8e+17 < x < 2.8e+17), or simply [-30, 32], it gets even worse in time (150+s).

The worst thing? It doesn't happen because of the particle-swarm update of the particles; the bottleneck is when it gets the score (which is coded exactly like in hill climbing), AND it almost didn't improve the results, because it depends on the discrete part too and it always falls into a local maximum.

Changing the problem didn't help either (iris m2000 has very little contin %):
hc: score - 48.79, time: 78s
ps: score - 100, time: 90s (42s when it was wrong)

Obs: when I change set_contin to do nothing, it takes 3s on predicates -m1000 (iris m2000: 44s) even with the limit, so it's almost certain that calling set_contin all the time is spending too much processing.

Nil, don't expect very good results from what you asked. This weekend I'll take a look at Valgrind (I don't know how to use it yet; I'm used to just using time) and return with a more detailed experiment.


linas commented 9 years ago

What is the test problem that you are running? --linas


ngeiswei commented 9 years ago

@arleyristar I'm wondering if the super fast run-times aren't due to the cache. If the candidates are idiotically identical that could turn most evaluations into cache hits. Can you run your experiment with the debug logging level and send it to us?

ArleyRistar commented 9 years ago

I was running with predicates.csv -W1 -u pred -m1000. I'll send it, but I didn't know there was a cache in the evaluations, so it appears that's exactly what is happening.


ArleyRistar commented 9 years ago

I forgot to post here too: https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_ or read below.

Initial experiments for PS and HC in continuous problems

For the short version, jump to 3.4 (General).

1. Methodology:

First, the experiments were divided into two groups: without reduction (-B 0 -E 1 parameters) and with reduction. They are separated this way because the reduction part affects particle swarm (PS) too much (clean_combo_tree uses 99.73% with reduction in PS). Besides that, the experiments were split into three parts: analysis of memory, processing and time. The memory analysis was done using the valgrind tool massif, running for 4 values of depth {5 (default), 16, 32, 128}, for PS and hill climbing (HC). The processing analysis was done in the same way as the memory one, but using the tool callgrind. The time analysis was done differently, using a simple python script to measure the mean time over seeds 1 to 10, running from depth 2 to 32 for PS and HC. The dataset used was predicates.csv, because it is a simple one that is already inside the examples folder of Moses. It was run for a maximum of 1000 evaluations, a small value because the valgrind tools increase the program time by 20x for massif and between 20x and 100x for callgrind [1], but it's a sufficient number of evaluations to test (HC with some seeds can find the best solution in less than 1000 evals).

2. Results description:

2.1. Memory:

  • PS with reduction spends more memory than without.
  • Increasing the depth of PS increases the memory used, but not by much; sometimes it can even decrease, because it can find better results that are less complex.
  • HC without reduction reaches the RAM limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without reduction.

2.2. Processing:

  • The variable-length representation affects PS without reduction quite a lot, depending on the depth (15% at depth 32 vs. 2.7% at depth 5).
  • It affects PS with reduction in a similar way, but with a much lower percentage, because of the reduction.
  • Depth didn't affect hill climbing at all, mostly because it found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

2.4. Others:

  • Increasing the depth in PS returns better results, but it gets stuck in the discrete local maximum anyway.
  • The results are not affected by increasing the HC depth, maybe because this problem didn't need precision or a bigger value to reach the global maximum.

3. Conclusion:

3.1. Memory:

I don't think that memory is a problem; even very complex problems with thousands of instances will not reach the amount of RAM that computers have today, and even if they do, there's always feature selection. But let's take a look: the variable-length representation uses 28B (4B depth, 8B mean, 8B step_size, 8B expansion) + ((1/4 B * depth) * num_instances) for each dimension in a deme, while a double representation uses 8B * num_instances. So using the variable-length representation with a small depth for a larger number of instances still makes sense for RAM economy. But I still feel that it could be better in two ways: 1) it uses 2 bits to represent Left, Right and Stop (0, 1, 2), and it doesn't use 11 (3); 2) a depth of 5 can represent about +-30 (with LLLLS and RRRRS), so why not use up to LLLLL or RRRRR if the get/set contin functions limit the loop by the depth anyway?
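As a back-of-the-envelope check of that formula (an illustrative sketch only; the byte counts are the ones quoted above, and depth 5 with 1000 instances is just an example):

    #include <cstdio>

    int main()
    {
        const double bytes_per_trit = 0.25; // 2 bits per trit, as stated above
        const double spec_overhead  = 28;   // 4B depth + 8B mean + 8B step_size + 8B expansion
        const int depth = 5, num_instances = 1000;

        double trit_rep   = spec_overhead + bytes_per_trit * depth * num_instances;
        double double_rep = 8.0 * num_instances;

        std::printf("trit rep:   %.0f bytes per contin dimension\n", trit_rep);   // 1278
        std::printf("double rep: %.0f bytes per contin dimension\n", double_rep); // 8000
        return 0;
    }

So at low depth the trit encoding is several times smaller per dimension, which is the RAM-economy point made above.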

3.2. Processing:

With the variable-length representation, processing is only a problem for PS, and only if it has a bigger depth. In all cases clean_combo_tree and merge_demes are the ones that use the most processing, and the parts specific to HC or PS are small (PS is bigger only because of set_contin and get_contin). In PS I have to call get_contin a lot of times (to get the best values), and that isn't helping. To solve it I would have to store the values in doubles, which in the end would lose the benefits of using the variable-length representation.

3.3. Time:

I don't know how the cache of combo tree reduction works, but it seems to work best with HC, where the candidates within an eval are very close to one another. In the initial phase of PS the instances are very different from one another. I would say that PS only runs quicker without reduction because it returns a smaller deme than HC, since it doesn't include the worst solutions.

3.4. General:

The variable-length representation doesn't have that much impact on memory or performance; it only matters when using PS and for complex problems. Even so, I still think it is better to change it to a double representation, to get a little better performance, with the possibility to search in a bigger space and have less complex code, at the cost of a little more RAM and my work.

Detailed results data:

1. PS without reduction:

1.1. Massif:

  • depth 5: peak 1.2842 MiB = 0.168322662 MB
  • depth 16: peak 1.3022 MiB = 0.170681958 MB
  • depth 32: peak 1.1866 MiB = 0.155530035 MB (leads to better solutions, and thus less complexity)
  • depth 128: peak 1.5212 MiB = 0.199386726 MB

1.2. Callgrind:

  • depth 5:
    • get_contin: 1.95%
    • set_contin: 0.7%
    • update_particles: 3.17%
    • complexity_based_scorer: 80.65%
  • depth 16:
    • get_contin: 5.39%
    • set_contin: 1.97%
    • update_particles: 6.98%
    • complexity_based_scorer: 77.19%
  • depth 32:
    • get_contin: 11.02%
    • set_contin: 4.15%
    • update_particles: 14.16%
    • complexity_based_scorer: 77.30%
  • depth 128:
    • get_contin: 16.18%
    • set_contin: 10.34%
    • update_particles: 24.40%
    • complexity_based_scorer: 67.82%

1.3. Time (s):

  • Depth: 2 | Mean: 2.557880
  • Depth: 3 | Mean: 2.593005
  • Depth: 4 | Mean: 2.663667
  • Depth: 5 | Mean: 2.668498
  • Depth: 6 | Mean: 2.701040
  • Depth: 7 | Mean: 2.644231
  • Depth: 8 | Mean: 2.714792
  • Depth: 9 | Mean: 2.692005
  • Depth: 10 | Mean: 2.680957
  • Depth: 11 | Mean: 2.735071
  • Depth: 12 | Mean: 2.783216
  • Depth: 13 | Mean: 2.743773
  • Depth: 14 | Mean: 2.725607
  • Depth: 15 | Mean: 2.722347
  • Depth: 16 | Mean: 2.755987
  • Depth: 17 | Mean: 2.901231
  • Depth: 18 | Mean: 2.809483
  • Depth: 19 | Mean: 2.762999
  • Depth: 20 | Mean: 2.758817
  • Depth: 21 | Mean: 2.772569
  • Depth: 22 | Mean: 2.816113
  • Depth: 23 | Mean: 2.928664
  • Depth: 24 | Mean: 2.904253
  • Depth: 25 | Mean: 2.938958
  • Depth: 26 | Mean: 2.845138
  • Depth: 27 | Mean: 2.877145
  • Depth: 28 | Mean: 2.857758
  • Depth: 29 | Mean: 2.840159
  • Depth: 30 | Mean: 2.822597
  • Depth: 31 | Mean: 2.813350
  • Depth: 32 | Mean: 2.820256

2. HC without reduction:

2.1. Massif:

  • depth 5: peak 53.1730 MiB = 6.96949146 MB
  • depth 16: peak 53.1730 MiB = 6.96949146 MB
  • depth 32: peak 53.1730 MiB = 6.96949146 MB
  • depth 128: peak 53.1730 MiB = 6.96949146 MB

2.2. Callgrind:

  • depth 5:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 16:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.21%
  • depth 32:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 128:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.06%
    • complexity_based_scorer: 65.20%

2.3. Time (s):

  • Depth: 2 | Mean: 13.933148
  • Depth: 3 | Mean: 13.710301
  • Depth: 4 | Mean: 13.771378
  • Depth: 5 | Mean: 13.699602
  • Depth: 6 | Mean: 13.643018
  • Depth: 7 | Mean: 13.953615
  • Depth: 8 | Mean: 14.218726
  • Depth: 9 | Mean: 14.607479
  • Depth: 10 | Mean: 14.577442
  • Depth: 11 | Mean: 14.953429
  • Depth: 12 | Mean: 14.705942
  • Depth: 13 | Mean: 14.993052
  • Depth: 14 | Mean: 14.541223
  • Depth: 15 | Mean: 14.249910
  • Depth: 16 | Mean: 14.563872
  • Depth: 17 | Mean: 15.361906
  • Depth: 18 | Mean: 14.469012
  • Depth: 19 | Mean: 14.755074
  • Depth: 20 | Mean: 14.411482
  • Depth: 21 | Mean: 14.821014
  • Depth: 22 | Mean: 14.412988
  • Depth: 23 | Mean: 14.660408
  • Depth: 24 | Mean: 14.989600
  • Depth: 25 | Mean: 14.081191
  • Depth: 26 | Mean: 14.491643
  • Depth: 27 | Mean: 14.121354
  • Depth: 28 | Mean: 14.673866
  • Depth: 29 | Mean: 14.079422
  • Depth: 30 | Mean: 13.745444
  • Depth: 31 | Mean: 13.860764

3. PS with reduction:

3.1. Massif:

  • depth 5: peak 2.7437 MiB = 0.359622246 MB
  • depth 16: peak 2.8201 MiB = 0.369636147 MB
  • depth 32: peak 4.2963 MiB = 0.563124634 MB
  • depth 128: peak 3.6244 MiB = 0.475057357 MB

3.2. Callgrind:

  • depth 5:
    • get_contin: 0.11%
    • set_contin: 0.01%
    • update_particles: 0.02%
    • complexity_based_scorer: 91.94%
  • depth 16:
    • get_contin: 0.11%
    • set_contin: 0.02%
    • update_particles: 0.06%
    • complexity_based_scorer: 93.25%
  • depth 32:
    • get_contin: 0.12%
    • set_contin: 0.04%
    • update_particles: 0.15%
    • complexity_based_scorer: 91.06%
  • depth 128:
    • get_contin: 0.25%
    • set_contin: 0.16%
    • update_particles: 0.37%
    • complexity_based_scorer: 92.87%

3.3. Time (s):

  • Depth: 32 | Mean: 180

4. HC with reduction:

4.1. Massif:

  • depth 32: peak 940.1328 MiB = 0.1203369984 MB

4.2. Callgrind:

  • depth 32:
    • get_contin: 0.42%
    • set_contin: 0%
    • hill_climbing: 0.08%
    • complexity_based_scorer: 32.52% (low because merge_demes uses a lot)

4.3. Time (s):

  • Depth: 32 | Mean: 1.8413

For even more detailed results (the callgrind and massif outputs, or the times with results) you can just ask me; I won't put them on GitHub, they're too big and don't need to be there.

[1] See valgrind site: http://valgrind.org/info/tools.html


ArleyRistar commented 9 years ago

Update from what we talked about in the Moses channel:

  • I sent the post above to check how depth affects the code.
  • In the conclusion I found an issue different from what we were testing (PS is slow with reduction).
  • Ben asked for a hypothesis.
  • I responded that it may be what Nil said 3 emails ago.
  • I showed a callgrind run that has a big difference in the number of instructions to reduct HC and PS.
  • I started to talk to Nil.
  • Empirically, I tested that HC will have bad performance too if it changes the continuous knobs every eval (because PS does this).
  • Nil and I didn't decide anything about reduction.
  • But we decided to change the variable-length representation to double (which is what Linas wanted 7 emails ago).
  • We decided to change instance to std::bitset (I have an update about it below).
  • Ben gave some good ideas that we could try, but he'll be offline until Sunday, so it would be better to wait for him.
  • Linas confirmed that reduction with contin works well (in the post we could see that it really makes a big difference for HC, independently of depth).

From now on I'll make my updates here to avoid any confusion.


ArleyRistar commented 9 years ago

About std::bitset: it isn't something for PS alone, and it isn't something that will increase performance (not much, anyway), but the code does do some custom operations that exist there (and some that don't). What I've found is that, basically, you have to change too many things to gain almost nothing, so I'm skipping it for now and jumping to changing the variable representation.


linas commented 9 years ago

Hi Arley,

Could you cut-n-paste this into the directory moses/moses/diary/particle-swarm/ ? The other directories under "diary" provide similar information on experiments. They are rather informal; just notes, really, nothing particularly fancy.

On Mon, Jul 27, 2015 at 7:38 AM, arleyristar notifications@github.com wrote:

I forgot to post here too:

https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_ or read below. Initial experiments for PS and HC in continuous problems.

For the summary, jump to 3.4. General.

  1. Methodology:

First, the experiments were divided in two groups: without reduction (-B 0 -E 1 parameters) and with reduction; it's separated this way because the reduction part affects the particle swarm (PS) too much (clean_combo_tree uses 99.73% with reduction in PS).

What do you mean "affects too much"? Do you mean "uses too much cpu time"? or does it actually change results?

You may want to call (maybe) reduct once, before the PSO step; clearly, you should not call reduct while doing PSO, i.e. while trying different float values.

Besides that, the experiments were split in three parts: analysis of the memory, processing and time. The memory analysis was done using the valgrind tool massif running for 4 values of depth {5(default), 16, 32, 128} and for PS and hill climbing (HC). The processing analysis was done in the same way of the memory, but using the tool callgrind. The time analysis was done in a different way, using a simple python script to measure the mean time of seeds between 1 and 10 running between depth 2 and 32 for PS and HC. The dataset used was predicates.csv

predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to learn, and illustrates one of the big problems with the current contin learner, that I am hoping that PSO will do much better on.

something that had to learn e.g. x>1.618 would be hard for the current contin code, and should be "very easy" for PSO.

Much better test cases would be any dataset that people typically use gradient-descent methods on, or maybe datasets that are easily solved by any kind of linear regression, for example, SVM. Currently, moses mostly cannot do linear-regression type problems; I am hoping PSO will fix that.

A demo of either a "linear programming" (linear optimization) or a "linear integer programming" problem, in moses, with PSO, would be very cool.

because it is a simple one that is already inside the examples folder of Moses; it was run for a maximum of 1000 evaluations, a small value because the valgrind tools increase the program run time by 20x for massif and between 20x and 100x for callgrind [1], but it's a sufficient number of evaluations for testing (HC with some seeds can find the best solution in fewer than 1000 evals).

  2. Results description:

2.1. Memory:
    • PS with reduction spend more memory than without.
    • Increasing in the depth of PS, increases the memory used, but not that much, sometimes it can decrease, because it can find better results that are less complex.
    • HC without reduction reaches the ram limit independently of depth.
    • HC with reduction uses far less memory, even less than PS without reduction.

2.2. Processing:

  • Variable length representation affects PS without reduction too much depending of the depth (15% at depth 32 vs. 2.7% in depth 5).
  • Affects the PS with reduction in a similar way, but with a percentage much lower, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

I don't quite understand these results. MOSES does reduction in several different places, for "completely unrelated" reasons. It seems to me that you should do reduction at least once, before starting PSO. However, during the PSO inner loop, where you are merely trying different floating point values, you should NOT do any reduction at all; it would be completely pointless. Right!?

Can you sketch the pseudocode for me?


ArleyRistar commented 9 years ago

"Could you cut-n-paste this into the directory moses/moses/diary/particle-swarm/ ? The other directories under "diary" provide similar information on experiments. They are rather informal; just notes, really, nothing particularly fancy."

I'll make a diary from that post and commit.

"What do you mean "affects too much"? Do you mean "uses too much cpu time"? or does it actually change results? I mean like jumping from 1s without reduction to 180s with it almost without changing scores.

"You may want to call (maybe) reduct once, before the PSO step; clearly, you should not call reduct while doing PSO, i.e. while trying different float values." I don't call reduction directly; I call the scorer, which calls the complexity_based_scorer, which calls get_candidate, which calls clean_combo_tree, which calls reduct::rule::operator(). In fact, the part that calls the scorer is exactly like the one in HC, it even has the same comments; it was literally a copy-and-paste of 5 lines (just changing the start and end of the deme).


predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to learn, and illustrates one of the big problems with the current contin learner, that I am hoping that PSO will do much better on.

something that had to learn e.g. x>1.618 would be hard for the current contin code, and should be "very easy" for PSO.

Much better test cases would be any dataset that people typically use gradient-descent methods on, or maybe datasets that are easily solved by any kind of linear regression, for example, SVM. Currently, moses mostly cannot do linear-regression type problems; I am hoping PSO will fix that.

A demo of either a "linear programming" (linear optimization) or a "linear integer programming" problem, in moses, with PSO, would be very cool.

Yes, predicates is the simplest one to test; I was aware of that. But even this simplest one took 3h of running to get a callgrind of PS with reduction, and for the first test I got some good results. Harder problems will show these results more precisely, and I'll use those problems in the next experiment. Let me remind everybody of one thing: PS does NOT get better results than HC, and I haven't seen a single problem where every knob is continuous. Even in this simple problem the discrete part has more than 1/3 of the knobs. In the discrete part it falls into a local maximum; it has the simplest and oldest implementation for using PSO with discrete problems, which is basically a round-off and has no escape mechanism. I know that Linas suggested "not wasting any time there", but I want to fix that in this second part (either fix it, or do as Ben suggests and make a hybrid), but the issues are stacking up.


"I don't quite understand these results. MOSES does reduction in several different places, for "completely unrelated" reasons. It seems to me that you should do reduction at least once, before starting PSO. However, during the PSO inner loop, where you are merely trying different floating point values, you should NOT do any reduction at all; it would be completely pointless. Right!?" I dunno. I would not say pointless, because of the complexity scorer that PS uses; besides that, doing this reduction in the loop seems to help HC a lot. And Ben said: "Another point to keep in mind is: there are some fitness functions for which fitness function evaluation is much more expensive than reduction". So in some cases it would be better to do the reduction, I think...

"Can you sketch the pseudocode for me?" Yes, there's even an image. The image has a 5% precision threshold (functions that use less than that won't show), but if you need it I can generate a more precise one. If you are talking about the PS part alone, I can write a simple pseudocode:

operator():
    Create new particles in a deme.
    Create new velocities.
    Inner loop:
        Apply transform to score the deme.
        Select the best score for each particle and the overall best particle.
        Check finishing conditions.
        Update particles based on the bests:
            Update bit.
            Update disc.
            Update contin.
    End loop.
    Return deme with the best particles.
End operator.
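
For concreteness, the "Update contin" step above is just the standard PSO velocity/position update; a minimal C++ sketch is below. Particle, PsoParams and update_particle are illustrative names only, not the actual MOSES classes:

#include <random>
#include <vector>

// Sketch of the "Update contin" step: standard PSO velocity/position update.
struct Particle {
    std::vector<double> position;       // current contin knob values
    std::vector<double> velocity;
    std::vector<double> best_position;  // best values this particle has seen
};

struct PsoParams {
    double inertia   = 0.7;   // keep part of the previous velocity
    double cognitive = 1.4;   // pull toward the particle's own best
    double social    = 1.4;   // pull toward the swarm's best
};

void update_particle(Particle& p, const std::vector<double>& swarm_best,
                     const PsoParams& prm, std::mt19937& rng)
{
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    for (std::size_t i = 0; i < p.position.size(); ++i) {
        double r1 = uni(rng), r2 = uni(rng);
        p.velocity[i] = prm.inertia   * p.velocity[i]
                      + prm.cognitive * r1 * (p.best_position[i] - p.position[i])
                      + prm.social    * r2 * (swarm_best[i] - p.position[i]);
        p.position[i] += p.velocity[i];   // move the contin value
    }
}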


linas commented 9 years ago

OK, so now I am completely confused. I was assuming the algo went like this:

outer loop:
   select exemplar
   decorate it with knobs; most of these will work just like before, but ...
   anywhere where a contin used to be, there will now be a double-precision float 
   (i.e. old contins are replaced by particles)
   inner loop:
      twiddle the boolean and discrete knobs, but NOT the floats!
      call reduct to get a simpler expression.
      initialize particle velocities.
      PSO loop:
           score the current tree with the current particle velocities
           (do NOT reduce the tree before scoring!!)
           update particle positions and velocities
           repeat PSO loop until converged.
     Now that PSO is done, 
     record best score,
     repeat the knob-twiddling inner-loop
     use ordinary hill-climbing to see if the best current PSO result
     is better than the previous best.
     if not, exit the inner loop

In other words, the new PSO-enabled version is just like the old hill-climbing moses, and it even uses the same-old hill-climbing just like before, with NO modifications AT ALL to hill-climbing... the only modification for PSO would be to replace all contins by particles, and to replace the computation of a single score by a PSO-loop.

That is, in the old code, scoring would be done exactly once for each knob setting; in the new code, this single score would be replaced by a PSO loop. But otherwise, everything else would be unchanged. I guess its not being designed like this?

ngeiswei commented 9 years ago

Agreed about std::bitset.

ArleyRistar commented 9 years ago

"That is, in the old code, scoring would be done exactly once for each knob setting; in the new code, this single score would be replaced by a PSO loop. But otherwise, everything else would be unchanged. I guess its not being designed like this?" Linas, no it's not. For now it's pure PSO, not a hybrid version like this. I wanted to make a complete version of PS that solves the problems (bit+disc+contin) like HC does. I still believe that with the right discrete algorithm PS can perform as well as or better than HC, but that isn't certain. On the other hand, there's a huge chance that a hybrid version would increase the performance on continuous knobs. I've thought about doing a hybrid, but not as described above, which makes more sense (in some places in the code there are comments suggesting calls to HC). I still want to test another discrete algorithm for discrete PS, but for now I think this hybrid should be the priority for me (after finishing the change from variable length to double).

Some comments:

  • "outer loop: select exemplar" <- Like center_inst from HC?
  • "decorate it with knobs; most of these will work just like before, but anywhere where a contin used to be, there will now be a double-precision float (i.e. old contins are replaced by particles)" <- That's not necessary now that the representation will change (but HC and the others will access it like before).
  • "inner loop: twiddle the boolean and discrete knobs, but NOT the floats!" <- Like neighborhood_sampling?
  • "call reduct to get a simpler expression." <- Like calling the scorer with reduction set to true?
  • "initialize particle velocities. PSO loop:" <- I would do it for the best one, right? I can't do this for all the candidates generated by neighborhood_sampling, because their reduced trees would have contins in different places, wouldn't they? And even if they had, the best position for one wouldn't necessarily be the best for the others.
  • "score the current tree with the current particle velocities (do NOT reduce the tree before scoring!!)" <- So in this case, can I work directly with the tree and not with the instance?
  • "update particle positions and velocities; repeat PSO loop until converged." <- Convergence in PSO is a little tricky, but for this case I would recommend finishing early.
  • "Now that PSO is done, record best score, repeat the knob-twiddling inner-loop" <- Wouldn't it be better to cut this part and do it after returning to the inner loop?
  • "use ordinary hill-climbing to see if the best current PSO result is better than the previous best; if not, exit the inner loop" <- Here I would use the same parameter as HC, stopping when reaching max_dist.


linas commented 9 years ago

On Fri, Jul 31, 2015 at 8:06 PM, arleyristar notifications@github.com wrote:

That is, in the old code, scoring would be done exactly once for each knob setting; in the new code, this single score would be replaced by a PSO loop. But otherwise, everything else would be unchanged. I guess its not being designed like this?

Linas, no it's not. For now it's a pure PSO and it isn't a hybrid version like this.

OK, well, I don't understand what "pure PSO" is. How can PSO possibly change discrete values? I thought it only worked with things that have floating-point values, and velocities; how can you possibly convert non-contin knobs into floats?

I wanted to make a complete version of PS to solve the problems (bit+disc+contin) like HC do. I still believe that with the right discrete algorithm for PS it can perform as good or better than HC, but it isn't that certain.

Again, I don't understand what a "discrete algo for PS" could be, or how it might work. Wikipedia did not seem to mention it; is there some reference I should be looking at?

Look, there is nothing "wrong" with hill-climbing; it works well for discrete and boolean knobs. It fails disastrously with contin knobs, but the reason for this is "well understood". Replacing contin-hillclimbing by almost any float-pt optimization algo will work better.

On the other hand, there's a huge chance that a hybrid version would increase the performance of continuous knobs.

Yes, and that is what I find very exciting.

I've thought about doing a hybrid, but not as described above that makes more sense (in some place in the code, there's some comments calling HC). I still want to test another discrete algorithm for discrete PS,

OK. I'd like some insight into what this might be like.

but for now i think this hybrid should be priority for me (after finishing change to variable length to double).

Yeah, this would be great!

Some comments:

The formatting is mangled...

outer loop: select exemplar <- Like center_inst from HC?

The outer loop is independent of HC. There is a population of trees; one is selected and decorated with knobs. HC is the "inner loop" it offers one way of turning knobs. There are several different inner loops, but HC is the only one that works well.

decorate it with knobs; most of these will work just like before, but ... anywhere where a contin used to be, there will now be a double-precision float (i.e. old contins are replaced by particles) <- that's not necessary now that the representation will change (But HC and the others will access it like before)

OK. As far as I am concerned, contins could be completely discarded (and replaced by doubles). Perhaps Nil has other ideas?

inner loop: twiddle the boolean and discrete knobs, but NOT the floats! <- like neighborhood_sampling?

yes.

call reduct to get a simpler expression. <- like calling scorer with reduction to true?

well, in this "hybrid" approach, this would be a step you'd have to explicitly add. The only reason that reduct is done before scoring is that it speeds up scoring. In the "hybrid" approach, you would not need this step in the scorer itself... but you would need it earlier.

initialize particle velocities. PSO loop:<- I would do it for the best one, right?

Yeah, I guess so ... or maybe you would want to do it for the best 5 or 10 ...

i can't do this for all generated by neighborhood_sampling because their reductions trees would have continuous in different places, wouldn't they?

Yes; each different tree would have the contin variables in different places, and you'd have to run an independent run of PSO for each one. (or just run it for the best one, or run it for the best 5 or 10 ...)

score the current tree with the current particle velocities <- so in this case, can i work directly with the tree and not with the instance?

Well, normally, a single instance is converted to a tree, then the tree is reduced, then the tree is scored. (An instance being just a binary string which indicates the knob settings on a decorated tree. Instances are used only because they are compact, much smaller than trees; otherwise the distinction doesn't much matter.)

There seem to be two different ways of doing hybrid-PSO.

One is "per-instance PSO": Given "some set of instances", for each instance: -- convert a single instance to a tree -- reduce the tree -- apply PSO to all floats in the tree. -- the final instance score is the best PSO result.

What is the "some set of instances"? Well, it could be a set of one: the highest scoring instance before PSO. Or it could be the top 5 best. Or it could be all of the ones returned by neighborhood_sampling ...

Now that PSO is done, record best score, repeat the knob-twiddling inner-loop <- Won't it be better to cut this part and do this after it return to the inner loop?

I suppose that is another way of doing it. In this case, you let HC find the very best 100, and then run PSO on all 100 of these, and see if PSO can get even better scores. Lets call this "per-hill PSO".

Clearly, per-hill PSO will use a LOT less cpu than per-instance PSO. Will it actually be better? Hard to say. It seems easier to code up per-hill PSO.

--linas


ngeiswei commented 9 years ago

What about moving PSO into the contin neighborhood sampling? So there is no PSO per se; it's still hillclimbing optimization, but the neighborhood sampling for contin (now double) values relies on PS. The code could be done so that PSO-based sampling is just one possible sampling amongst others (just that this one keeps track of positions and velocities to determine the sampling distribution, as opposed to, say, a Gaussian).

If there are only contin knobs I guess it might look and behave just like the current PSO code, but if there are discrete values, some discrete knob turning will be mashed up with contin knobs turned according to PSO.

I don't have a good intuition as to whether it's a good idea or not to mash up discrete HC and contin PS; ultimately it depends on how dependent the discrete knobs are on the contin knobs. Assuming they are independent, it should work great, faster than doing HC then PS, because we'll be able to optimize both discrete and contin knobs at the same time. If they are dependent (they surely are to some degree), then I don't know.
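
To make that concrete, here is a small sketch of contin sampling as a pluggable strategy: a Gaussian sampler next to a PSO-flavoured one. The class names are made up for illustration and assume contins are stored as plain doubles:

#include <random>

// One interface, several ways to propose the next value for a contin knob.
struct ContinSampler {
    virtual double propose(double current) = 0;
    virtual ~ContinSampler() = default;
};

// Classic neighborhood sampling: perturb the current value with Gaussian noise.
struct GaussianSampler : ContinSampler {
    explicit GaussianSampler(double sigma) : noise(0.0, sigma) {}
    double propose(double current) override { return current + noise(rng); }
    std::mt19937 rng{42};
    std::normal_distribution<double> noise;
};

// PSO-flavoured sampling: a velocity plus the personal and global bests
// determine the proposal, instead of a fixed noise distribution.
struct PsoSampler : ContinSampler {
    PsoSampler(double personal, double global)
        : personal_best(personal), global_best(global) {}
    double propose(double current) override {
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        velocity = 0.7 * velocity
                 + 1.4 * uni(rng) * (personal_best - current)
                 + 1.4 * uni(rng) * (global_best - current);
        return current + velocity;
    }
    double velocity = 0.0;
    double personal_best, global_best;
    std::mt19937 rng{42};
};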

bgoertzel commented 9 years ago

That seems a good, simple kind of hybrid to explore...


linas commented 9 years ago

Well, during hill-climbing, hundreds or thousands of different trees are explored. Depending on the knob setting, any given tree may or may not contain some contin variables -- for example:

or ($z and (boolean-knob  greater-than ( plus( $x $y))))

so if boolean-knob is set to True then the tree is $z or 0<($x + $y) but if the boolean-knob is False, then the tree does not even have the contins in it; the tree is just $z.

ArleyRistar commented 9 years ago

Nil, I don't know if leaving PSO inside neighborhood sampling is a good idea.

  • On the bright side, we would (probably) get quicker convergence.
  • On the not-so-bright side, we fall into the reduction problem again (hill_climbing will call the scorer and so on).
  • On the bright side, I don't have to worry about keeping contin_neighbor working the same way it does now, but with double instead of the variable-length representation (something I'm doing now).
  • On the not-so-bright side, if we do the reduction beforehand, we hit what Linas said: we wouldn't have equal numbers of contins to apply PSO to.
  • On the bright side, we could map and update the ones in use.

Anyway, pick one of these two (separated or inside neighborhood) to do first.


linas commented 9 years ago

FYI, I don't think I have any objections to the variable-length encoding that contin uses. I guess it's OK, for a suitable definition of OK. What I did object to was treating contins as just another knob, of which a single bit could be twiddled at a time. Experience showed that this was an ineffective way of searching the space of possible floating point values.
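
To illustrate what the variable-length encoding amounts to, here is a toy decoder for such a sequence, assuming the scheme sketched earlier in the thread (start at the mean, take Left/Right steps, halving the step at each level, until Stop, so depth 5 spans roughly +-30 with a starting step of 16). This is only an illustration, not the actual field_set/contin_spec code:

#include <iostream>
#include <vector>

enum class ContinStep { Left, Right, Stop };

// Toy decoder: walk the Left/Right/Stop sequence, halving the step each move.
double decode_contin(const std::vector<ContinStep>& steps,
                     double mean, double first_step)
{
    double value = mean;
    double step = first_step;
    for (ContinStep s : steps) {
        if (s == ContinStep::Stop) break;
        value += (s == ContinStep::Right) ? step : -step;
        step /= 2.0;   // each extra level refines the value
    }
    return value;
}

int main()
{
    // With mean = 0 and first_step = 16, the sequence R, R, L, Stop
    // encodes 0 + 16 + 8 - 4 = 20.
    std::vector<ContinStep> seq{ContinStep::Right, ContinStep::Right,
                                ContinStep::Left, ContinStep::Stop};
    std::cout << decode_contin(seq, 0.0, 16.0) << '\n';   // prints 20
}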

bgoertzel commented 9 years ago

I have thought about this a little more and poked around on the Internet a little...

There is something called PSO-LS, given in pseudocode in this paper

https://books.google.com.et/books?id=vSXvHTQ07UsC&pg=PA246&lpg=PA246&dq=pso+hillclimbing+hybrid&source=bl&ots=DbAqVFwpNG&sig=9ZFZjsL5vPfPXPUZSDZh77gpIn0&hl=en&sa=X&ved=0CCkQ6AEwA2oVChMIkqmg7pKRxwIVqSPbCh0tsQtO#v=onepage&q=pso%20hillclimbing%20hybrid&f=false

which basically does PSO, but does hillclimbing on each particle to local-optimize it before reporting its info back to the swarm... [Step 3 in their pseudocode would be replaced in MOSES by our existing, fancy hillclimbing from the specified initial point.]

From what I understand, this would seem a natural extension of the current "hill-climbing with repeated re-starts" approach. Basically, we would be using PSO to handle the re-starts...
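
Schematically, PSO-LS would look something like the sketch below; the toy local_search here is only a stand-in for MOSES's existing hill-climbing (their Step 3), and none of the names are taken from the paper or from MOSES:

#include <functional>
#include <initializer_list>
#include <random>
#include <vector>

using Position = std::vector<double>;
using Scorer   = std::function<double(const Position&)>;

// Tiny stand-in for the local search: try +/- eps on each coordinate and keep
// any improvement. In MOSES this would be the existing hill-climbing.
Position local_search(Position start, const Scorer& score, double eps = 0.1)
{
    for (std::size_t d = 0; d < start.size(); ++d)
        for (double delta : {eps, -eps}) {
            Position trial = start;
            trial[d] += delta;
            if (score(trial) > score(start)) start = trial;
        }
    return start;
}

// PSO-LS: each particle is locally refined before it reports back to the
// swarm. Assumes a non-empty swarm of equal-length positions.
Position pso_ls(const Scorer& score, std::vector<Position> swarm, int iterations)
{
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const std::size_t dim = swarm[0].size();
    std::vector<Position> velocity(swarm.size(), Position(dim, 0.0));
    std::vector<Position> personal_best = swarm;
    Position global_best = swarm[0];
    double global_best_score = score(global_best);

    for (int it = 0; it < iterations; ++it) {
        for (std::size_t p = 0; p < swarm.size(); ++p) {
            // Locally refine the particle before the swarm update.
            swarm[p] = local_search(swarm[p], score);
            double s = score(swarm[p]);
            if (s > score(personal_best[p])) personal_best[p] = swarm[p];
            if (s > global_best_score) { global_best = swarm[p]; global_best_score = s; }
        }
        // Standard PSO velocity/position update toward the bests.
        for (std::size_t p = 0; p < swarm.size(); ++p)
            for (std::size_t d = 0; d < dim; ++d) {
                velocity[p][d] = 0.7 * velocity[p][d]
                               + 1.4 * uni(rng) * (personal_best[p][d] - swarm[p][d])
                               + 1.4 * uni(rng) * (global_best[d] - swarm[p][d]);
                swarm[p][d] += velocity[p][d];
            }
    }
    return global_best;
}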

In this approach, MOSES search would break down into three levels:

1) Highest level, which is the deme level: choosing which exemplars to put in the demes, i.e. search of deme-exemplar space

2) Mid-level, which is choosing the starting-points for hill-climbing. In a PSO-LS approach, this would be done by PSO.

3) Low-level... This would use the current hill-climbing, which is reduct-friendly (i.e. makes small changes so leads to cheap reductions usually), but ignores dependencies among variables. Adding crossover (or BOA) then incorporates a way of paying attention to dependencies at this level...

...

In this case, the motivation for incorporating LS (hillclimbing) into PSO is a little different from in the usual case. The usual motivation is to help PSO do the final stages of optimization better, and help it faster escape local optima. Here we have an additional motivation, which is to minimize the amount of effort spent on reduction.

Still, this approach would be expected to add value only in cases where fitness evaluation is very costly so that the cost of reduction does not dominate. But this is generally the most important kind of case for AGI.

On reflection, I find this more promising than doing PSO within the local neighborhood as Nil suggested. Doing PSO within the local neighborhood appears not to really exploit the power of PSO for doing global search.

With this approach, I think we could get away with demes containing programs with more variables than we are using currently in MOSES. Right now we tend to very strictly restrict the number of features used in a deme, but I think this is partly a result of working around the limitations of HC.

-- Ben


ngeiswei commented 9 years ago

Hmm, what about keeping PSO as a neighborhood sampling technique, but just limiting the number of contin/float knobs to turn at once? That way there would be more ones or zeros. Or would what you suggest bring a greater benefit? (I may not have understood it very well, honestly.)

ArleyRistar commented 9 years ago

I'm finally finishing the variable-length change; it gave me more work than I expected. It's compiling now, and I have to adjust some things, but I'll probably finish today or tomorrow at the latest. Then I'll start this hybrid. From my point of view:

  • Linas' option: It's the easiest of the three, but my fear is that it won't be able to do better global search. Doing HC first leaves PSO too dependent on the exemplars found by HC. For example, if it starts with an exemplar that has 0 or 1 in all its contin knobs, HC will optimize the discrete knobs and pass the reduction to PSO, which has to optimize the contins, but it will optimize them dependently on that exemplar (which maybe doesn't have many contins because of the reduction), and these values may be good for that exemplar but not necessarily globally. Basically what I'm trying to say is that some exemplars generated in HC will not have a chance to get optimized enough to really know whether they are good or not.
  • Nil's option: It's the more difficult one to do, but it can get better results if done right. My view I gave 4 emails ago.
  • Ben's option: I got the idea, but as Ben said, this specific PSO-LS doesn't work for us as it is. Its hill-climbing is a random change between [-1,1] to do local search. In our case we don't want some hill-climbing for the contin, but for the disc. I think it would work similarly to Linas' idea, but in the inverse order, PSO first then HC.
  • My (new) opinion: Leave the pseudocode as Linas said, but instead of doing a complete HC before PSO, do just one neighborhood sampling for disc, reduce the best (now the center), apply one iteration of PS for contin (as Nil said, doing sampling + PS), update the center, and so on in a loop. As Ben said, PS would choose the next center for HC and HC for PS. If nothing better is found, increase the distance for HC and/or increase the inertia for PS until it converges.

Why it'll work (probably): PSO can adapt to a changing best maximum IF the change is small enough (HC will change 1-4 disc knobs at a time) to be caught by a random change, or if the disc values converge before the contins (which will happen most of the time). HC can now find other exemplars that a contin would affect in the next iteration.

*Nil, you can do that, but you lose the PS purpose of adjusting lots of variables at the same time while converging to a point. If you want to do that, I recommend a random walk or something like that. After the change to double it's somewhat easy to implement a test for this. It's definitely something to put on a to-do list.


linas commented 9 years ago

Yes, I like your "new idea". It makes sense. Maybe there could be a flag or option that says "do n steps of hillclimbing in between PSO", and maybe another flag that says "do one round of PSO before starting HC."

It's very hard to intuit what will actually happen, since typically there are local minima everywhere, and you don't know how big they are, what shape they have, or what's next to what. For any given problem (or rather, class of problems) there is a meta-meta optimization step: fiddling with flags and options, by hand, to see what settings seem to provide the best learning. For moses, this meta-meta step seems to make a huge difference, particularly the "temperature". Setting the number of "dynamical features" to 10 or 20 seems like a good bet. I forget; there are one or two more settings that seem to make a big difference.

--linas


ArleyRistar commented 9 years ago

I'm back. So, about the pull request to merge the project: all the commits prior to August can easily be merged and won't affect other parts. It's basically the addition of PS as an optimization option, some changes in the parameters to receive arguments for PS, and an option to change the depth from the command line. On top of that, there's the diary commit from August that can go in the pull request. But at least for now I recommend leaving out the variable-length representation change and the hybrid commits.

The variable-length change touched too many things outside the optimization, and for some changes I wasn't so sure about the correctness, but in the end "just" the contin part of the neighborhood sampling was broken, which was expected. Nil suggested mimicking the prior behavior, but that would cause a performance loss and I had little time to do the hybrid, so I left it for a while and tested a simpler solution, but it turned out not to be so good (I won't detail it, it didn't work anyway; it was basically an expansion factor (+ and -) that decreased by half and was initialized with 2^(depth-1)). So I don't recommend it, because it's worse for contin under hill climbing than before. For PS it gave better results, but it still has the discrete problem, so it still gave bad results.

About the hybrid: it depends on the variable-length change and it's not giving good results (at least it was better than without it). The version in the last commit (21 Aug, before 19 UTC), which is the one I had to send to GSoC, isn't complete; in fact I thought it was complete, but there's a mistake that I saw afterwards. One of the tricks I had to do was to pass the double value by reference instead of by copy as before (to form the combo_tree); it won't cause problems because the instance is constant when it is called, but for my case, which needed to change the combo_tree, that was important. That way I can change the contin part of the temporary instance and apply the scorer. The problem lies in inserting that instance into the metapopulation, because of the reduction. After the reduction I don't know which contin was used and which wasn't, and in the rush I didn't realize it, putting in the whole particle, leaving contins that weren't supposed to be there. These days I found a not-so-pretty way to map which contin is used in the reduced tree without changing the rules; I'll test it after the pull requests, probably next week, but don't expect much. About the flags that Linas mentioned: they are pretty easy to do, but first let's see if it works. Also, the code is pretty much undocumented, so making a pull request without documentation would leave it a bit confusing.

On Tue, Aug 11, 2015 at 6:42 PM, Linas Vepštas notifications@github.com wrote:

Yes, I like your "new idea". It makes sense. Maybe there could be a flag or option that says "do n steps of hillclimbing in between PSO", and maybe another flag that says "do one round of PSO before starting HC."

Its very hard to intuit what will actually happen, since typically, there are local minima everywhere, and you don't know how big they are, what shape they have, what's next to what. For any given problem (or rather, class of problems) there is a meta-meta optimization step: fiddling with flags and options, by hand, to see what settings seem to provide the best learning. For moses, this meta-meta step seems to make a huge difference, particularly the "temperature". Setting the number of "dynamical features" to 10 or 20 seems like a good bet. I forget there is one or two more that seem to make a big difference.

--linas

On Tue, Aug 11, 2015 at 4:13 PM, arleyristar notifications@github.com wrote:

I'm finally finishing the variable-length change; it gave me more work than I expected. It's compiling now, and I still have to adjust some things, but I should finish today or tomorrow at the latest. Then I'll start on the hybrid. From my point of view:

  • Linas' option: It's the easiest of the three, but my fear is that it won't be able to do a better global search. Doing HC first will leave PSO too dependent on the exemplars found by HC. For example, if it starts with an exemplar that has 0 or 1 in all its contin knobs, HC will optimize the discrete knobs and pass the reduced result to PSO, which then has to optimize the contins, but it will optimize them relative to that exemplar (which may not have many contins left because of the reduction), and those values may be good for that exemplar but not necessarily globally. Basically, what I'm trying to say is that some exemplars generated by HC will never get a chance to have their contins optimized, so we never really know whether they are good or not.
  • Nil's option: It's the most difficult one to do, but it can get better results if done right. I gave my view on it four emails ago.
  • Ben's option: I got the idea, but as Ben said, this specific PSO-LS doesn't work for us as it is. Its hill climbing is a random change within [-1, 1] used for local search. In our case we don't want hill climbing for the contins but for the discrete knobs. I think it would work similarly to Linas' idea, but inverted: PSO first, then HC.
  • My (new) opinion: Keep the pseudocode as Linas said, but instead of doing a complete HC run before PSO, do just one neighborhood sampling for the discrete knobs, reduce the best (now the center), apply one iteration of PS for the contins (as Nil said, doing sampling + PS), update the center, and so on in a loop. As Ben said, PS would choose the next center for HC and HC the next center for PS. If nothing better is found, increase the distance for HC and/or increase the inertia for PS until it converges (see the rough sketch after the next paragraph).

Why it'll probably work: PSO can adapt to a moving best maximum IF the change is small enough (HC will change 1-4 discrete knobs at a time) to be caught by a random change, or if the discrete values converge before the contins (which will happen most of the time). HC, in turn, can now find other exemplars that a contin change would affect in the next iteration.
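A rough pseudocode sketch of that loop follows. All helper names (hc_sample_disc_neighborhood, reduce_center, pso_step_contin, score, converged) and the instance type are placeholders for the corresponding MOSES steps, not actual MOSES functions.

```cpp
// Pseudocode sketch of the interleaved HC/PS loop described above.
// Every helper here is a stand-in for the corresponding MOSES operation.
#include <algorithm>

void interleaved_hc_ps(instance center)
{
    double hc_distance = 1.0;   // discrete neighborhood radius for HC
    double ps_inertia  = 0.7;   // inertia weight for the particle swarm

    while (!converged()) {
        // One neighborhood sampling + hill-climbing step over discrete knobs.
        instance disc_best = hc_sample_disc_neighborhood(center, hc_distance);

        // Reduce the best candidate; the reduced tree becomes the new center.
        center = reduce_center(disc_best);

        // One PS iteration over the contin knobs of the (reduced) center.
        instance cont_best = pso_step_contin(center, ps_inertia);

        if (score(cont_best) > score(center)) {
            center = cont_best;                   // improvement: keep it
        } else {
            hc_distance += 1.0;                   // widen the HC neighborhood
            ps_inertia = std::min(1.0, ps_inertia + 0.1); // loosen the swarm
        }
    }
}
```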

*Nil, you can do that, but you lose the PS purpose of adjusting lots of variables at the same time while converging to a point. If you want to do that, I recommend a random walk or something like it. After the change to double it is fairly easy to implement a test for this. It's definitely something to put on a to-do list.

On Tue, Aug 11, 2015 at 12:29 PM, ngeiswei notifications@github.com wrote:

Hmm, what about keeping PSO as the neighborhood sampling technique, but just limiting the number of contin/float knobs to turn at once? That way there would be more ones or zeros. Or would what you suggest bring a greater benefit? (I may not have understood very well, honestly.)


ngeiswei commented 9 years ago

Thanks for your comeback, Arley. I expected the difficulty of your last task to be too high for the GSoC timing. See what you can do. We've learned things; that already matters.

linas commented 9 years ago

There are no open pull requests ...

How hard would it be to split up the code into different mergeable pull requests?

ArleyRistar commented 9 years ago

There's no conflict; it can be merged. But I want to make just one pull request including the diary commit, which came after I started the variable-length replacement. I'm starting on it now and will send it ASAP.


ArleyRistar commented 9 years ago

Oops, I did something wrong: the CircleCI check says "No test commands were found". Does anybody know what that is?


linas commented 9 years ago

You can do multiple pull requests; that might make things easier to review and to merge.

Ignore circle-ci for now. It's temperamental.