greenleaf641 opened this issue 4 weeks ago
Hi,
If I understand correctly, the question is how to implement autodiff for your custom function.

> Why are we multiplying trace with dot, or trace with primal exactly, and do you have any idea how these two functions should be altered to support transformation interaction instead?
`ForwardTrace` and `ReverseTrace` implement forward- and reverse-mode automatic differentiation (see e.g. https://arxiv.org/abs/1502.05767), in a slightly simplified way, since in Operon's case the directed acyclic graph is in fact a tree.

> I'm not sure what to do when it comes to the `ForwardTrace` and `ReverseTrace`, which I assume will be necessary for optimizing the coefficients later on.
This requires that all intermediate results are stored in the primal, to be used later by the traces. In your `ForwardPass` function, it looks to me like the evaluation of the entire polynomial expression `weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...` is stored in one single column of the primal matrix by the `ComputePolynomials` function. In other words, the intermediate results which would allow differentiation with respect to the weights inside the polynomial are missing (since the primal only stores the final aggregate values).

I would suggest implementing your own `Interpreter` instead. You could use dual numbers directly, and then you don't need multiple passes. You can probably adapt this code to your needs: https://github.com/heal-research/operon/blob/28ac41b6b0af55207679c0d955f6e20a3749cb58/test/source/operon_test.hpp#L210
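To make the dual-number idea concrete, here is a minimal self-contained sketch (not Operon's actual types; the hand-rolled `Dual` struct and the example IT term are assumptions made for illustration). Seeding the dual part of a weight with 1 propagates its partial derivative through the evaluation in a single forward pass:

```cpp
#include <cmath>
#include <cstdio>

// Minimal forward-mode dual number: val carries the primal value,
// eps carries the derivative with respect to the seeded variable.
struct Dual {
    double val;
    double eps;
};

Dual operator*(Dual a, Dual b) { return { a.val * b.val, a.eps * b.val + a.val * b.eps }; }
Dual operator+(Dual a, Dual b) { return { a.val + b.val, a.eps + b.eps }; }
Dual exp(Dual a) { double e = std::exp(a.val); return { e, e * a.eps }; }

// One hypothetical IT term: w * exp(x1^2 * x2). Seeding w with eps = 1 yields d(term)/dw.
Dual ItTerm(double w, double x1, double x2) {
    Dual dw { w, 1.0 };                     // differentiate with respect to the weight
    Dual interaction { x1 * x1 * x2, 0.0 }; // x1^2 * x2 is constant with respect to w
    return dw * exp(interaction);
}

int main() {
    Dual y = ItTerm(0.5, 1.2, -0.3);
    // y.val is the term's value, y.eps its derivative with respect to the weight,
    // which for an IT term is simply exp(x1^2 * x2) = y.val / w.
    std::printf("value = %g, d/dw = %g\n", y.val, y.eps);
    return 0;
}
```

In practice one would plug a library-provided dual/Jet scalar type into the interpreter's evaluation loop rather than hand-rolling one, but the propagation rule is the same.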
I see, having read the paper I think what you are saying more or less makes sense. So in this case, should I store in the primal the result of each interaction (meaning `weight1 * function1(x1^poly11 * x2^poly12 * ...)` per column)?
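Something along these lines, as a rough sketch (the `ItTerm` struct and `ForwardPassSketch` names are made up for illustration, not Operon code): each weighted term gets its own primal column, and the model prediction is the row-wise sum.

```cpp
#include <Eigen/Core>
#include <cmath>
#include <vector>

// Hypothetical description of one IT term: weight * func(prod_j x_j^exponent_j).
struct ItTerm {
    double weight;
    std::vector<int> exponents; // one exponent per input variable
    double (*func)(double);     // unary transformation
};

// Fill one primal column per term so that per-term values remain available
// for computing derivatives with respect to the weights later on.
void ForwardPassSketch(Eigen::MatrixXd const& x, std::vector<ItTerm> const& terms,
                       Eigen::MatrixXd& primal)
{
    primal.resize(x.rows(), static_cast<Eigen::Index>(terms.size()));
    for (Eigen::Index t = 0; t < primal.cols(); ++t) {
        auto const& term = terms[static_cast<std::size_t>(t)];
        for (Eigen::Index r = 0; r < x.rows(); ++r) {
            double interaction = 1.0;
            for (Eigen::Index v = 0; v < x.cols(); ++v) {
                interaction *= std::pow(x(r, v), term.exponents[static_cast<std::size_t>(v)]);
            }
            primal(r, t) = term.weight * term.func(interaction);
        }
    }
    // The model prediction is then the row-wise sum: primal.rowwise().sum()
}
```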
Additionally, have you looked at this method of symbolic regression in the past? Do you think that, if I manage to make it work, the results will be worth it?
@greenleaf641 as one of the authors of ITEA, I would say go for it :-D but I might be biased. In my experience with ITEA, while it doesn't excel in SRBench, it has some success cases when you really need to ensure the linearity of the parameters. (Also, I would personally love to see this integrated into Operon :-) )
It would be fun, I think, since I liked the idea. But for now I'm just changing Operon's existing code; if it works well, I should figure out a way to add this as an additional option alongside Operon's traditional tree structure.
Btw, you described the IT representation but pointed to the TIR paper. In TIR you would have something like:

```
f(x) = weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...
g(x) = weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...
```

and the expression would be

```
linkFunction(f(x) / (1 + g(x)))
```

This would be a little bit more difficult to implement, but it often generates smaller and more accurate expressions.
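For what it's worth, a tiny sketch of how the TIR composition could be evaluated once f and g have been computed as IT sums (the function name and the identity link below are placeholders for illustration, not a prescribed implementation):

```cpp
#include <Eigen/Core>

// Sketch: TIR prediction = linkFunction(f(x) / (1 + g(x))), evaluated row-wise.
// fx and gx hold the already-evaluated IT sums f(x) and g(x) for each data row;
// the link is left as the identity here, but it could be exp, logistic, etc.
Eigen::ArrayXd EvaluateTir(Eigen::ArrayXd const& fx, Eigen::ArrayXd const& gx)
{
    Eigen::ArrayXd ratio = fx / (1.0 + gx);
    return ratio; // apply the chosen link function here, e.g. ratio.exp()
}
```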
My mistake. No, I meant the IT method [https://arxiv.org/pdf/1902.03983]; I just linked the wrong paper somehow! Thank you for the correction.
Would I be correct in saying that in the case of IT, `trace_` (i.e. in `ForwardTrace`) is no longer needed? I just do the calculation directly:

```cpp
jac.col(i).segment(row, rem) = primal.col(i).head(rem) / individual.Weights[i];
```
I think so, the derivatives for IT expressions are straightforward, like you said.
> Would I be correct in saying that in the case of IT, `trace_` (i.e. in `ForwardTrace`) is no longer needed? I just do the calculation directly: `jac.col(i).segment(row, rem) = primal.col(i).head(rem) / individual.Weights[i];`
Your idea looks plausible; then you would only need the primal. I suggest starting with some small expressions and checking whether the derivatives are correct, like we do in the unit test: https://github.com/heal-research/operon/blob/main/test/source/implementation/autodiff.cpp
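As a lightweight version of such a check, one could compare the analytic column against a central finite difference. This is only a sketch with hypothetical names (`evaluateModel`, `weights`), not the code from the linked unit test:

```cpp
#include <Eigen/Core>
#include <functional>

// Compare the analytic derivative of the model output with respect to weight i
// against a central finite difference. For an IT expression the analytic column
// is d/dw_i [ w_i * f_i(p_i(x)) ] = f_i(p_i(x)) = primal.col(i) / w_i.
bool CheckWeightDerivative(
    std::function<Eigen::VectorXd(Eigen::VectorXd const&)> const& evaluateModel, // hypothetical
    Eigen::VectorXd const& weights, Eigen::VectorXd const& analyticColumn,
    int i, double h = 1e-6, double tol = 1e-4)
{
    Eigen::VectorXd wPlus  = weights; wPlus[i]  += h;
    Eigen::VectorXd wMinus = weights; wMinus[i] -= h;
    Eigen::VectorXd numeric = (evaluateModel(wPlus) - evaluateModel(wMinus)) / (2 * h);
    return (numeric - analyticColumn).cwiseAbs().maxCoeff() < tol;
}
```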
I think I can now train "somewhat correctly". I have a few questions though. Why does Operon pass arguments by value and not by reference in the mutation/crossover functions? Wouldn't it be more efficient to do the latter?
Using this with the "Variations of vertical fluid" dataset as a reference, I get `9.807 * (ρ * Δh)` or `9.807 * abs(ρ * Δh)` as an answer. Operon gives me `-2.643664 * cbrt((ρ ^ 3.000000) * (Δh ^ 3.000000))`, which is obviously correct, but only after applying scaling. Is this the correct behaviour; should the result be returned like that, or did I make some mistake somewhere in between? Coefficient optimization happens with the scaled data, so maybe this makes sense, but I just want to be sure.
> Why does Operon pass arguments by value and not by reference in the mutation/crossover functions? Wouldn't it be more efficient to do the latter?
Most of these operators work by performing some changes on a copy of the parent. Therefore, it makes sense to accept the arguments by value. One can also use `std::move` to avoid a copy.
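As a generic illustration of that idiom (the `Tree` and `Mutate` below are made-up stand-ins, not Operon's actual operator signatures): taking the parent by value gives the operator its own copy to modify, and a caller that no longer needs its object can move it in to skip the copy.

```cpp
#include <utility>
#include <vector>

struct Tree { std::vector<int> nodes; };

// Taking the parent by value gives Mutate its own copy to modify and return.
Tree Mutate(Tree parent)
{
    if (!parent.nodes.empty()) { parent.nodes.front() += 1; } // some in-place change
    return parent;
}

int main()
{
    Tree a { {1, 2, 3} };
    Tree child1 = Mutate(a);            // copies a, a stays valid
    Tree child2 = Mutate(std::move(a)); // moves a, no copy; a should not be used afterwards
    (void)child1; (void)child2;
    return 0;
}
```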
> Using this with the "Variations of vertical fluid" dataset as a reference, I get `9.807 * (ρ * Δh)` or `9.807 * abs(ρ * Δh)` as an answer. Operon gives me `-2.643664 * cbrt((ρ ^ 3.000000) * (Δh ^ 3.000000))`, which is obviously correct, but only after applying scaling. Is this the correct behaviour; should the result be returned like that, or did I make some mistake somewhere in between? Coefficient optimization happens with the scaled data, so maybe this makes sense, but I just want to be sure.
I am assuming that you are using the CLI programs (`operon-gp`, `operon-nsgp`)? If yes, normally the scaling terms are added to the model. If the scaling is not what you expected, I suspect something with the data?
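For context, this kind of output scaling typically amounts to fitting an intercept and slope of the raw model output against the target by least squares and appending them to the model. A minimal sketch of that computation, as an assumption about the general technique rather than Operon's actual code:

```cpp
#include <Eigen/Core>
#include <utility>

// Ordinary least-squares fit of target ~ offset + scale * prediction.
// The returned pair holds (offset, scale); appending these two terms to the
// raw model yields the scaled model reported to the user.
std::pair<double, double> LinearScaling(Eigen::ArrayXd const& prediction,
                                        Eigen::ArrayXd const& target)
{
    double pMean = prediction.mean();
    double tMean = target.mean();
    double cov   = ((prediction - pMean) * (target - tMean)).mean();
    double var   = ((prediction - pMean) * (prediction - pMean)).mean();
    double scale = var > 0 ? cov / var : 1.0;
    return { tMean - scale * pMean, scale };
}
```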
Correct, the CLI nsgp program. But given that I've edited some things to work with IT, it could be a problem on my end. I need to investigate further, since the returned function is not correct without the scaling factor.
I am trying to see if it is possible to have Operon work not with a tree structure but with the "Transformation-Interaction representation". Briefly, what this method hopes to do is simplify the resulting functions by representing individuals not as trees but as collections of unary functions acting on top of polynomial interactions. Something like:

```
weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...
```

I made a new generator that only selects functions with arity 1 and, instead of a tree, generates a collection of genotypes that have the form described above:
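Purely as an illustration of the shape such a genotype could take (the names below are hypothetical, not the generator's actual code), something like:

```cpp
#include <Eigen/Core>
#include <vector>

// Hypothetical IT genotype: k terms over d variables.
// Each term t evaluates to weights[t] * functions[t]( prod_j x_j ^ exponents(t, j) ),
// where functions[t] is restricted to arity-1 primitives.
struct ItGenotype {
    Eigen::VectorXd weights;                   // k weights, one per term
    Eigen::MatrixXi exponents;                 // k x d integer exponent matrix
    std::vector<double (*)(double)> functions; // k unary transformations
};
```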
I altered the `Interpreter` as well so it can evaluate genotypes of this form: I added a new function to the interpreter class, which the new `ForwardPass` then uses to evaluate them. But I'm not sure what to do when it comes to the `ForwardTrace` and `ReverseTrace`, which I assume will be necessary for optimizing the coefficients later on. Why are we multiplying trace with dot, or trace with primal exactly, and do you have any idea how these two functions should be altered to support transformation interaction instead?
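To summarize the reasoning from the discussion above: in the generic trace-based passes, multiplying the trace by local derivative terms (which themselves depend on stored primal values) is, roughly speaking, the chain rule being applied node by node along the tree. For an IT expression the output is linear in the weights, so the Jacobian with respect to the weights can be read straight off the per-term primal columns. A rough sketch, assuming one primal column per weighted term as discussed earlier:

```cpp
#include <Eigen/Core>

// Sketch: for an IT model y(x) = sum_i w_i * f_i(p_i(x)) the output is linear
// in the weights, so column i of the Jacobian with respect to the weights is
// simply f_i(p_i(x)). If the forward pass stores w_i * f_i(p_i(x)) in
// primal.col(i), that is just primal.col(i) / w_i (assuming w_i != 0).
void WeightJacobian(Eigen::MatrixXd const& primal, Eigen::VectorXd const& weights,
                    Eigen::MatrixXd& jac)
{
    jac.resize(primal.rows(), primal.cols());
    for (Eigen::Index i = 0; i < primal.cols(); ++i) {
        jac.col(i) = primal.col(i) / weights[i];
    }
}
```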