greenleaf641 opened this issue 4 weeks ago
Hi,
If I understand correctly, the question is how to implement autodiff for your custom function.

> Why are we multiplying trace with dot, or trace with primal exactly, and do you have any idea how these two functions should be altered to support transformation interaction instead?
`ForwardTrace` and `ReverseTrace` implement forward- and reverse-mode automatic differentiation (see e.g. https://arxiv.org/abs/1502.05767), in a slightly simplified way, since in Operon's case the directed acyclic graph is in fact a tree.

> I'm not sure what to do when it comes to the `ForwardTrace` and `ReverseTrace`, which I assume will be necessary for optimizing the coefficients later on.
This requires that all intermediate results are stored in the primal, to be used later by the traces. In your `ForwardPass` function, it looks to me like the evaluation of the entire polynomial expression `weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...` is stored in one single column of the primal matrix by the `ComputePolynomials` function. In other words, the intermediate results which would allow differentiation with respect to the weights inside the polynomial are missing (since the primal only stores the final aggregate values).

I would suggest implementing your own `Interpreter` instead. You could use dual numbers directly, and then you don't need multiple passes. You can probably adapt this code to your needs: https://github.com/heal-research/operon/blob/28ac41b6b0af55207679c0d955f6e20a3749cb58/test/source/operon_test.hpp#L210
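To make the dual-number idea concrete, here is a minimal self-contained sketch (not Operon's actual types; the hand-rolled `Dual` struct and the example IT term are assumptions made for illustration). Seeding the dual part of a weight with 1 propagates its partial derivative through the evaluation in a single forward pass:

```cpp
#include <cmath>
#include <cstdio>

// Minimal forward-mode dual number: val carries the primal value,
// eps carries the derivative with respect to the seeded variable.
struct Dual {
    double val;
    double eps;
};

Dual operator*(Dual a, Dual b) { return { a.val * b.val, a.eps * b.val + a.val * b.eps }; }
Dual operator+(Dual a, Dual b) { return { a.val + b.val, a.eps + b.eps }; }
Dual exp(Dual a) { double e = std::exp(a.val); return { e, e * a.eps }; }

// One hypothetical IT term: w * exp(x1^2 * x2). Seeding w with eps = 1 yields d(term)/dw.
Dual ItTerm(double w, double x1, double x2) {
    Dual dw { w, 1.0 };                     // differentiate with respect to the weight
    Dual interaction { x1 * x1 * x2, 0.0 }; // x1^2 * x2 is constant with respect to w
    return dw * exp(interaction);
}

int main() {
    Dual y = ItTerm(0.5, 1.2, -0.3);
    // y.val is the term's value, y.eps its derivative with respect to the weight,
    // which for an IT term is simply exp(x1^2 * x2) = y.val / w.
    std::printf("value = %g, d/dw = %g\n", y.val, y.eps);
    return 0;
}
```

In practice one would plug a library-provided dual/Jet scalar type into the interpreter's evaluation loop rather than hand-rolling one, but the propagation rule is the same.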
I see, having read the paper I think what you are saying more or less makes sense. So in this case, should I store in the primal the result of each interaction (meaning `weight1 * function1(x1^poly11 * x2^poly12 * ...)` per column)?
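Something along these lines, as a rough sketch (the `ItTerm` struct and `ForwardPassSketch` names are made up for illustration, not Operon code): each weighted term gets its own primal column, and the model prediction is the row-wise sum.

```cpp
#include <Eigen/Core>
#include <cmath>
#include <vector>

// Hypothetical description of one IT term: weight * func(prod_j x_j^exponent_j).
struct ItTerm {
    double weight;
    std::vector<int> exponents; // one exponent per input variable
    double (*func)(double);     // unary transformation
};

// Fill one primal column per term so that per-term values remain available
// for computing derivatives with respect to the weights later on.
void ForwardPassSketch(Eigen::MatrixXd const& x, std::vector<ItTerm> const& terms,
                       Eigen::MatrixXd& primal)
{
    primal.resize(x.rows(), static_cast<Eigen::Index>(terms.size()));
    for (Eigen::Index t = 0; t < primal.cols(); ++t) {
        auto const& term = terms[static_cast<std::size_t>(t)];
        for (Eigen::Index r = 0; r < x.rows(); ++r) {
            double interaction = 1.0;
            for (Eigen::Index v = 0; v < x.cols(); ++v) {
                interaction *= std::pow(x(r, v), term.exponents[static_cast<std::size_t>(v)]);
            }
            primal(r, t) = term.weight * term.func(interaction);
        }
    }
    // The model prediction is then the row-wise sum: primal.rowwise().sum()
}
```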
Additionally, have you looked at this method of symbolic regression in the past? Do you think that, if I manage to make it work, the results will be worth it?
@greenleaf641 as one of the authors of ITEA, I would say go for it :-D but I might be biased. In my experience with ITEA, while it doesn't excel in SRBench, it has some success cases when you really need to ensure the linearity of the parameters. (Also, I would personally love to see this integrated into Operon :-) )
It would be fun, I think, since I liked the idea. But for now I'm just changing Operon's existing code; if it works well, I should figure out a way to add this as an additional option alongside Operon's traditional tree structure.
Btw, you described the IT representation but pointed to the TIR paper. In TIR you would have something like:

```
f(x) = weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...
g(x) = weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...
```

and the expression would be

```
linkFunction(f(x) / (1 + g(x)))
```

This would be a little bit more difficult to implement, but it often generates smaller and more accurate expressions.
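For what it's worth, a tiny sketch of how the TIR composition could be evaluated once f and g have been computed as IT sums (the function name and the identity link below are placeholders for illustration, not a prescribed implementation):

```cpp
#include <Eigen/Core>

// Sketch: TIR prediction = linkFunction(f(x) / (1 + g(x))), evaluated row-wise.
// fx and gx hold the already-evaluated IT sums f(x) and g(x) for each data row;
// the link is left as the identity here, but it could be exp, logistic, etc.
Eigen::ArrayXd EvaluateTir(Eigen::ArrayXd const& fx, Eigen::ArrayXd const& gx)
{
    Eigen::ArrayXd ratio = fx / (1.0 + gx);
    return ratio; // apply the chosen link function here, e.g. ratio.exp()
}
```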
My mistake. No, I meant the IT method [https://arxiv.org/pdf/1902.03983]; I just linked the wrong paper somehow! Thank you for the correction.
Would I be correct in saying that in the case of IT, `trace_` (i.e. in `ForwardTrace`) is no longer needed? I just do the calculation directly:

```cpp
jac.col(i).segment(row, rem) = primal.col(i).head(rem) / individual.Weights[i];
```
I think so, the derivatives for IT expressions are straightforward, like you said.
> Would I be correct in saying that in the case of IT, `trace_` (i.e. in `ForwardTrace`) is no longer needed? I just do the calculation directly: `jac.col(i).segment(row, rem) = primal.col(i).head(rem) / individual.Weights[i];`
Your idea looks plausible; then you would only need the primal. I suggest starting with some small expressions and checking whether the derivatives are correct, like we do in the unit test: https://github.com/heal-research/operon/blob/main/test/source/implementation/autodiff.cpp
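As a lightweight version of such a check, one could compare the analytic column against a central finite difference. This is only a sketch with hypothetical names (`evaluateModel`, `weights`), not the code from the linked unit test:

```cpp
#include <Eigen/Core>
#include <functional>

// Compare the analytic derivative of the model output with respect to weight i
// against a central finite difference. For an IT expression the analytic column
// is d/dw_i [ w_i * f_i(p_i(x)) ] = f_i(p_i(x)) = primal.col(i) / w_i.
bool CheckWeightDerivative(
    std::function<Eigen::VectorXd(Eigen::VectorXd const&)> const& evaluateModel, // hypothetical
    Eigen::VectorXd const& weights, Eigen::VectorXd const& analyticColumn,
    int i, double h = 1e-6, double tol = 1e-4)
{
    Eigen::VectorXd wPlus  = weights; wPlus[i]  += h;
    Eigen::VectorXd wMinus = weights; wMinus[i] -= h;
    Eigen::VectorXd numeric = (evaluateModel(wPlus) - evaluateModel(wMinus)) / (2 * h);
    return (numeric - analyticColumn).cwiseAbs().maxCoeff() < tol;
}
```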
I think I can now train "somewhat correctly". I have a few questions though. Why does Operon pass arguments by value and not by reference in the mutation/crossover functions? Wouldn't it be more efficient to do the latter?
Using this with the "Variations of vertical fluid" dataset as a reference, I get `9.807 * (ρ * Δh)` or `9.807 * abs(ρ * Δh)` as an answer. Operon gives me `-2.643664 * cbrt((ρ ^ 3.000000) * (Δh ^ 3.000000))`, which is obviously correct, but only after applying scaling. Is this the correct behaviour; should the result be returned like that, or did I make some mistake somewhere in between? Coefficient optimization happens with the scaled data, so maybe this makes sense, but I just want to be sure.
> Why does Operon pass arguments by value and not by reference in the mutation/crossover functions? Wouldn't it be more efficient to do the latter?
Most of these operators work by performing some changes on a copy of the parent. Therefore, it makes sense to accept the arguments by value. One can also use `std::move` to avoid a copy.
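As a generic illustration of that idiom (the `Tree` and `Mutate` below are made-up stand-ins, not Operon's actual operator signatures): taking the parent by value gives the operator its own copy to modify, and a caller that no longer needs its object can move it in to skip the copy.

```cpp
#include <utility>
#include <vector>

struct Tree { std::vector<int> nodes; };

// Taking the parent by value gives Mutate its own copy to modify and return.
Tree Mutate(Tree parent)
{
    if (!parent.nodes.empty()) { parent.nodes.front() += 1; } // some in-place change
    return parent;
}

int main()
{
    Tree a { {1, 2, 3} };
    Tree child1 = Mutate(a);            // copies a, a stays valid
    Tree child2 = Mutate(std::move(a)); // moves a, no copy; a should not be used afterwards
    (void)child1; (void)child2;
    return 0;
}
```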
> Using this with the "Variations of vertical fluid" dataset as a reference, I get `9.807 * (ρ * Δh)` or `9.807 * abs(ρ * Δh)` as an answer. Operon gives me `-2.643664 * cbrt((ρ ^ 3.000000) * (Δh ^ 3.000000))`, which is obviously correct, but only after applying scaling. Is this the correct behaviour; should the result be returned like that, or did I make some mistake somewhere in between? Coefficient optimization happens with the scaled data, so maybe this makes sense, but I just want to be sure.
I am assuming that you are using the CLI programs (`operon-gp`, `operon-nsgp`)? If yes, normally the scaling terms are added to the model. If the scaling is not what you expected, I suspect something with the data?
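For context, this kind of output scaling typically amounts to fitting an intercept and slope of the raw model output against the target by least squares and appending them to the model. A minimal sketch of that computation, as an assumption about the general technique rather than Operon's actual code:

```cpp
#include <Eigen/Core>
#include <utility>

// Ordinary least-squares fit of target ~ offset + scale * prediction.
// The returned pair holds (offset, scale); appending these two terms to the
// raw model yields the scaled model reported to the user.
std::pair<double, double> LinearScaling(Eigen::ArrayXd const& prediction,
                                        Eigen::ArrayXd const& target)
{
    double pMean = prediction.mean();
    double tMean = target.mean();
    double cov   = ((prediction - pMean) * (target - tMean)).mean();
    double var   = ((prediction - pMean) * (prediction - pMean)).mean();
    double scale = var > 0 ? cov / var : 1.0;
    return { tMean - scale * pMean, scale };
}
```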
Correct, the CLI nsgp program. But given that I've edited some things to work with IT, it could be a problem on my end. I need to investigate further, since the returned function is not correct without the scaling factor.
I am trying to see if it is possible to have Operon work not with a tree structure but with the "Transformation-Interaction representation". Briefly, what this method hopes to do is simplify the resulting functions by representing individuals not as trees but as collections of unary functions acting on top of polynomial interactions. Something like:

```
weight1 * function1(x1^poly11 * x2^poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...
```

I made a new generator that only selects functions with arity 1 and, instead of a tree, generates a collection of genotypes that have the form described above:
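Purely as an illustration of the shape such a genotype could take (the names below are hypothetical, not the generator's actual code), something like:

```cpp
#include <Eigen/Core>
#include <vector>

// Hypothetical IT genotype: k terms over d variables.
// Each term t evaluates to weights[t] * functions[t]( prod_j x_j ^ exponents(t, j) ),
// where functions[t] is restricted to arity-1 primitives.
struct ItGenotype {
    Eigen::VectorXd weights;                   // k weights, one per term
    Eigen::MatrixXi exponents;                 // k x d integer exponent matrix
    std::vector<double (*)(double)> functions; // k unary transformations
};
```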
I altered the `Interpreter` as well so it can evaluate genotypes of this form: I added a new function to the interpreter class, which the new `ForwardPass` then uses to evaluate them. But I'm not sure what to do when it comes to the `ForwardTrace` and `ReverseTrace`, which I assume will be necessary for optimizing the coefficients later on. Why are we multiplying trace with dot, or trace with primal exactly, and do you have any idea how these two functions should be altered to support transformation interaction instead?
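To summarize the reasoning from the discussion above: in the generic trace-based passes, multiplying the trace by local derivative terms (which themselves depend on stored primal values) is, roughly speaking, the chain rule being applied node by node along the tree. For an IT expression the output is linear in the weights, so the Jacobian with respect to the weights can be read straight off the per-term primal columns. A rough sketch, assuming one primal column per weighted term as discussed earlier:

```cpp
#include <Eigen/Core>

// Sketch: for an IT model y(x) = sum_i w_i * f_i(p_i(x)) the output is linear
// in the weights, so column i of the Jacobian with respect to the weights is
// simply f_i(p_i(x)). If the forward pass stores w_i * f_i(p_i(x)) in
// primal.col(i), that is just primal.col(i) / w_i (assuming w_i != 0).
void WeightJacobian(Eigen::MatrixXd const& primal, Eigen::VectorXd const& weights,
                    Eigen::MatrixXd& jac)
{
    jac.resize(primal.rows(), primal.cols());
    for (Eigen::Index i = 0; i < primal.cols(); ++i) {
        jac.col(i) = primal.col(i) / weights[i];
    }
}
```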