peter-ch / MultiNEAT

Portable NeuroEvolution Library
http://MultiNEAT.com
GNU Lesser General Public License v3.0
327 stars 104 forks source link

Vector scale implementation is incorrect #60

Open jherico opened 4 years ago

jherico commented 4 years ago

https://github.com/peter-ch/MultiNEAT/blob/0dc93aa0e228e23eeb95371bdd3838c64f08fdae/src/Utils.cpp#L40

The existing void Scale(vector<double>& a_Values, const double a_tr_min, const double a_tr_max) function ignores the passed a_tr_min and a_tr_max.

Delegating to the existing Scale(double&...) function is inefficient because it causes the intermediate range value to be recomputed every call.

The a_Values = t_ValuesScaled; is inefficient because it invokes a vector copy requiring linear time, whereas a_Values.swap(t_ValuesScaled); would be constant time as it only exchanges the pointers for each vector.

The t_ValuesScaled.push_back(t_ValueToBeScaled); is inefficient because it can potentially trigger vector reallocation inside the loop. The final size of t_ValuesScaled is known at the start, so there's no reason not to pre-allocate storage by using vector::reserve

Here is an alternative implementation:

void Scale(std::vector<double>& a_Values, const double a_tr_min, const double a_tr_max)
{
    if (a_Values.empty()) {
        return;
    }

    double t_max = std::numeric_limits<double>::min(), t_min = std::numeric_limits<double>::max();
    GetMaxMin(a_Values, t_min, t_max);
    auto range = t_max - t_min;
    auto destRange = a_tr_max - a_tr_min;
    auto scaleValue = destRange / range;
    std::vector<double> t_ValuesScaled;
    t_ValuesScaled.reserve(a_Values.size());
    for (auto t_ValueToBeScaled : a_Values) {
        t_ValueToBeScaled -= t_min;
        t_ValueToBeScaled *= scaleValue;
        t_ValueToBeScaled += a_tr_min;
        t_ValuesScaled.push_back(t_ValueToBeScaled);
    }
    a_Values.swap(t_ValuesScaled);
}
jherico commented 4 years ago

The new implementation does have one remaining flaw. If t_min and t_max are the same, when the scaling value will be NaN. Solving this would be as easy as adding a conditional on range == 0, but the behavior would need to be specified. Reasonable options would include setting all the values to a_tr_min, or to a_tr_max, or to the midpoint between them, or to a random distribution between them.