halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Hexagon schedule fails computations #4322

Open EgorZarubin opened 5 years ago

EgorZarubin commented 5 years ago

Hello, I'm trying to run a simple image-pyramid pipeline on Hexagon and have run into strange behavior. Code:

Func upsample(Func f)
{
    Func upx("upx"), upy("upy");
    Var x("x"), y("y");
    Func f_16("f_16");
    f_16(x, y) = cast<int16_t>(f(x, y));
    upx(x, y) = (f_16((x >> 1) - 1 + 2 * (x & 1), y) + 3 * f_16(x >> 1, y));
    upy(x, y) = (upx(x, (y >> 1) - 1 + 2 * (y & 1)) + 3 * upx(x, y >> 1) + 8) >> 4;

    Func out;
    out(x, y) = cast<uint8_t>(clamp(upy(x, y), 0, 255));
    return out;
}

Func downsample(Func f)
{
    Func downx("downx"), downy("downy");
    Var x("x"), y("y");
    Func f_16("f_16");
    f_16(x, y) = cast<int16_t>(f(x, y));

    // use 1 2 1 filter
    downx(x, y) = (2 * f_16(2 * x, y) +
                   f_16(2 * x - 1, y) + f_16(2 * x + 1, y));

    downy(x, y) = (2 * downx(x, 2 * y) +
                   downx(x, 2 * y - 1) + downx(x, 2 * y + 1) + 8) >> 4;

    Func out;
    out(x, y) = cast<uint8_t>(clamp(downy(x, y), 0, 255));
    return out;
}

void run()
{
    // YUV_PYR_DEPTH = 2
    Var x("x"), y("y");
    Func gY[maxlev], lY[maxlev], outPyrY[maxlev];
    gY[0](x, y) = BoundaryConditions::repeat_edge(inputY)(x, y);
    for (int j = 1; j < YUV_PYR_DEPTH; j++) {
        gY[j](x, y) = downsample(gY[j - 1])(x, y);
    }

    // make Laplacian pyr for Y and UV
    lY[YUV_PYR_DEPTH - 1] = gY[YUV_PYR_DEPTH - 1];
    for (int j = YUV_PYR_DEPTH - 2; j >= 0; j--) {
        lY[j](x, y) = cast<uint8_t>(clamp(cast<int16_t>(gY[j](x, y)) - upsample(gY[j + 1])(x, y) + 128, 0, 255));
    }

    Func lower;
    lower(x, y) = cast<uint8_t>(clamp(cast<uint16_t>(gY[YUV_PYR_DEPTH - 1](x, y)) + 25, 0, 255));
    outPyrY[YUV_PYR_DEPTH - 1] = lower;
    for (int j = YUV_PYR_DEPTH - 2; j >= 0; j--) {
        outPyrY[j](x, y) = cast<uint8_t>(clamp(cast<int16_t>(upsample(outPyrY[j + 1])(x, y)) + lY[j](x, y) - 128, 0, 255));
    }

    outputY(x, y) = outPyrY[0](x, y);

    for (int i = 1; i < YUV_PYR_DEPTH; i++)
    {
        gY[i]
            .compute_root()
            .hexagon()
            .vectorize(x, 128, TailStrategy::RoundUp)
            .parallel(y, 16)
            .align_storage(x, 128);

        outPyrY[i]
            .compute_at(Func(outputY), y)
            .vectorize(x, 128, TailStrategy::RoundUp)
            .align_storage(x, 128);
    }
    outputY
        .hexagon()                    // Question about this line!
        .vectorize(x, 128)
        .parallel(y);
}

When I run this code on the phone, the output is corrupted: the values in every 4th column are either 0 or 255. When I remove ".hexagon()" from the outputY schedule, the program works fine. What could be the reason for this behavior?
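A schedule-independent reference makes it easier to tell which pixels are wrong. The sketch below is an editorial addition, not the poster's code: it mirrors `downsample`, `upsample`, and the two-level (`YUV_PYR_DEPTH = 2`) reconstruction in `run()` in plain C++17. Each stage is evaluated lazily so that out-of-range taps behave like Halide's unbounded Funcs, with `repeat_edge` clamping applied only at the input. The names `Fn`, `half`, `parity`, `clamp255`, `ref_downsample`, `ref_upsample`, and `reference_pyramid` are hypothetical, and the input is assumed to be a row-major 8-bit luma plane.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// An image stage evaluated on demand over an unbounded domain, like a Halide Func.
using Fn = std::function<int(int, int)>;

// Floor division by 2 and the matching low bit; these equal Halide's
// (x >> 1) and (x & 1) for any int x, including negative ones.
int half(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }
int parity(int x) { return x - 2 * half(x); }

int clamp255(int v) { return std::clamp(v, 0, 255); }

// Mirrors downsample(): 1-2-1 filter in x, then y, with rounding and a uint8 clamp.
Fn ref_downsample(Fn f) {
    Fn downx = [f](int x, int y) {
        return 2 * f(2 * x, y) + f(2 * x - 1, y) + f(2 * x + 1, y);
    };
    return [downx](int x, int y) {
        return clamp255((2 * downx(x, 2 * y) + downx(x, 2 * y - 1) +
                         downx(x, 2 * y + 1) + 8) >> 4);
    };
}

// Mirrors upsample(): 1-3 / 3-1 interpolation in x, then y, with rounding and a clamp.
Fn ref_upsample(Fn f) {
    Fn upx = [f](int x, int y) {
        return f(half(x) - 1 + 2 * parity(x), y) + 3 * f(half(x), y);
    };
    return [upx](int x, int y) {
        return clamp255((upx(x, half(y) - 1 + 2 * parity(y)) +
                         3 * upx(x, half(y)) + 8) >> 4);
    };
}

// Mirrors run() for YUV_PYR_DEPTH = 2; `in` is a w*h row-major uint8 plane.
std::vector<uint8_t> reference_pyramid(const std::vector<uint8_t>& in, int w, int h) {
    assert(w > 0 && h > 0 && in.size() == std::size_t(w) * h);
    Fn g0 = [&in, w, h](int x, int y) {  // BoundaryConditions::repeat_edge
        return int(in[std::size_t(std::clamp(y, 0, h - 1)) * w + std::clamp(x, 0, w - 1)]);
    };
    Fn g1 = ref_downsample(g0);  // gY[1]
    Fn up_g1 = ref_upsample(g1);
    Fn l0 = [g0, up_g1](int x, int y) {  // lY[0]
        return clamp255(g0(x, y) - up_g1(x, y) + 128);
    };
    Fn lower = [g1](int x, int y) { return clamp255(g1(x, y) + 25); };  // outPyrY[1]
    Fn up_lower = ref_upsample(lower);
    std::vector<uint8_t> out(in.size());
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)  // outPyrY[0], i.e. outputY
            out[std::size_t(y) * w + x] = uint8_t(clamp255(up_lower(x, y) + l0(x, y) - 128));
    return out;
}
```

Every stage reproduces a constant input, so a constant 100 should come back as 125 everywhere (the `+ 25` on the coarse level survives reconstruction). On a real frame, any pixel where the device output disagrees with `reference_pyramid` points at the faulty stage. The lazy evaluation recomputes heavily, so this is only meant for small test images.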

dsharletg commented 5 years ago

In the host code, are you checking for errors? Is it possible the Hexagon pipeline is failing to run?
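A hedged sketch of what checking for errors on the host side can look like: an ahead-of-time compiled Halide pipeline is a plain function returning `int`, 0 on success and a nonzero error code on failure. The caller should test that value before reading the output buffer. The helper `run_checked` below is hypothetical and takes the pipeline call as a callable.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>

// Calls a Halide AOT entry point (wrapped in a lambda) and reports a nonzero
// status instead of letting the caller read a possibly unwritten output buffer.
bool run_checked(const char* name, const std::function<int()>& call) {
    assert(call && "no pipeline call supplied");
    int status = call();
    if (status != 0) {
        std::fprintf(stderr, "%s failed with Halide error code %d\n", name, status);
        return false;
    }
    return true;
}
```

Usage would look like `run_checked("pyramid_y", [&] { return pyramid_y(input_buf, output_buf); });`, where `pyramid_y`, `input_buf`, and `output_buf` are placeholders for the generated function and its buffers. To capture the error message text as well, the runtime's `halide_set_error_handler` (declared in `HalideRuntime.h`) can install a custom handler.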

I notice there are two 'hexagon' directives (one on gY[i], one on outputY). Is it only the latter one that causes the results to be incorrect?

EgorZarubin commented 5 years ago

Yes, only the 'hexagon' directive on outputY leads to the error.
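Since only the `.hexagon()` on outputY changes the result, one way to characterize the corruption is to run the pipeline twice, with and without that directive, and compare the two outputs column by column. In the sketch below, `mismatched_columns` and `column_stride` are hypothetical helpers over row-major `w*h` 8-bit planes. A stride of 4 would confirm the every-4th-column pattern, and the first mismatched column gives its phase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Columns x in [0, w) where the two row-major w*h images differ in at least one row.
std::vector<int> mismatched_columns(const std::vector<uint8_t>& expected,
                                    const std::vector<uint8_t>& actual, int w, int h) {
    assert(expected.size() == std::size_t(w) * h && actual.size() == expected.size());
    std::vector<int> cols;
    for (int x = 0; x < w; x++) {
        for (int y = 0; y < h; y++) {
            if (expected[std::size_t(y) * w + x] != actual[std::size_t(y) * w + x]) {
                cols.push_back(x);
                break;  // one differing row is enough to flag this column
            }
        }
    }
    return cols;
}

// GCD of the gaps between consecutive mismatched columns; 0 if fewer than two.
int column_stride(const std::vector<int>& cols) {
    int g = 0;
    for (std::size_t i = 1; i < cols.size(); i++) g = std::gcd(g, cols[i] - cols[i - 1]);
    return g;
}
```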

What do you mean by host code: the code that calls the Halide pipeline, or the generated pseudocode?