halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Exception at interleaved -> interleaved processing #3080

Open Gobra opened 6 years ago

Gobra commented 6 years ago

Here's some quite simple code that fails on the realize call with the error message: Constraint violated: f0.stride.0 (3) == 1 (1)

Halide::Var x, y, c;
Halide::Func processor;
auto input = Halide::Buffer<uint8_t>::make_interleaved((uint8_t *)bitmap.bits(), bitmap.width(), bitmap.height(), 3);
auto output = Halide::Buffer<uint8_t>::make_interleaved(bitmap.width(), bitmap.height(), 3);

processor(x, y, c) = input(x, y, c);
processor.realize(output);

To me the code makes perfect sense, as both input and output have the same dimensions and layout. I've also tried "rendering" into the input buffer with processor.realize(input); and it fails with the same error message.

Changing the output to the default (planar?) layout with auto output = Halide::Buffer<uint8_t>(bitmap.width(), bitmap.height(), 3); makes it work.

Is that a bug, or did I miss something obvious in the documentation regarding the data representation (layout)?

In case it matters: I'm running on Win10 x64 with LLVM 6.0, MSVC 2017, and the Halide "master" branch from 21 June 2018.

abadams commented 6 years ago

Take a look at http://halide-lang.org/tutorials/tutorial_lesson_16_rgb_generate.html

By default Halide assumes the first dimension (x in this case) is dense in memory (stride 1). This is so it can generate good dense vector loads. You can tell it otherwise using processor.output_buffer().dim(0).set_stride(3).

Gobra commented 6 years ago

Yes, but I created the input buffer with make_interleaved; it shows stride 3 for x and 3072 for y (the buffer width is 1024), which seems right. Since the output buffer is set up the same way, why would Halide assume a stride for x different from the one defined for both the input and the target output?
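For readers following along, the stride values quoted above can be reproduced with a little arithmetic. This is only a sketch in plain C++: the Strides struct and the helper functions are hypothetical names made up for illustration and are not part of Halide. It assumes strides are counted in elements, that an interleaved image stores the channels of one pixel next to each other, and that a planar image stores each channel as a separate plane.

```cpp
#include <cassert>
#include <cstddef>

// Element strides for a 3-D (x, y, c) image.
// Hypothetical helper type for illustration; not a Halide type.
struct Strides {
    std::size_t x, y, c;
};

// Interleaved layout (RGBRGB...): the channels of one pixel are adjacent,
// so stepping in x skips `channels` elements.
inline Strides interleaved_strides(std::size_t width, std::size_t channels) {
    assert(width > 0 && channels > 0);
    return {channels, width * channels, 1};
}

// Planar layout (RRR...GGG...BBB...): x is dense, and each channel
// occupies its own width * height plane.
inline Strides planar_strides(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return {1, width, width * height};
}

// Linear element offset of (x, y, c) for a given set of strides.
inline std::size_t offset(const Strides &s, std::size_t x, std::size_t y, std::size_t c) {
    return x * s.x + y * s.y + c * s.c;
}
```

Calling interleaved_strides(1024, 3) gives an x stride of 3 and a y stride of 3072, matching the numbers reported for the make_interleaved buffer, while planar_strides gives an x stride of 1, which is what the pipeline expects by default.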

abadams commented 6 years ago

"processor" is compiled without knowledge of the output buffer you're going to use. You could call realize with any argument, and it would only compile once. I don't remember if this is also true of input buffers. I think when you use them directly perhaps Halide inspects the layout. If it were an ImageParam you'd have the same problem on the input as you do on the output.
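The "compile once, check the buffer at call time" behaviour described above can be pictured with a toy model. This is a sketch under stated assumptions, not Halide's implementation: CompiledPipeline, BufferLayout, compile and accepts are hypothetical names, and the model tracks only the stride of dimension 0.

```cpp
#include <cassert>

// Toy model only; these types are hypothetical and are not Halide's API.
struct CompiledPipeline {
    int required_stride0;  // stride the generated code assumes for dimension 0
};

struct BufferLayout {
    int stride0;  // actual stride of dimension 0 in the buffer passed in
};

// "Compilation" happens before any output buffer is seen, so the default
// assumption (dense x, stride 1) is baked in unless a stride is declared.
inline CompiledPipeline compile(int declared_stride0 = 1) {
    assert(declared_stride0 > 0);
    return {declared_stride0};
}

// At call time the buffer is checked against the baked-in constraint.
inline bool accepts(const CompiledPipeline &p, const BufferLayout &b) {
    return b.stride0 == p.required_stride0;
}
```

In this model, accepts(compile(), BufferLayout{3}) is false, which corresponds to the constraint violation in the original report, and accepts(compile(3), BufferLayout{3}) is true, which corresponds to declaring the stride up front with set_stride(3).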

Gobra commented 6 years ago

Thanks for your help, it makes sense and with your suggested fix it works now.

It's still quite confusing for a newbie, and I'm sure a lot of programmers might fall into the same "trap". I read tutorial 16 before posting the issue, but it is also a bit confusing because of the differences between the JIT and AOT schemes.

May I suggest adding a page to the project wiki about buffer memory layouts and ways to handle them with the Halide API? The most basic examples for JIT/AOT, in addition to tutorials 15 and 16, would be a big help to those just getting started.

SuTanTank commented 4 years ago

I encountered the same problem while learning Halide. I was trying to process a Buffer created from a cv::Mat, which is interleaved. After adding output_buffer().dim(0).set_stride(3) to the Func, the program runs normally.

My concern is: if the filter assumes the first dimension's stride is 1, does changing it to another value slow things down? Or is there any method (reorder, maybe) that can avoid the performance loss?

SuTanTank commented 4 years ago

I found that for a 3-channel interleaved image, reorder(c, x, y) is significantly slower than doing nothing, which doesn't seem to make sense. Why is following the actual layout (y>x>c) worse than (c>y>x)?
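To make the two loop orders concrete, here is a small sketch in plain C++ that lists the element offsets each loop nest touches in an interleaved buffer. The function names are hypothetical and nothing here calls Halide. It only shows the order of memory accesses and does not by itself explain the slowdown reported above, which may depend on other parts of the schedule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offsets visited in an interleaved image (x stride = channels, c stride = 1)
// when the loops run y outermost, then x, then c innermost (y>x>c).
inline std::vector<std::size_t> visit_c_innermost(std::size_t w, std::size_t h, std::size_t ch) {
    assert(ch > 0);
    std::vector<std::size_t> out;
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t c = 0; c < ch; ++c)
                out.push_back(y * w * ch + x * ch + c);
    return out;
}

// Same image, but with c outermost, then y, then x innermost (c>y>x).
inline std::vector<std::size_t> visit_x_innermost(std::size_t w, std::size_t h, std::size_t ch) {
    assert(ch > 0);
    std::vector<std::size_t> out;
    for (std::size_t c = 0; c < ch; ++c)
        for (std::size_t y = 0; y < h; ++y)
            for (std::size_t x = 0; x < w; ++x)
                out.push_back(y * w * ch + x * ch + c);
    return out;
}
```

For a 2x1 image with 3 channels, the first order visits offsets 0, 1, 2, 3, 4, 5, while the second visits 0, 3, 1, 4, 2, 5. The first is sequential in memory, but its innermost loop has only 3 iterations.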