halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Documentation for .in() refers to the interpolate app, which has since been rewritten #6454

Open mcourteaux opened 3 years ago

mcourteaux commented 3 years ago

The apps/interpolate app no longer uses the .in() directive, so a different app should be chosen to point the reader to a useful example: https://github.com/halide/Halide/blob/c0192ffa71bbebfbdcb6eddcdf060169f5022ea2/src/Func.h#L1313-L1316

While we are on the topic of .in() (again with the FAQ effort in mind), I'd also like to hear about the technique of copying memory into an SM's shared memory for improved performance. There is a trick somewhere in the apps that uses .in().in() to achieve this, and I think it needs extensive elaboration: https://github.com/halide/Halide/blob/c0192ffa71bbebfbdcb6eddcdf060169f5022ea2/apps/stencil_chain/stencil_chain_generator.cpp#L86-L101

I'm slowly getting the hang of what .in() does, but this I don't get. It seems that the first block is meant to copy the data into the block's shared memory, and the second one (the one embedded in the code here) is meant to load it into registers? Maybe I'm just not familiar enough with how CUDA works, but how can a function be loaded into registers? Does every value go into a register? How do you know that in this case? Doesn't that require a .store_in(MemoryType::Register)? Likewise for loading into shared memory: doesn't that need a .store_in(MemoryType::GPUShared)?
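To make the question concrete, here is a plain C++ sketch (not Halide, not CUDA, and not the actual stencil_chain schedule) of the memory pattern a two-level `.in().in()` wrapper is usually meant to produce for a stencil. Each "block" first copies its input tile plus a halo into a block-local array, playing the role of shared memory. Each "thread" then copies the few values it needs into a tiny fixed-size array with compile-time bounds, which is the kind of array a GPU compiler can keep in registers once the loop is unrolled. All names here (`kBlock`, `kRadius`, `stage_and_blur`, `blur_direct`, `max_abs_diff`) are hypothetical, and the staged result is checked against a direct computation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sizes: 8 outputs per "block", 3-tap stencil (radius 1).
constexpr int kBlock = 8;
constexpr int kRadius = 1;
constexpr int kTaps = 2 * kRadius + 1;

// Clamp-to-edge read from "global memory". Assumes `in` is non-empty.
float load_global(const std::vector<float>& in, int i) {
    const int n = static_cast<int>(in.size());
    if (i < 0) i = 0;
    if (i > n - 1) i = n - 1;
    return in[i];
}

// Reference: 3-tap box blur reading global memory directly.
std::vector<float> blur_direct(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (int x = 0; x < static_cast<int>(in.size()); ++x) {
        float sum = 0.0f;
        for (int k = -kRadius; k <= kRadius; ++k) sum += load_global(in, x + k);
        out[x] = sum / kTaps;
    }
    return out;
}

// Staged version: global -> per-block "shared" tile -> per-thread "register" window.
std::vector<float> stage_and_blur(const std::vector<float>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int block_start = 0; block_start < n; block_start += kBlock) {
        // Stage 1: the whole block cooperatively copies its tile plus halo.
        std::array<float, kBlock + 2 * kRadius> shared{};
        for (int j = 0; j < kBlock + 2 * kRadius; ++j) {
            shared[j] = load_global(in, block_start - kRadius + j);
        }
        // Stage 2: each "thread" t computes one output of this block.
        for (int t = 0; t < kBlock && block_start + t < n; ++t) {
            // Fixed-size window; constant indices after unrolling the k loop.
            std::array<float, kTaps> window{};
            for (int k = 0; k < kTaps; ++k) window[k] = shared[t + k];
            float sum = 0.0f;
            for (int k = 0; k < kTaps; ++k) sum += window[k];
            out[block_start + t] = sum / kTaps;
        }
    }
    return out;
}

// Largest element-wise difference between two equally sized results.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (d > m) m = d;
    }
    return m;
}
```

Both versions compute the same values; only where the intermediate copies live differs. As I understand Halide's defaults (an assumption worth confirming against the `MemoryType` documentation in Func.h), a wrapper computed at the GPU block level is placed in shared memory automatically, and one computed inside the thread loops becomes a per-thread allocation that the GPU compiler can promote to registers when its indices are constant. That would explain why the schedule needs no explicit `store_in`.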

mcourteaux commented 3 years ago

The documentation of in() should definitely refer to the tutorial. I didn't know one existed already.