halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org
Other
5.91k stars 1.07k forks source link

Fix #8455 (in combination with #8457) #8456

Closed steven-johnson closed 2 weeks ago

steven-johnson commented 2 weeks ago

In slice_vector(), only check for type equality after vectors have been normalized to fixed (i.e., it's ok for some original input to be vscale)

steven-johnson commented 2 weeks ago

I'd welcome a better fix :-)

On Mon, Nov 4, 2024 at 4:27 PM Zalman Stern @.***> wrote:

I think this potentially returns a result which doesn't have the same fixed/scalable kid as the inputs. That causes this sort of problem to happen elsewhere. I guess it should only happen if the inputs aren't both the right type, but I'd sort of rather move to being strict about this as one doesn't know if it was the fixed or scalable parameter that was the desired state for the return.

— Reply to this email directly, view it on GitHub https://github.com/halide/Halide/pull/8456#issuecomment-2455975859, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACQJ63JDERWIZ22FUEZRVDZ677GZAVCNFSM6AAAAABRFLM7F6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINJVHE3TKOBVHE . You are receiving this because you authored the thread.Message ID: @.***>

zvookin commented 2 weeks ago

(I deleted a prev comment that was just wrong)

I think the intent is the types need to match, but the result is always fixed so... sigh. Also if we're going to conform the types to each other, we might as well broadcast scalars to match a vector if the other argument is one. Would be nice if we had a specification... I feel like the idea of making LLVM IR level routines easier to use by automatically fixing up types ends up being less productive because something ends up not being what one expects which either messes up semantics somewhere of triggers an assert elsewhere. But we do generally broadcast scalars to vectors as needed so...

Let me see if there's a reliable way to get the inputs to strictly match types.

zvookin commented 2 weeks ago

I've convinced myself that this is the correct fix in that the result type is always a fixed vector and the conversions are from scalable to fixed. This allows one to compose shuffle_vectors calls without manually converting all scalable vectors to fixed ones. Which is what is happening here: in interleave_vectors, with an odd number of vectors, if they are all scalable, the last one will be shuffled with a fixed vector returned from a previous shuffle.

Conversion from scalable to fixed is lossless and always changes the flavor. (Fixed vectors with length that is not divisible by the effective vscale cannot be converted and stay fixed.) Thus my take is this will uniformly improve reliability without introducing new type mismatches.