[BUG][MLIR] expand-strided-metadata pass changes IR semantic

vlad-vinogradov-47 commented 1 year ago

The expand-strided-metadata pass has a pattern RewriteExtractAlignedPointerAsIndexOfViewLikeOp, which replaces extract_aligned_pointer_as_index(view_like(%mem)) with extract_aligned_pointer_as_index(%mem). But this pattern is not applicable for memref.view operation:

func.func @test() -> index {
    %alloc = memref.alloc() : memref<64xi8>

    %offset_32 = index.constant 32
    %view = memref.view %alloc[%offset_32][] : memref<64xi8> to memref<4xi64>

    %intptr = memref.extract_aligned_pointer_as_index %view : memref<4xi64> -> index

    return %intptr : index
}

// RUN: mlir-opt -expand-strided-metadata

func.func @test() -> index {
    %alloc = memref.alloc() : memref<64xi8>
    %intptr = memref.extract_aligned_pointer_as_index %alloc : memref<64xi8> -> index
    return %intptr : index
}

memref.view specification:

Returns a contiguous memref with 0 offset and empty layout.

Which means, that memref.view creates memref descriptor as:

dst.basePtr = src.BasePtr
dst.alignedPtr = src.alignedPtr + byte_shift
dst.offset = 0

So, in the above example extract_aligned_pointer_as_index %view != extract_aligned_pointer_as_index %alloc and the expand-strided-metadata pass performed non equal IR transformation.

llvmbot commented 1 year ago

@llvm/issue-subscribers-mlir

qcolombet commented 1 year ago

This is an interesting one. The problematic pattern relies on ViewLikeOpInterface to do the folding. AFAICT, aside from this one operation (memref.view), view-like operations only change offset, sizes, and strides, i.e., they are not touching the base/aligned ptr.

The description of the view-like interface is vague enough that it seems okay to modify the base/aligned ptr.

I see two paths forward:

we make view-like interface stricter and don’t allow this kind of changes (i.e., we drop the view-like interface for this operation (we may want to rename it as well, e.g., memref.rebase)
we modify the problematic folding pattern to only allow the view-like operations that are safe

The second solution would be unfortunate because that means we effectively can’t use this interface anymore for this folding but maybe that’s okay. As far as I see the may purpose of this interface was to inform alias analysis.

What do people think?

joker-eph commented 1 year ago

What is the impact on alias analysis? the connection isn't directly obvious to me.

qcolombet commented 1 year ago

Disclaimer I only looked at how the interface is used for like 5 min.

The connection is the alias analysis uses the view-like interface to get the underlying buffers to build its dependencies:

vlad-vinogradov-47 commented 1 year ago

we make view-like interface stricter and don’t allow this kind of changes (i.e., we drop the view-like interface for this operation (we may want to rename it as well, e.g., memref.rebase)

IMHO, this might broke alias analysis, since it relys on view-like interface. And the memref.view/memref.rebase operations creates an aliasing buffer.

AFAICT, aside from this one operation (memref.view), view-like operations only change offset, sizes, and strides, i.e., they are not touching the base/aligned ptr.

I've asked this question several time on forum and will ask it again (sorry for being tediousness). Why we need separate offset as underlying memref-structure field and as memref type part? Why not to always deal with views in the same way as memref.view does - by modifying the aligned ptr field? In that case the aligned ptr will always point to the first element and view-like operations will be consistent with each other. In this case the RewriteExtractAlignedPointerAsIndexOfViewLikeOp pattern becomes invalid.

joker-eph commented 1 year ago

The connection is the alias analysis uses the view-like interface to get the underlying buffers to build its dependencies:

But at this level, the descriptor does not even exist, it's an implementation detail of the LLVM lowering only?

I guess we can try to remove the offset and see if we hit a snag...

qcolombet commented 1 year ago

AFAICT, aside from this one operation (memref.view), view-like operations only change offset, sizes, and strides, i.e., they are not touching the base/aligned ptr.

I've asked this question several time on forum and will ask it again (sorry for being tediousness). Why we need separate offset as underlying memref-structure field and as memref type part? Why not to always deal with views in the same way as memref.view does - by modifying the aligned ptr field? In that case the aligned ptr will always point to the first element and view-like operations will be consistent with each other. In this case the RewriteExtractAlignedPointerAsIndexOfViewLikeOp pattern becomes invalid.

I don't know. The only thing I'd say is what is killing us here is the inconsistency. So whatever we do to make it consistent would be great.

As far as I understand, the memref.view is the only operation where the offset (in byte) may not be a multiple of the element type size. Hence the new memref is not necessarily representable with the classical base + offset (in element type).

qcolombet commented 1 year ago

The connection is the alias analysis uses the view-like interface to get the underlying buffers to build its dependencies:

But at this level, the descriptor does not even exist, it's an implementation detail of the LLVM lowering only?

That's correct, we just use that interface to move across the def-use chain while going through views to get a "purer" value (IIUC).

I guess we can try to remove the offset and see if we hit a snag...

You mean from the memref descriptor?

joker-eph commented 1 year ago

Yes, I meant removing this from the memref descriptor.

qcolombet commented 1 year ago

Yes, I meant removing this from the memref descriptor.

If I'm not mistaken, MemRefs used to not have an offset. It was added around:

commit 923b33ea16ac8734bc7430ccb4cc4169d0bce785
Author: Nicolas Vasilache <ntv@google.com>
Date:   Mon Sep 30 11:58:14 2019 -0700

    Normalize MemRefType lowering to LLVM as strided MemRef descriptor

@nicolasvasilache do you know why the offset was added in the MemRef struct as opposed to changing the base pointer on the fly the MemRef descriptor?

nicolasvasilache commented 1 year ago

At the time I introduced this there were 3 factors:

carry static alignment information in the offset.
carry information for alias analysis (i.e. taking 2 subviews that don't overlap).
use a representation that aligns naturally with numpy and torch, on which metadata transformations are known to work well (e.g. transpose by swapping 2 sizes/strides entry, slice by shifting the offset and modifying the sizes, etc).

In practice, I have not seen these 3 points be significantly enabled by the "offset in the type" and I would be happy if we could remove it, especially if it turns out it makes other things difficult. This echoes @ftynse's point here.

Note that we could remove either aligned_ptr or offset from the descriptor but there are load-bearing downstream usages of the aligned_ptr.

So removing the offset would be something worthwhile trying.

vlad-vinogradov-47 commented 1 year ago

Note that we could remove either aligned_ptr or offset from the descriptor but there are load-bearing downstream usages of the aligned_ptr.

If I understood correctly, they use base_ptr (aka allocated_ptr) to store control block, not aligned_ptr.

nicolasvasilache commented 1 year ago

True, thanks for the correction.

I think the load-bearing part here is they are using the 2 pointers, @Hardcode84 ?

joker-eph commented 1 year ago

Keeping the aligned_ptr seems to remove one computation for resolving any address?

Hardcode84 commented 1 year ago

Yes, we are using both pointers (allocated for control block and reference counting and aligned to actually access memref data), please don't remove them :), I don't have strong feelings about offset field.

ftynse commented 1 year ago

My preferred lowering scheme at this point would be:

aligned ptr with offset
sizes
strides
ptrdiff between aligned and allocated

That is, fold the existing offset into the aligned pointer and replace the allocated pointer with a ptrdiff (essentially offset, but using a different term to avoid confusion with the preexisting one). This has the following effects:

the descriptor takes less space;
address computation on access (most frequent) has one less arithmetic operation, view/subview (less frequent) has one more;
deallocation (least frequent) now needs to subtract ptrdiff;
there is only one pointer in the descriptor, so we can actually express (no)aliasing properties properly.

The last point is quite important IMO as we currently have no mechanism for indicating that two memrefs don't alias. Some flows put noalias on both aligned and allocated pointers, which precisely do alias, so they lie to the LLVM optimizer and only get away with it because the allocated pointer is never used in most function bodies. I also know we now have some support for alias scopes in the LLVM dialect, but no lowering to it or equivalent on memref.

As for downstream (ab)use of the second pointer, it can be obtained as inttoptr (aligned - ptrdiff) when needed, same as for deallocation. It may be increasing the number of computation for that downstream, but I think improvements to upstream should prevail, especially given that I don't seem to remember an upstream-visible discussion of this creative (ab)use of the lowering scheme before it was implemented.

qcolombet commented 1 year ago

My preferred lowering scheme at this point would be:

aligned ptr with offset

sizes

strides

ptrdiff between aligned and allocated

That is, fold the existing offset into the aligned pointer and replace the allocated pointer with a ptrdiff (essentially offset, but using a different term to avoid confusion with the preexisting one).

Sounds reasonable to me. One thing w.r.t. the aligned ptr + offset, I'd suggest we call it start address, because it may not be aligned or put differently, when offsets are thrown into the mix, we cannot guarantee that things will stay aligned (as with the memref.view instruction.)

As for downstream (ab)use of the second pointer, it can be obtained as inttoptr (aligned - ptrdiff) when needed, same as for deallocation. It may be increasing the number of computation for that downstream, but I think improvements to upstream should prevail, especially given that I don't seem to remember an upstream-visible discussion of this creative (ab)use of the lowering scheme before it was implemented.

+1

CC: @matthias-springer for the heads-up w.r.t. buffer deallocation.

maerhart commented 1 year ago

For the buffer deallocation, we are using extract_aligned_pointer_as_index to do a dynamic (at runtime) check to see if two memref values come from the same allocation (basically checking for 'aliasing', where two subviews that don't overlap but are subviews into the same originally allocated memref should also be counted as 'aliasing' in this case). This means, this memref.view bug actually leads to our implementation not being correct as well. Additionally, we use extract_strided_metadata to get the base memref to pass it to memref.dealloc.

If I understand the proposed change correctly, the base memref returned by extract_strided_metadata would still return a memref that is valid to be passed to dealloc by computing aligned - ptrdiff. But most importantly, I'm wondering how this change would affect extract_aligned_pointer_as_index. If it will return the 'aligned ptr with offset', our implementation would break. I guess what we could do to fix it, is combining the two ops such that we first use extract_strided_metadata to get the base memref and then use extract_aligned_pointer_as_index on that; or alternatively introduce an analogous extract_base_pointer_as_index operation.

Hardcode84 commented 1 year ago

there is only one pointer in the descriptor, so we can actually express (no)aliasing properties properly.

Isn't the same can be achieved by just storing allocated pointer as intptr_t and doing inttoptr/ptrtoint as needed? I this case you will have one less computation both for upstreamd dealloc and downstream users.

qcolombet commented 1 year ago

there is only one pointer in the descriptor, so we can actually express (no)aliasing properties properly.

Isn't the same can be achieved by just storing allocated pointer as intptr_t and doing inttoptr/ptrtoint as needed? I this case you will have one less computation both for upstreamd dealloc and downstream users.

Yes, but you'll have to lie for the alias analysis like @ftynse said.

joker-eph commented 1 year ago

@ftynse I'm not entirely sure I understood so the descriptor would be:

struct ranked_memref_descriptor {
  ptrdiff_t offset; // Offset from the aligned pointer back to the allocated pointer
  void *aligned_ptr; // Ptr to the first element of the memref.
  int64_t sizes[];
  int64_t strides[];
}

So what disappears is the allocated data pointer?

If so, than why not getting right of the offset instead?

struct ranked_memref_descriptor {
  void *alloc; // Allocated pointer
  void *aligned_ptr; // Ptr to the first element of the memref.
  int64_t sizes[];
  int64_t strides[];
}

I'm not sure I follow the discussion about alias/noalias, the no alias LangRef section says:

noalias This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.

We never access the memory or modify it through the alloc pointer, it shouldn't alias with the aligned_ptr?

What seems more difficult to me is that two different subview can alias with each other and so when it comes to accesses from two different descriptors we can't say anything about aliasing as far as I know?

vlad-vinogradov-47 commented 1 year ago

Personally, I'm +1 for @joker-eph proposal with 2 pointers without offset:

struct ranked_memref_descriptor {
  void *alloc_ptr; // Allocated pointer
  void *aligned_ptr; // Ptr to the first element of the memref.
  int64_t sizes[];
  int64_t strides[];
}

This case doesn't require any pointer-integer arithmetics in run-time. I would only rename aligned_ptr to somthing like data_ptr or first_elem_ptr to be more clear about its semantic.

ftynse commented 1 year ago

Isn't the same can be achieved by just storing allocated pointer as intptr_t and doing inttoptr/ptrtoint as needed? I this case you will have one less computation both for upstreamd dealloc and downstream users.

It is possible indeed, although feels very hacky to me. Upstream, I don't expect one addition/subtraction around allocation/deallocation to matter compared to the duration of the actual dynamic memory management call. Downstream may be another story, but I have zero visibility into that.

So what disappears is the allocated data pointer?

If so, than why not getting right of the offset instead?

It can be see as repurposing the offset field to store the distance between the allocated and aligned pointer, but I'd prefer it to be less subtle so we don't accidentally misuse it, hence the simultaneous name/type/position change.

We never access the memory or modify it through the alloc pointer, it shouldn't alias with the aligned_ptr?

We may be freeing the memref, which would be a modification that requires noalias guarantees to hold as per the following explanation in alias analysis:

The NoAlias response may be used when there is never an immediate dependence between any memory reference based on one pointer and any memory reference based the other. [...] Another is when the two pointers are only ever used for reading memory.

Consider specifically the following case:

func.func @f(%A: memref<*> {noalias}, %B: memref<*> {noalias}) {
  // ... access %A ...
  // ... access %B ...
  memref.free %A
}

Attributes are propagated during type conversion, which is context-free. Even if it were context-aware, we may only have a function declaration in that context. Converting the function signature to the following (which is what happens currently):

llvm.func @f(%A_allocated: !llvm.ptr {noalias}, %A_aligned: !llvm.ptr {noalias}, ...,
             %B_allocated: !llvm.ptr {noalias}, %B_aligned: !llvm.ptr {noalias}, ...)

may or may not be correct depending on %A being potentially freed in the function body. Certainly, we can be conservative about it and remove the noalias in absence of nofree on the memref (which we don't model AFAIK), but that sounds like introducing unnecessary subtleties in the already difficult type conversion. Having only one pointer is a much easier solution, which also matches the intuition most users seem to have about memref being pointer+sizes.

What seems more difficult to me is that two different subview can alias with each other and so when it comes to accesses from two different descriptors we can't say anything about aliasing as far as I know?

If we need to rely on this information in MLIR proper, we need to propagate it (and likely capture it as discardable attributes). It's a relatively straightforward dataflow analysis and one can see, e.g., that subviews originate from memref-typed function arguments that are marked "noalias" or that they originate from the same memref but with offset/size/stride configurations that define non-overlapping spaces (@pengmai is looking into that in context of Enzyme). We can now also express aliasing scopes thanks to distinct attributes. So far, we didn't need to rely on this upstream and most uses of noalias simply wanted to communicate this information to the LLVM optimizer.

joker-eph commented 1 year ago

Consider specifically the following case:

func.func @f(%A: memref<> {noalias}, %B: memref<> {noalias}) { // ... access %A ... // ... access %B ... memref.free %A }

I considered it, but the memref.free does not modify the memory, so it's not clear to me that this violates the noalias specification as currently spelled out in LLVM LangRef.

I'm fine of course if this is not possible and we'll use an offset for the alloc instead!

qcolombet commented 1 year ago

@joker-eph and @ftynse did we land on a design here?

joker-eph commented 1 year ago

I'm fine with @ftynse proposal.

llvm / llvm-project

[BUG][MLIR] expand-strided-metadata pass changes IR semantic #64334