Open llvmbot opened 17 years ago
mentioned in issue llvm/llvm-bugzilla-archive#8512
mentioned in issue llvm/llvm-bugzilla-archive#7303
mentioned in issue llvm/llvm-bugzilla-archive#6204
mentioned in issue llvm/llvm-bugzilla-archive#3451
mentioned in issue llvm/llvm-bugzilla-archive#34405
mentioned in issue llvm/llvm-bugzilla-archive#3434
mentioned in issue llvm/llvm-bugzilla-archive#3352
mentioned in issue llvm/llvm-bugzilla-archive#31265
mentioned in issue llvm/llvm-bugzilla-archive#3037
Here are my thoughts. GEP calculates the pointer address based on the data layout of the specified type. The vector type is specified explicitly in the target's data layout and should not be assumed to be derived from the corresponding scalar type, e.g. <8 x i1> from i1. The other point, which is not stated very clearly in the LLVM language reference, is that alignment (although written in bits in the data layout string) is always a whole number of bytes. That constraint is only spelled out for the natural stack alignment, but we should assume it holds for all alignments. In other words, GEP only calculates pointer addresses in units of bytes.
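For illustration, here is a made-up data layout string (my own sketch, not taken from any real target) where the "v" entries spell out vector sizes and ABI alignments directly rather than deriving them from the element type, and every alignment is a multiple of 8 bits, i.e. a whole number of bytes:

target datalayout = "e-m:e-i64:64-v64:64-v128:128-n8:16:32:64-S128"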
Back to your specific issue: I agree that your frontend should not generate a GEP into a vector of i1. Instead, it should do this in two steps: a vector load followed by extraction of the corresponding bit.
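A minimal IR sketch of that two-step pattern (my own illustration; the function name is made up): load the whole vector, then extract the lane you want, rather than GEPing to a bit address.

define i1 @get_lane(<8 x i1>* %p, i32 %idx) {
  ; load the whole mask, then pick out one lane in registers
  %v = load <8 x i1>, <8 x i1>* %p
  %b = extractelement <8 x i1> %v, i32 %idx
  ret i1 %b
}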
I have been thinking about my GEP issue some more.
On my target (which has typical byte-addressable memory and 64 bit pointers), what does a GEP into a vector of i1 mean? If you allow it, it surely means that any pointer to i1 might (or might not) be the result of a GEP into vector of i1, and therefore needs to keep a bit offset as well as the byte address. My target certainly does not support that, and more generally I think LLVM assumes that a pointer to anything (with a particular address space) is always the same size, whatever type it points to.
So my frontend should not be generating GEP into vector of i1, because it is not representable on the target that it is targeting.
I guess that in itself does not make a GEP into vector of i1 illegal IR, as someone may want to implement a target where it does make sense. But it falls into a class of things that you can only use in IR if it makes sense on your target.
If someone did want to implement a target where it makes sense, that probably means bit addressable memory, which means that the assumptions in LLVM that a store size is an integral number of bytes would need to be fixed. But I guess this is unlikely to be needed.
Yes, I committed the handling for stores of vectors with non-byte-sized elements; see https://reviews.llvm.org/D42100.
I think Uli and Eli had an interesting discussion there on Jan 30, taking up this very topic that I think you are concerned about.
To me, although (I think) there is no formal statement anywhere, their discussion seems to indicate the path to follow. I am thinking about exactly how a vector should be stored in memory on big- and little-endian machines, including vectors with non-byte-sized elements.
Perhaps the optimization passes you mention can be extended, or at least taught to reject cases they cannot handle?
Work is still progressing, although at a glacial pace; Jonas Paulsson has done some work recently, for instance.
I have just come across this because I am using the SPIR-V frontend https://github.com/KhronosGroup/SPIRV-LLVM and I have a case where it generates a GEP into a vector of i1 in memory.
A lot of code assumes that vector of i1 is packed (as already noted here), but at least GVN and alias analysis cannot cope with the GEP because it is not to a byte boundary.
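For reference, a minimal reproduction of the kind of IR I mean (my own reduction, not the frontend's actual output): the GEP below addresses element 3 of an in-memory <8 x i1>, which is a bit offset rather than a byte offset if the vector is packed.

define i1 @bad_lane(<8 x i1>* %p) {
  ; element 3 of a packed <8 x i1> is not at a byte boundary
  %q = getelementptr <8 x i1>, <8 x i1>* %p, i32 0, i32 3
  %b = load i1, i1* %q
  ret i1 %b
}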
Given that this issue has been hanging around for 10 years and does not seem to have a resolution, should LLVM stop claiming to fully support vector of i1? I.e. should the spec and the verify pass say that vector of i1 is illegal, or say that vector of i1 is legal but you can't GEP into it, or something?
A strong use case for the bitpacked approach is the Parabix software. Parabix is a framework for high-performance text processing built on the concept of parallel bit streams: input byte streams are transposed into parallel bit streams. One significant application of Parabix is the icgrep search application, which generally achieves GB/s regular expression search even with complex regular expressions. We have workarounds, but bitpacked vectors of i1 would be ideal.
Making this a parent bug makes sense to me.
Shall we make this bug a parent bug for more concrete bugs caused by the same issue but in different components? The issue raised in this bug is quite high-level and covers a lot of code and design. Some bugs share the same root cause, but the fixes would be quite different and probably land in different components. Making this a parent bug would make it easier to track all the related/child bugs.
Bug llvm/llvm-bugzilla-archive#15131 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#12650 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#11460 has been marked as a duplicate of this bug.
There's another possibility for big-endian machines: on such a machine, when storing a vector <0, 1, 2, 3>, store the components in reverse order, i.e. 3 first, 2 at the next memory location, and so on. Who says that the lowest-numbered index has to be stored first? Anyway, the result is that bitcast is then still the same as "store as one type, load out as another" on big-endian machines too. More generally, when a vector operation refers to element i, turn that into the processor operation on element num_elements - 1 - i.
The steps needed to implement this are:
The hardest part will be doing the codegen part for PowerPC.
While there, why not extend bitcast to first class structs and arrays too?
There's no reason why it would be easier now than later, it would take work to implement, and it's not needed to fix the main bug here. Also, no one should need it, IMHO, but we can discuss that elsewhere.
Hi Dan, your suggestion seems fairly reasonable to me. While there, why not extend bitcast to first class structs and arrays too?
I think it's a mistake to make bitcast be the operation which determines how everything else must work.
I propose the first step to solving this problem is to redefine the bitcast operator. Instead of saying it is equivalent to a store and load in the new type, I propose it be defined as:
A bitcast operator reproduces the bits of its operand in precedence order.
For the purposes of bitcasting, the precedence index of a bit in a vector value is the precedence index of that bit within its element, plus the index of its element times the bit width of the element type.
This is an intuitive definition, and it makes bitcast a fully target-independent operation that doesn't care about endianness. Bitcasting an integer to a vector would always put the most significant bit in the same place, which is not true under the store+load definition.
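To make that concrete (a worked reading of my own, not part of the proposal): in a <4 x i8> value, bit j of element k has precedence index j + 8*k, so bitcasting it to i32 puts element 0 in bits 0-7 and element 3 in bits 24-31 on every target, and bitcasting back recovers the same elements, with no endianness dependence.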
On little-endian targets, there's effectively no change. On big-endian targets, this would change the byte order of bitcasts between vectors and non-vectors, and between vectors with elements of different sizes. If front-ends want the old behavior, they can emit explicit llvm.bswap operations.
If we accept this definition, then we can move on to consider what makes the most sense for vectors without being artificially constrained.
I now think that vectors should be stored in memory in a target-dependent manner, which may or may not be the same as how arrays are stored in memory. This means fixing all the places that think you can get the size of a vector without target data, not to mention users of VectorType::getInteger, VectorType::getExtendedElementVectorType, VectorType::getTruncatedElementVectorType, and VectorType::getBitWidth, since these methods all essentially assume that vectors are bitpacked.
An additional issue is that bitcast of vectors becomes problematic and should probably be disallowed. The issue is that bitcast is only legal if the source and destination types have the same bitwidth. But the bitwidth of a vector will now depend on the target, so the legality of the IR will depend on the target, which is not currently the case.
And for the record, r97064 was reverted, following the discussion in the email thread linked above.
The decision is: vectors will be bit-packed. GEP on vectors will be disallowed.
There are still several issues, see this email:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100222/096825.html
I committed the patch, plus several fixes to CodeGen to fix lowering of vectors with element padding, in r97064.
proposed patch

I propose we accept (2): lay out vectors in memory as arrays. This is good enough for most cases, and much less complicated to reason about. Note that this only affects the in-memory representation (not registers), and it only affects memory visible to the IR (not spill slots, constant pool entries, stack arguments, etc.).
People needing to bitpack vectors of i1 in IR-visible memory can bitcast the vectors to integers and store them as integers. If this becomes insufficient, we can revisit this issue.
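A minimal IR sketch of that workaround (my own illustration; the function name is made up): bitcast the <8 x i1> mask to an i8 of the same bit width and store that, so the packing is explicit in the IR.

define void @store_mask(<8 x i1> %mask, i8* %p) {
  ; pack the 8 lanes into one byte explicitly, then do a plain byte store
  %bits = bitcast <8 x i1> %mask to i8
  store i8 %bits, i8* %p
  ret void
}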
Attached is a patch which implements the TargetData portion of this (I haven't looked at whether LegalizeTypes needs anything here yet).
Not to mention vectors of i24, and other types that are not naturally aligned.
Bug llvm/llvm-bugzilla-archive#6204 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#3434 has been marked as a duplicate of this bug.
Sorry, should have been getBitWidth rather than getPrimitiveSizeInBits: it is VectorType::getBitWidth that returns the number of elements * the element size in bits. Note that codegen does the same for MVT::getSizeInBits for a vector. The result is that vectors of i1 are considered to be bitpacked...
PS: While LegalizeDAG will happily "codegen" vectors of i1 and long double, the result is often bogus. This is why I put asserts in LegalizeTypes in various spots where the logic is wrong for such vectors.
To be more explicit about "There's a similar problem for vectors of apints", consider a vector whose element type is an integer with a bit width that is not a multiple of its alignment: the primitive size of the element differs from the spacing the elements would need in memory, just as with x86 long double.
Possible solutions:
(1) Bitpack vectors. This could be done but seems like a lot of work for not much benefit. Also, I think some targets support vector compare results producing a "vector of booleans"; it would be nice if this mapped in a convenient way to a vector of i1.
(2) Lay vectors out like arrays. Elements of a vector of long double would then be spaced by 12/16 bytes depending on the target, and vectors of i1 would typically get 1 byte per i1.
(3) Byte-pack vectors. Here vectors of long double would have elements spaced by 10 bytes (unaligned!), and vectors of i1 would have elements 1 byte apart.
My preference is for (2). However, if GEP is disallowed for vectors, then other schemes become more feasible.
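To illustrate what (2) means in IR (my own sketch, assuming the array layout is adopted; the function name is made up): storing a vector and reloading an element through the corresponding array type addresses the same bytes, because the two layouts coincide.

define float @third_elem(<4 x float> %v) {
  %buf = alloca <4 x float>
  store <4 x float> %v, <4 x float>* %buf
  ; under (2) the vector is laid out exactly like [4 x float]
  %arr = bitcast <4 x float>* %buf to [4 x float]*
  %p = getelementptr [4 x float], [4 x float]* %arr, i32 0, i32 2
  %x = load float, float* %p
  ret float %x
}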
I think Chris needs to make a policy decision here.
Yes, it looks like gcc supports vectors of long double. I compiled this with it on x86-32 and it compiles (didn't check whether the assembler is sensible, or how it lays out the vector).
typedef long double dvec __attribute__ ((__vector_size__ (24)));
void d(dvec *x);

int main() {
  dvec x, y, z;
  d(&x);
  d(&y);
  z = x + y;
  d(&z);
  return 0;
}
Also, we may want to support vectors of boolean as results of vector comparisons.
Does gcc support vector of long double? Maybe it would make sense to just reject all vectors with elements that would require intra-element padding?
Extended Description
I noticed that a lot of the vector code assumes that there is no padding between vector elements. For example, the size is assumed to be the primitive size of the element times the number of elements. However, for x86 long double the primitive size is 80 bits, while elements will be spaced by 96/128 bits (depending on the OS) in order to maintain alignment. There's a similar problem for vectors of apints.