Open llvmbot opened 17 years ago
mentioned in issue llvm/llvm-bugzilla-archive#8512
mentioned in issue llvm/llvm-bugzilla-archive#7303
mentioned in issue llvm/llvm-bugzilla-archive#6204
mentioned in issue llvm/llvm-bugzilla-archive#3451
mentioned in issue llvm/llvm-bugzilla-archive#34405
mentioned in issue llvm/llvm-bugzilla-archive#3434
mentioned in issue llvm/llvm-bugzilla-archive#3352
mentioned in issue llvm/llvm-bugzilla-archive#31265
mentioned in issue llvm/llvm-bugzilla-archive#3037
Here are my thoughts. GEP calculates the pointer address based on the data layout of the specified type. The vector type is specified explicitly in the target's data layout and should not be assumed to be derived from the corresponding scalar type, e.g. <8 x i1> from i1. The other point, which is not stated very clearly in the LLVM language reference, is that alignment (although written in bits in the data layout string) is always a whole number of bytes. That constraint is only spelled out for the natural stack alignment, but we should assume it holds for all alignments. In other words, GEP only calculates pointer addresses in units of bytes.
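For illustration, here is a made-up data layout string (my own sketch, not taken from any real target) where the "v" entries spell out vector sizes and ABI alignments directly rather than deriving them from the element type, and every alignment is a multiple of 8 bits, i.e. a whole number of bytes:

target datalayout = "e-m:e-i64:64-v64:64-v128:128-n8:16:32:64-S128"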
Back to your specific issue: I agree that your frontend should not generate a GEP into a vector of i1. Instead, it should do this in two steps: a vector load followed by extraction of the corresponding bit.
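A minimal IR sketch of that two-step pattern (my own illustration; the function name is made up): load the whole vector, then extract the lane you want, rather than GEPing to a bit address.

define i1 @get_lane(<8 x i1>* %p, i32 %idx) {
  ; load the whole mask, then pick out one lane in registers
  %v = load <8 x i1>, <8 x i1>* %p
  %b = extractelement <8 x i1> %v, i32 %idx
  ret i1 %b
}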
I have been thinking about my GEP issue some more.
On my target (which has typical byte-addressable memory and 64 bit pointers), what does a GEP into a vector of i1 mean? If you allow it, it surely means that any pointer to i1 might (or might not) be the result of a GEP into vector of i1, and therefore needs to keep a bit offset as well as the byte address. My target certainly does not support that, and more generally I think LLVM assumes that a pointer to anything (with a particular address space) is always the same size, whatever type it points to.
So my frontend should not be generating GEP into vector of i1, because it is not representable on the target that it is targeting.
I guess that in itself does not make a GEP into vector of i1 illegal IR, as someone may want to implement a target where it does make sense. But it falls into a class of things that you can only use in IR if it makes sense on your target.
If someone did want to implement a target where it makes sense, that probably means bit addressable memory, which means that the assumptions in LLVM that a store size is an integral number of bytes would need to be fixed. But I guess this is unlikely to be needed.
Yes, I committed the handling for stores of vectors with non-byte-sized elements; see https://reviews.llvm.org/D42100.
I think Uli and Eli had an interesting discussion there on Jan 30, taking up this very topic that I think you are concerned about.
To me, although (I think) there is no formal statement anywhere, their discussion seems to indicate the path to follow. I am thinking about exactly how a vector should be stored in memory on big- and little-endian machines, including vectors with non-byte-sized elements.
Perhaps the optimization passes you mention can be extended, or at least taught to reject cases they cannot handle?
Work is still progressing, although at a glacial pace; Jonas Paulsson has done some work recently, for instance.
I have just come across this because I am using the SPIR-V frontend https://github.com/KhronosGroup/SPIRV-LLVM and I have a case where it generates a GEP into a vector of i1 in memory.
A lot of code assumes that vector of i1 is packed (as already noted here), but at least GVN and alias analysis cannot cope with the GEP because it is not to a byte boundary.
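For reference, a minimal reproduction of the kind of IR I mean (my own reduction, not the frontend's actual output): the GEP below addresses element 3 of an in-memory <8 x i1>, which is a bit offset rather than a byte offset if the vector is packed.

define i1 @bad_lane(<8 x i1>* %p) {
  ; element 3 of a packed <8 x i1> is not at a byte boundary
  %q = getelementptr <8 x i1>, <8 x i1>* %p, i32 0, i32 3
  %b = load i1, i1* %q
  ret i1 %b
}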
Given that this issue has been hanging around for 10 years and does not seem to have a resolution, should LLVM stop claiming to fully support vector of i1? I.e. should the spec and the verify pass say that vector of i1 is illegal, or say that vector of i1 is legal but you can't GEP into it, or something?
A strong use case for the bitpacked approach is the Parabix software. Parabix is a framework for high-performance text processing built on the concept of parallel bit streams: input byte streams are transposed into parallel bit streams. One significant application of Parabix is the icgrep search application, which generally achieves GB/s regular expression search even with complex regular expressions. We have workarounds, but bitpacked vectors of i1 would be ideal.
Making this a parent bug makes sense to me.
Shall we make this bug a parent bug for more concrete bugs caused by the same issue but in different components? The issue raised in this bug is quite high-level and covers a lot of code and design. Some bugs share the same root cause, but the fixes would be quite different and probably land in different components. Making this a parent bug would make it easier to track all the related/child bugs.
Bug llvm/llvm-bugzilla-archive#15131 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#12650 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#11460 has been marked as a duplicate of this bug.
There's another possibility for big-endian machines: on such a machine, when storing a vector <0, 1, 2, 3>, store the components in reverse order, i.e. 3 first, 2 at the next memory location, and so on. Who says that the lowest-numbered index has to be stored first? Anyway, the result is that bitcast is then still the same as "store as one type, load out as another" on big-endian machines too. More generally, when a vector operation refers to element i, turn that into the processor operation on element num_elements - 1 - i.
The steps needed to implement this are:
The hardest part will be doing the codegen part for PowerPC.
While there, why not extend bitcast to first class structs and arrays too?
There's no reason why it would be easier now than later, it would take work to implement, and it's not needed to fix the main bug here. Also, no one should need it, IMHO, but we can discuss that elsewhere.
Hi Dan, your suggestion seems fairly reasonable to me. While there, why not extend bitcast to first class structs and arrays too?
I think it's a mistake to make bitcast be the operation which determines how everything else must work.
I propose the first step to solving this problem is to redefine the bitcast operator. Instead of saying it is equivalent to a store and load in the new type, I propose it be defined as:
A bitcast operator reproduces the bits of its operand in precedence order.
For the purposes of bitcasting, the precedence index of a bit in a vector value is the precedence index of that bit within its element, plus the index of its element times the bit width of the element type.
This is an intuitive definition, and it makes bitcast a fully target-independent operation that doesn't care about endianness. Bitcasting an integer to a vector would always put the most significant bit in the same place, which is not true under the store+load definition.
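To make that concrete (a worked reading of my own, not part of the proposal): in a <4 x i8> value, bit j of element k has precedence index j + 8*k, so bitcasting it to i32 puts element 0 in bits 0-7 and element 3 in bits 24-31 on every target, and bitcasting back recovers the same elements, with no endianness dependence.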
On little-endian targets, there's effectively no change. On big-endian targets, this would change the byte order of bitcasts between vectors and non-vectors, and between vectors with elements of different sizes. If front-ends want the old behavior, they can emit explicit llvm.bswap operations.
If we accept this definition, then we can move on to consider what makes the most sense for vectors without being artificially constrained.
I now think that vectors should be stored in memory in a target-dependent manner, which may or may not be the same as how arrays are stored in memory. This means fixing all the places that think you can get the size of a vector without target data, not to mention users of VectorType::getInteger, VectorType::getExtendedElementVectorType, VectorType::getTruncatedElementVectorType, and VectorType::getBitWidth, since these methods all essentially assume that vectors are bitpacked.
An additional issue is that bitcast of vectors becomes problematic and should probably be disallowed. The issue is that bitcast is only legal if the source and destination types have the same bitwidth. But the bitwidth of a vector will now depend on the target, so the legality of the IR will depend on the target, which is not currently the case.
And for the record, r97064 was reverted, following the discussion in the email thread linked above.
The decision is: vectors will be bit-packed. GEP on vectors will be disallowed.
There are still several issues, see this email:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100222/096825.html
I committed the patch, plus several fixes to CodeGen to fix lowering of vectors with element padding, in r97064.
proposed patch

I propose we accept (2): lay out vectors in memory as arrays. This is good enough for most cases, and much less complicated to reason about. Note that this only affects the in-memory representation (not registers), and it only affects memory visible to the IR (not spill slots, constant pool entries, stack arguments, etc.).
People needing to bitpack vectors of i1 in IR-visible memory can bitcast the vectors to integers and store them as integers. If this becomes insufficient, we can revisit this issue.
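A minimal IR sketch of that workaround (my own illustration; the function name is made up): bitcast the <8 x i1> mask to an i8 of the same bit width and store that, so the packing is explicit in the IR.

define void @store_mask(<8 x i1> %mask, i8* %p) {
  ; pack the 8 lanes into one byte explicitly, then do a plain byte store
  %bits = bitcast <8 x i1> %mask to i8
  store i8 %bits, i8* %p
  ret void
}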
Attached is a patch which implements the TargetData portion of this (I haven't looked at whether LegalizeTypes needs anything here yet).
Not to mention vectors of i24, and other types that are not naturally aligned.
Bug llvm/llvm-bugzilla-archive#6204 has been marked as a duplicate of this bug.
Bug llvm/llvm-bugzilla-archive#3434 has been marked as a duplicate of this bug.
Sorry, should have been getBitWidth rather than getPrimitiveSizeInBits: it is VectorType::getBitWidth that returns the number of elements * the element size in bits. Note that codegen does the same for MVT::getSizeInBits for a vector. The result is that vectors of i1 are considered to be bitpacked...
PS: While LegalizeDAG will happily "codegen" vectors of i1 and long double, the result is often bogus. This is why I put asserts in LegalizeTypes in various spots where the logic is wrong for such vectors.
To be more explicit about "There's a similar problem for vectors of apints", consider a vector whose element type is an integer with a bit width that is not a multiple of its alignment: the primitive size of the element differs from the spacing the elements would need in memory, just as with x86 long double.
Possible solutions:
(1) Bitpack vectors. This could be done but seems like a lot of work for not much benefit. Also, I think some targets support vector compare results producing a "vector of booleans"; it would be nice if this mapped in a convenient way to a vector of i1.
(2) Lay vectors out like arrays. Elements of a vector of long double would then be spaced by 12/16 bytes depending on the target, and vectors of i1 would typically get 1 byte per i1.
(3) Byte-pack vectors. Here vectors of long double would have elements spaced by 10 bytes (unaligned!), and vectors of i1 would have elements 1 byte apart.
My preference is for (2). However, if GEP is disallowed for vectors, then other schemes become more feasible.
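To illustrate what (2) means in IR (my own sketch, assuming the array layout is adopted; the function name is made up): storing a vector and reloading an element through the corresponding array type addresses the same bytes, because the two layouts coincide.

define float @third_elem(<4 x float> %v) {
  %buf = alloca <4 x float>
  store <4 x float> %v, <4 x float>* %buf
  ; under (2) the vector is laid out exactly like [4 x float]
  %arr = bitcast <4 x float>* %buf to [4 x float]*
  %p = getelementptr [4 x float], [4 x float]* %arr, i32 0, i32 2
  %x = load float, float* %p
  ret float %x
}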
I think Chris needs to make a policy decision here.
Yes, it looks like gcc supports vectors of long double. I compiled this with it on x86-32 and it compiles (didn't check whether the assembler is sensible, or how it lays out the vector).
typedef long double dvec __attribute__ ((__vector_size__ (24)));
void d(dvec *x);

int main() {
  dvec x, y, z;
  d(&x);
  d(&y);
  z = x + y;
  d(&z);
  return 0;
}
Also, we may want to support vectors of boolean as results of vector comparisons.
Does gcc support vector of long double? Maybe it would make sense to just reject all vectors with elements that would require intra-element padding?
Extended Description
I noticed that a lot of the vector code assumes that there is no padding between vector elements. For example, the size is assumed to be the primitive size of the element times the number of elements. However, for x86 long double the primitive size is 80 bits, while elements will be spaced by 96/128 bits (depending on the OS) in order to maintain alignment. There's a similar problem for vectors of apints.