llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Implement N3369 (`_Lengthof`) #102836

Open alejandro-colomar opened 2 months ago

alejandro-colomar commented 2 months ago

Hi!

I've sent a patch set to GCC for adding a __lengthof__ operator: https://inbox.sourceware.org/gcc-patches/20240728141547.302478-1-alx@kernel.org/T/#t

There's a related proposal for ISO C (wg14): https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf (the proposal is old, was not authored by me, and isn't as curated as the GCC patches). I intend to refine that proposal and send a new one.

The specifications of the operator are:

The keyword __lengthof__ determines the length of an array operand,
that is, the number of elements in the array.
Its syntax is similar to sizeof.
The operand must be a complete array type or an expression of that type.
For example:

    int a[n];
    __lengthof__(a);           // returns n
    __lengthof__(int [7][3]);  // returns 7

The result of this operator is an integer constant expression,
unless the top-level array is a variable-length array.
The operand is only evaluated if the top-level array is a variable-length array.
For example:

    __lengthof__(int [7][n++]);  // integer constant expression
    __lengthof__(int [n++][7]);  // run-time value; n++ is evaluated
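For comparison, the values in the first example can be reproduced today with the usual sizeof division. This is only a sketch: `NITEMS()` is the conventional macro standing in for the proposed operator, and the helper function names are made up for illustration.

```c
#include <stddef.h>

/* Conventional sizeof-division macro; a stand-in for the proposed operator. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

/* Length of a VLA with n elements (n must be greater than zero). */
size_t vla_length(size_t n)
{
    int a[n];

    return NITEMS(a);
}

/* Length of the top-level dimension of a two-dimensional array. */
size_t top_level_length(void)
{
    int b[7][3];

    return NITEMS(b);
}
```

Only the outermost dimension is counted, which matches the `int [7][3]` case in the specification above.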

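There are a few interesting reasons why this feature is better than just a macro around the usual sizeof division:

- This keyword could be extended in the future to also give the length of a function parameter declared with array notation and a specified length.
- This operator causes a compiler error if the argument is not an array (it's a constraint violation).
- It results in a constant expression in some cases where sizeof would evaluate the operand. For example: `__lengthof__(int [7][n++])`.
- It only evaluates the operand once for VLAs, where the sizeof division would evaluate twice (one per sizeof call).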

Please feel free to give any feedback for the feature in the GCC thread.

Are you interested in this feature?
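The `int [7][n++]` case in the specification is where the proposed operator and plain sizeof part ways: sizeof has to evaluate the size expression because the operand is a variably modified type. A minimal sketch of that sizeof behaviour, assuming a compiler with VLA support (the function name is hypothetical):

```c
#include <stddef.h>

/*
 * Takes sizeof of the VLA type int [7][n++] and returns the value of n
 * afterwards. The computed size is stored in *size_out.
 */
int n_after_sizeof(int n, size_t *size_out)
{
    /* The size expression n++ is evaluated, because it affects the result. */
    *size_out = sizeof(int [7][n++]);

    return n;
}
```

With the proposed operator, the top-level length 7 would be known without touching `n` at all.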

llvmbot commented 2 months ago

@llvm/issue-subscribers-clang-frontend

Author: Alejandro Colomar (alejandro-colomar)

Hi!

I've sent a patch set to GCC for adding a `__lengthof__` operator: <https://inbox.sourceware.org/gcc-patches/20240728141547.302478-1-alx@kernel.org/T/#t>

There's a related proposal for ISO C (wg14): <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2529.pdf> (although the proposal is old (not authored by me), and isn't as curated as the GCC patches).

The specifications of the operator are:

```
The keyword __lengthof__ determines the length of an array operand,
that is, the number of elements in the array.
Its syntax is similar to sizeof.
The operand must be a complete array type or an expression of that type.
For example:

    int a[n];
    __lengthof__(a);           // returns n
    __lengthof__(int [7][3]);  // returns 7

The result of this operator is an integer constant expression,
unless the top-level array is a variable-length array.
The operand is only evaluated if the top-level array is a variable-length array.
For example:

    __lengthof__(int [7][n++]);  // integer constant expression
    __lengthof__(int [n++][7]);  // run-time value; n++ is evaluated
```

There are a few interesting reasons why this feature is better than just a macro around the usual sizeof division:

- This keyword could be extended in the future to also give the length of a function parameter declared with array notation and a specified length.
- This operator causes a compiler error if the argument is not an array (it's a constraint violation).
- It results in a constant expression in some cases where sizeof would evaluate the operand. For example: `__lengthof__(int [7][n++])`.
- It only evaluates the operand once for VLAs, where the sizeof division would evaluate twice (one per sizeof call).

Please feel free to give any feedback for the feature in the GCC thread.

Are you interested in this feature?

AaronBallman commented 2 months ago

I think this is a reasonably common need; users can use the sizeof(array) / sizeof(array[0]) trick, but having a dedicated operator to do this instead would help catch mistakes.
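The kind of mistake the sizeof trick lets through can be sketched as follows. `ARRAY_SIZE()` is the conventional macro; the function names are hypothetical. Recent GCC and Clang versions may warn about the pointer case, but the code is still accepted.

```c
#include <stddef.h>

#define ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))

/* Correct use: the operand is a real array. */
size_t count_array(void)
{
    int values[100];

    return ARRAY_SIZE(values);
}

/* Mistake: p is a pointer, so this divides sizeof(int *) by sizeof(int). */
size_t count_pointer(const int *p)
{
    return ARRAY_SIZE(p);
}
```

A dedicated operator that rejects non-array operands would turn the second function into a hard error rather than a silently wrong value.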

One edge case would be with flexible array members; should those be a constraint violation?

Clang already supports __array_extent as a type trait, but only in C++: https://godbolt.org/z/54MKdMGd7 One thing that's interesting, though, is that it makes it much clearer what "length" means for multidimensional arrays. You have to ask on a per-rank basis what the length is. Have you considered a similar design?

In terms of standardization, it's worth noting that N2529 has not been seen by WG14 and so it's unclear how the committee feels about the idea. That leaves some concerns (WG14 has a habit of renaming things or altering semantics slightly), but I think they could be overcome.

alejandro-colomar commented 2 months ago

> I think this is a reasonably common need; users can use the sizeof(array) / sizeof(array[0]) trick, but having a dedicated operator to do this instead would help catch mistakes.
>
> One edge case would be with flexible array members; should those be a constraint violation?

For now they are a constraint violation (incomplete types are rejected), with a reservation of the right to extend support to them.

If there appears a way to add length information such as proposed in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3188.htm, it would make sense to extend this operator to work with them.

We've also discussed supporting the [[gnu::counted_by()]] attribute, but the feedback was mixed, and the consensus was to not do it, at least for now. We prefer to only support array lengths that are expressed using the type system.
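To illustrate why flexible array members fall outside the operator for now: the member has an incomplete array type, so its length is not part of the type and has to be carried separately. A minimal sketch (the struct and function names are hypothetical):

```c
#include <stddef.h>
#include <stdlib.h>

struct int_buf {
    size_t len;     /* element count, tracked by hand */
    int    data[];  /* flexible array member: incomplete array type */
};

/* Allocates a buffer with room for len elements; returns NULL on failure. */
struct int_buf *int_buf_new(size_t len)
{
    /* sizeof(*b) does not include the elements of data. */
    struct int_buf *b = malloc(sizeof(*b) + len * sizeof(b->data[0]));

    if (b != NULL)
        b->len = len;
    return b;
}
```

Here `sizeof(b->data)` would be rejected because the type is incomplete, and the same reasoning applies to a length operator that relies only on the type system.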

> Clang already supports __array_extent as a type trait, but only in C++: https://godbolt.org/z/54MKdMGd7 One thing that's interesting, though, is that it makes it much clearer what "length" means for multidimensional arrays. You have to ask on a per-rank basis what the length is. Have you considered a similar design?

The usual ARRAY_SIZE() or NITEMS() macros are the prior art I based the feature on. I like it because it's simple. And it's common-enough already that I expect it to be easy to explain.

I didn't consider something like __array_extent, because it's usually as easy as:

    __array_extent(decltype(foo), 4) == __lengthof__(****foo)

That is, with regular language expressions you can ask for whatever length you're interested in.
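The same idea can be sketched with the existing macro on a three-dimensional array: each dereference peels off one dimension, and the length of what remains is the extent of the next one. `NITEMS()` stands in for the proposed operator, and the array and function names are made up for the example.

```c
#include <stddef.h>

#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

int foo[2][3][4];

/* Extent of dimension 0: the array itself. */
size_t extent0(void) { return NITEMS(foo); }

/* Extent of dimension 1: one dereference, type int [3][4]. */
size_t extent1(void) { return NITEMS(*foo); }

/* Extent of dimension 2: two dereferences, type int [4]. */
size_t extent2(void) { return NITEMS(**foo); }
```

None of the dereferences is evaluated here, since the operands of sizeof are not variable-length arrays.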

> In terms of standardization, it's worth noting that N2529 has not been seen by WG14 and so it's unclear how the committee feels about the idea.

Several WG14 members are CCed in the GCC thread (and a few more in a discussion about the state of that paper prior to the development of the patch). Around half a dozen in total. So far they haven't complained, other than suggesting the usual pedantic wording refinements (very welcome, of course). :)

> That leaves some concerns (WG14 has a habit of renaming things or altering semantics slightly), but I think they could be overcome.

That's why we've started with the keyword __lengthof__ in GCC, to make it a GNU extension, without entering into ISO C reserved words territory. I expect that the semantics won't be touched by WG14. We're prepared to accept a new name (we expect _Lengthof, and then likely lengthof).

One detail where we didn't have consensus is in accepting expressions without parentheses like sizeof, or requiring them. One WG14 member suggested that we start clean without the mistakes of sizeof. But so far, the implementation is like sizeof in this regard. I think they should match, and if WG14 wants to remove the parentheses from lengthof, they should start by deprecating it from sizeof. But we can still provide sizeof-like behavior in GCC as an extension if ISO C decides to disagree. Having sizeof and lengthof differ here would mean more duplication of code in the compiler, which I'd avoid.

AaronBallman commented 2 months ago

> > I think this is a reasonably common need; users can use the sizeof(array) / sizeof(array[0]) trick, but having a dedicated operator to do this instead would help catch mistakes. One edge case would be with flexible array members; should those be a constraint violation?
>
> For now they are a constraint violation (incomplete types are rejected), with a reservation of the right to extend support to them.

I could live with that.

> If there appears a way to add length information such as proposed in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3188.htm, it would make sense to extend this operator to work with them.
>
> We've also discussed supporting the [[gnu::counted_by()]] attribute, but the feedback was mixed, and the consensus was to not do it, at least for now. We prefer to only support array lengths that are expressed using the type system.

Great, thank you!

> > Clang already supports __array_extent as a type trait, but only in C++: https://godbolt.org/z/54MKdMGd7 One thing that's interesting, though, is that it makes it much clearer what "length" means for multidimensional arrays. You have to ask on a per-rank basis what the length is. Have you considered a similar design?
>
> The usual ARRAY_SIZE() or NITEMS() macros are the prior art I based the feature on. I like it because it's simple. And it's common-enough already that I expect it to be easy to explain.
>
> I didn't consider something like __array_extent, because it's usually as easy as:
>
>     __array_extent(decltype(foo), 4) == __lengthof__(****foo)
>
> That is, with regular language expressions you can ask for whatever length you're interested in.

Given that the only reason to add this feature is to help users more clearly express their intent, I definitely am not a fan of playing "guess how declarators and operators relate to one another" for the feature. Multi-dimensional arrays are a fairly prolific feature of C and it seems to me that getting the array rank and extent is a reasonable thing for users to want to do, and that maps nicely to the C++ features (https://en.cppreference.com/w/cpp/types/rank and https://en.cppreference.com/w/cpp/types/extent) used to get the same information.

> > In terms of standardization, it's worth noting that N2529 has not been seen by WG14 and so it's unclear how the committee feels about the idea.
>
> Several WG14 members are CCed in the GCC thread (and a few more in a discussion about the state of that paper prior to the development of the patch). Around half a dozen in total. So far they haven't complained, other than suggesting the usual pedantic wording refinements (very welcome, of course). :)

There's three (active) committee members on that thread, and I make four, but that's only a bit over 10% of the committee.

> > That leaves some concerns (WG14 has a habit of renaming things or altering semantics slightly), but I think they could be overcome.
>
> That's why we've started with the keyword __lengthof__ in GCC, to make it a GNU extension, without entering into ISO C reserved words territory. I expect that the semantics won't be touched by WG14. We're prepared to accept a new name (we expect _Lengthof, and then likely lengthof).

If WG14 insists on a design that separates rank and extent, that would be a pretty major shift in semantics and it would be unfortunate for either GCC or Clang to have to carry the extension interface in that case. Before we went ahead with such a feature in Clang, we'd really need some sort of signal from WG14 on that design decision (this is part of our criteria for adding extensions: https://clang.llvm.org/get_involved.html#criteria).

> One detail where we didn't have consensus is in accepting expressions without parentheses like sizeof, or requiring them. One WG14 member suggested that we start clean without the mistakes of sizeof. But so far, the implementation is like sizeof in this regard. I think they should match, and if WG14 wants to remove the parentheses from lengthof, they should start by deprecating it from sizeof. But we can still provide sizeof-like behavior in GCC as an extension if ISO C decides to disagree. Having sizeof and lengthof differ here would mean more duplication of code in the compiler, which I'd avoid.

We already broke from tradition in that regard with typeof (it accepts either a type or an expression same as sizeof but it requires the parentheses) and alignof (it only accepts a parenthesized type name in ISO C, but both Clang and GCC allow an expression operand and no parens as an extension), but my preference would be to follow sizeof if we kept this interface and require parens if we went with a rank/extent pair of operators. We could technically leave off the parens for rank when given a type operand but then it would be inconsistent between rank and extent, which doesn't seem like good design. (I also think we should probably write a paper to allow alignof unary-expression same as sizeof given that it's a commonly supported extension, but that's neither here nor there for your proposal.)
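For readers less familiar with the sizeof rule being discussed: an expression operand may be written without parentheses, while a type name must be parenthesized. A small sketch of both forms (the function names are hypothetical):

```c
#include <stddef.h>

/* Expression operands: parentheses are optional for sizeof. */
size_t count_without_parens(void)
{
    int x[5];

    /* Parsed as (sizeof x) / (sizeof x[0]). */
    return sizeof x / sizeof x[0];
}

/* Type-name operand: parentheses are required; "sizeof int" is a syntax error. */
size_t size_of_type(void)
{
    return sizeof(int);
}
```

Following sizeof, as the GCC implementation described in this thread does, would make the first form valid for the new operator as well.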

alejandro-colomar commented 2 months ago

> > If there appears a way to add length information such as proposed in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3188.htm, it would make sense to extend this operator to work with them. We've also discussed supporting the [[gnu::counted_by()]] attribute, but the feedback was mixed, and the consensus was to not do it, at least for now. We prefer to only support array lengths that are expressed using the type system.
>
> Great, thank you!

:-)

> > > Clang already supports __array_extent as a type trait, but only in C++: https://godbolt.org/z/54MKdMGd7 One thing that's interesting, though, is that it makes it much clearer what "length" means for multidimensional arrays. You have to ask on a per-rank basis what the length is. Have you considered a similar design?
>
> > The usual ARRAY_SIZE() or NITEMS() macros are the prior art I based the feature on. I like it because it's simple. And it's common-enough already that I expect it to be easy to explain. I didn't consider something like __array_extent, because it's usually as easy as:
> >
> >     __array_extent(decltype(foo), 4) == __lengthof__(****foo)
> >
> > That is, with regular language expressions you can ask for whatever length you're interested in.
>
> Given that the only reason to add this feature is to help users more clearly express their intent,

Not the only one. To me, the main reason is the "future directions" note that foresees adding support for function parameters declared with array notation. I'm paving the way for it. If this keyword were to stay as just a standard ARRAY_SIZE() macro, I wouldn't be nearly as interested. But we have to start somewhere. :)

> I definitely am not a fan of playing "guess how declarators and operators relate to one another" for the feature. Multi-dimensional arrays are a fairly prolific feature of C

Yup, a very nice feature of C, indeed.

> and it seems to me that getting the array rank and extent is a reasonable thing for users to want to do, and that maps nicely to the C++ features (https://en.cppreference.com/w/cpp/types/rank and https://en.cppreference.com/w/cpp/types/extent) used to get the same information.

Hmm.

> > > In terms of standardization, it's worth noting that N2529 has not been seen by WG14 and so it's unclear how the committee feels about the idea.
>
> > Several WG14 members are CCed in the GCC thread (and a few more in a discussion about the state of that paper prior to the development of the patch). Around half a dozen in total. So far they haven't complained, other than suggesting the usual pedantic wording refinements (very welcome, of course). :)
>
> There's three (active) committee members on that thread, and I make four, but that's only a bit over 10% of the committee.

Yup, plus two others whom I had asked, before that thread, whether they knew about the state of n2529.

> > > That leaves some concerns (WG14 has a habit of renaming things or altering semantics slightly), but I think they could be overcome.
>
> > That's why we've started with the keyword __lengthof__ in GCC, to make it a GNU extension, without entering into ISO C reserved words territory. I expect that the semantics won't be touched by WG14. We're prepared to accept a new name (we expect _Lengthof, and then likely lengthof).
>
> If WG14 insists on a design that separates rank and extent, that would be a pretty major shift in semantics and it would be unfortunate for either GCC or Clang to have to carry the extension interface in that case. Before we went ahead with such a feature in Clang, we'd really need some sort of signal from WG14 on that design decision (this is part of our criteria for adding extensions: https://clang.llvm.org/get_involved.html#criteria).

Makes sense. I've started to develop a paper for WG14, as you may have seen in your mailbox. :)

> > One detail where we didn't have consensus is in accepting expressions without parentheses like sizeof, or requiring them. One WG14 member suggested that we start clean without the mistakes of sizeof. But so far, the implementation is like sizeof in this regard. I think they should match, and if WG14 wants to remove the parentheses from lengthof, they should start by deprecating it from sizeof. But we can still provide sizeof-like behavior in GCC as an extension if ISO C decides to disagree. Having sizeof and lengthof differ here would mean more duplication of code in the compiler, which I'd avoid.
>
> We already broke from tradition in that regard with typeof (it accepts either a type or an expression same as sizeof but it requires the parentheses) and alignof (it only accepts a parenthesized type name in ISO C, but both Clang and GCC allow an expression operand and no parens as an extension), but my preference would be to follow sizeof if we kept this interface and require parens if we went with a rank/extent pair of operators. We could technically leave off the parens for rank when given a type operand but then it would be inconsistent between rank and extent, which doesn't seem like good design. (I also think we should probably write a paper to allow alignof unary-expression same as sizeof given that it's a commonly supported extension, but that's neither here nor there for your proposal.)

Agree.

alejandro-colomar commented 2 months ago

Here's a link to the already submitted proposal to WG14: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3313.pdf

alejandro-colomar commented 3 weeks ago

This has been merged as _Lengthof into C2y today.

The paper that was merged was https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3369.pdf with some trivial editorial changes to the wording on top of it.

cor3ntin commented 3 weeks ago

@alejandro-colomar the fact that it was standardized means it can be implemented in Clang without further discussion on whether we want it. Are you interested in submitting a pull request? Otherwise someone will get to it as part of our conformance work.

alejandro-colomar commented 3 weeks ago

> @alejandro-colomar the fact that it was standardized means it can be implemented in Clang without further discussion on whether we want it. Are you interested in submitting a pull request? Otherwise someone will get to it as part of our conformance work.

I would want to attempt it. :-)

I've never written any patches for Clang/LLVM, AFAIR, so I would appreciate some help on where I should look in the code. I expect it to be similar to GCC, but of course different, so some pointers would help.

If I find myself unable, I'll let you know. Thanks!

alejandro-colomar commented 3 weeks ago

> If I find myself unable, I'll let you know. Thanks!

@cor3ntin

I've been looking at the code, and there's too much C++ for my taste; I give up. Would someone else mind implementing it? :-)

AaronBallman commented 3 weeks ago

No worries, we'll get around to it at some point, thanks for looking!