Closed ned14 closed 4 years ago
@ned14 Are you referring to the empty base class optimization that is applied to span?
I assume that as usual the ABI ship has sailed, so this will be a wontfix.
c++latest has unstable ABI
As I am the culprit here my 2 cents:
Generally I would always prefer composition over inheritance. However, as far as I understand [[no_unique_address]]
is currently not supported by MSVC, which is why we went with inheritance.
I do not know whether we could change it given that clang supports it. If the maintainers are ok with [[no_unique_address]]
I am happy to provide a patch for that.
@miscco I upgraded MSVC, tried compiling my code, and static asserts about loss of standard layout fired everywhere. The cause was that the code detected that MSVC now provides span<T>
, and thus used it for the first time.
Re no unique address for span<T>
, I assume that such an optimisation would only make sense if the span's size is consteval known, and thus storage for the span could be completely eliminated entirely. I'll be blunt in saying that I don't, personally speaking, see much use for such an optimisation as span's tend to occupy ephemeral storage which the compiler can as-if eliminate storage in any case.
Me personally, though perhaps you should bring in advice from the Microsoft people first, I'd choose C compatibility over a future ABI break caused by MSVC implementing no unique address on the basis that no unique address optimisation for spans really would be very very rarely useful - only where people store spans in long term storage where lifetime escapes analysis, and they have compile-time known lengths. I know I've never written code like that, and if I did, I'd use std::array
like any other sane person. But that's my personal opinon only.
Could you check with a different standard library? As far as I can see both libstdc++ (via [[no_unique_address]]
) and libc++ (via a specialization without the _Size
data member) implement the same empty base class optimization as we do.
Also I can see some use cases of static size spans, e.g. for sliding windows over data etc.
Could you check with a different standard library? As far as I can see both libstdc++ (via [[no_unique_address]]) and libc++ (via a specialization without the _Size data member) implement the same empty base class optimization as we do.
We were already up on libc++ and libstdc++ in C++ 20, and were using their span<T>
implementations if C++ 20 is available. In fact, they had quality of implementation problems which we reported (for reference, yours was fine, worked first time, apart from standard layout).
Lots of code passes spans to non-C++ code, and not having avoidable UB here would be nice.
Could you clarify this statement? Your request is light on technical details. What code do you believe has defined behavior only with a standard-layout span
?
I think the point here is that passing an object to some C code (or whatever other language) is only well defined if the object has standard layout. Presumably @ned14 is doing that with spans.
That depends on the internal layout of the span, which is a less than stellar idea.
Agreed, and ned mentions as much.
I appreciate that is not required to have standard layout, however every other implementation does have standard layout: libstdc++'s, libc++'s, and all the third party library implementations that I am aware of. Except for Microsoft's.
In general, however, I don't actually see a problem in attempting to make classes have standard layout if there's no reason to not have standard layout. (Please correct me if I'm wrong)
Edit: That's not to say I'm in favor of bending over backwards to support something which isn't in the standard. Just if it's easy and there's no downside, why not?
Most of the span implementations use a pointer and a size. The technique I used in Outcome to preserve standard layout was something like:
template<class T, bool pointer_is_consteval, bool size_is_consteval>
struct span_storage
{
T *_ptr{nullptr};
size_t _length{0};
};
template<class T, bool pointer_is_consteval>
struct span_storage<T, pointer_is_consteval, true>
{
T *_ptr{nullptr};
};
template<class T>
struct [[no_unique_address]] span_storage<T, true, true>
{
};
template<class T>
struct span : span_storage ...
So, basically, keep all your member variables inside a single class within the inheritance hierarchy, and standard layout is preserved. Composing member variables by inheritance is what removes standard layout.
Could you give some example of code that needs span
to be standard layout?
Ah now I see, https://en.cppreference.com/w/cpp/named_req/StandardLayoutType
We have our members defined in two classes, _Span_extent and span itself.
I have prepared a PR #877 to move to compostition. However, I am unsure if this can be merged before [[no_unique_address]]
is available in MSVC
Many thanks for the fix!
It is not yet merged so I would not close the issue
I'd also like to see a concrete example, since as @sylveon noted, this seems to be taking a dependency on internal layout, not just the standard layout property.
I also tend to think that if this is so important, it should have been in the Standard (or filed as an LWG issue). We do provide non-Standard guarantees from time to time, but it's not a great habit to get into.
Ah I see now, does @ned14's use-case likely (since we still don't have an example) require knowing the internal layout of span
? Not just the fact that it's standard layout? That is definitely not something we should make a change for.
@StephanTLavavej You can test in consteval for layout compatibility with a C structure easily enough, and static assert if it doesn't match up.
For me personally, my use case is LLFIO where llfio::file_handle::buffers_type
is a span<buffer_type>
, where buffer_type
is identical to span<byte>
except that it guarantees that the layout exactly matches a struct iovec
, when on POSIX. C code, in turn, calls LLFIO functions (all of which are C-compatible, as per P1095's proposed implementation of P0709, except using Experimental.Outcome as the lightweight exceptions emulation), and it all works just lovely, no horrible C++ bindings needed. I don't strictly speaking need standard layout here, I don't really care about UB here as all the major compilers are well defined in this area. But, if UB can be avoided, that's a nice to have situation.
BTW I did ask WG21 to make span<T>
standard layout, and I was told no, as the standard doesn't require layout of implementations without a truly great reason.
Ok, so what you're looking for is a layout guarantee, not standard_layout - and I don't understand why you'd want a specific layout from MSVC's STL which doesn't run on POSIX.
Ok, so what you're looking for is a layout guarantee, not standard_layout - and I don't understand why you'd want a specific layout from MSVC's STL which doesn't run on POSIX.
The code is portable and runs on Windows. You're overthinking the layout guarantee, we have static asserts which won't compile the code if layout is incompatible. This is a situation of "guarantee through facts on the ground" rather than "guarantee through warranty". And an awful lot of C, and C++, is written on the former (e.g. good luck getting a lot of C code to work on 10 bit char platforms).
877 may still be worth considering if we can simplify our inheritance and improve debug codegen, but this issue increasingly sounds like a non-Standard request which we consider to be a non-goal.
C++ 20 requires standard layout for C++ types to be interoperable with C types. Yes that's far far too conservative, but that's the rule as in the normative wording.
I think it would be really great if future C++ relaxed the requirements to:
std::is_layout_compatible
is true for any other type definable in C.... and now the standard would match what the compilers have guaranteed for donkey's years now (well apart from the move bitcopying, that's currently still in EWG-I).
You're overthinking the layout guarantee, we have static asserts which won't compile the code if layout is incompatible.
Compatible with what? What code, specifically, is expecting a particular layout? If I understand your comments correctly, on Windows you don't need to be compatible with any external library - you are simply using a span
and then providing a guarantee that the layout resembles a struct iovec
on POSIX. I would appreciate clarification if I'm misunderstanding.
In #877, we have a question - should the pointer or the length appear first in the representation? The answer from looking at POSIX docs appears to be that the pointer should be first - which is not what that PR was initially doing. Hence my focus on the specific request here.
I am still inclined to accept the PR as a special exception to our usual goals, since it does seem reasonable to expect a pointer-length representation, there is no particular cost to doing so (other than ABI churn, and /std:c++20
is not yet locked down), and there are debug codegen gains to be had from reworking the inheritance. But I would feel better if I clearly understood the scenario.
Alasing a span
with an iovec
is still an undefined behavior in the C++ standard IMO, even if they are layout compatible, unless the span
object and the iovec
object are both members of some union.
For standard layout types, the C++ standard currently only additionally defines the behavior of inspecting common initial sequences of different members of union.
As this is the small verification program I submitted to libstdc++, I also submit it here for reference, though it's not particularly apposite for non-POSIX:
#include <span>
#include <type_traits>
#include <sys/uio.h>
using buffer_type = std::span<std::byte>;
static_assert(sizeof(buffer_type) == sizeof(struct iovec));
static_assert(std::is_trivially_copyable_v<buffer_type>);
static_assert(std::is_standard_layout_v<buffer_type>);
static_assert(std::is_layout_compatible_v<buffer_type, struct iovec>);
Alasing a span with an iovec is still an undefined behavior in the C++ standard IMO, even if they are layout compatible, unless the span object and the iovec object are both members of some union.
I don't think you understand the use context I have in mind. What I care about is that the compiler, which can see layout compatibility between input buffer sequences and output buffer sequences even though they are of unrelated types, could elide the buffer sequence repack under the as-if rule.
Current compilers won't do that of course, because a syscall causes the compiler to assume that the scatter-gather array contents have escaped analysis, and so therefore the repack is always done. However myself and various people on WG14 and WG21 are working on function annotation to prevent this, and one day soon hopefully, we can reach avoiding having to employ UB in high performance zero copy code.
I don't think you understand the use context I have in mind. What I care about is that the compiler, which can see layout compatibility between input buffer sequences and output buffer sequences even though they are of unrelated types, could elide the buffer sequence repack under the as-if rule.
I mean that "making these types layout compatible according to the C++ standard" itself doesn't reduce UB.
While caring about the compiler, we should say that incompatibility between the old implementation and iovec
is not caused by lack of standard-layoutness or layout compatibility in the standard. It's just caused by the order of members and bases.
Even if span
with dynamic extent is not standard-layout, e.g. the pointer and the length are not declared in the same class, compilers, including MSVC and other compilers using Itanium ABI, can still make its actual layout same as iovec
- for example, let the first base class hold the pointer and the second base class hold the length.
The implementation improvement you proposed is meaningful in practice, though it doesn't actually reduce UB (in the standard) currently. I think your working on function annotation is worthy.
The implementation improvement you proposed is meaningful in practice, though it doesn't actually reduce UB (in the standard) currently. I think your working on function annotation is worthy.
That is exactly the defect of the C++ standard tbh. Technically everything is UB. Calling assembly? UB. Calling any APIs with syscalls? Undefined behavior. printf("Hello World"); undefined behavior under a different exec-charset like EBCDIC. In practice, the standard library itself is implemented as undefined behavior.
In practice, [[gnu::may_alias]] attributes just do work. I think [[gnu::may_alias]] should get just standardized.
You are almost completely wrong. Calling assembly is conditionally supported with implementation-defined semantics and there is no "undefined behavior" specified by the C++ standard. Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.
Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.
It is undefined behavior. Calling POSIX APIs is undefined behavior since standard only treats stdio and iostream as I/O. Any other functions, the compiler is free to optimize it away since it is not a part of standard. That is UB.
Implementation defined behavior. Then yes, [[gnu::may_alias]] is implementation-defined behavior, still does not change the fact that is UB.
The standard doesn't say anything like "Any other functions, the compiler is free to optimize it away since it is not a part of standard". Don't misuse items "unspecified" and "undefined". You should read [intro.compliance]/(6.3) carefully.
What constitutes an interactive device is implementation-defined.
By the way, [[gnu::may_alias]] modifies types and adds non-standard properties, so it may result well-defined behavior in implementation-defined manners. I think [[gnu::may_alias]] is acceptable, partially.
Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.
It is undefined behavior. Calling POSIX APIs is undefined behavior since standard only treats stdio and iostream as I/O. Any other functions, the compiler is free to optimize it away since it is not a part of standard. That is UB. Implementation defined behavior. Then yes, [[gnu::may_alias]] is implementation-defined behavior, still does not change the fact that is UB.
The standard doesn't say anything like "Any other functions, the compiler is free to optimize it away since it is not a part of standard". Don't misuse items "unspecified" and "undefined". You should read [intro.compliance]/(6.3) carefully.
What constitutes an interactive device is implementation-defined.
By the way, [[gnu::may_alias]] modifies types and adds non-standard properties, so it may result well-defined behavior in implementation-defined manners. I think [[gnu::may_alias]] is acceptable, partially.
I am sorry. You are wrong. Read the discussion we had before. It is even hard to say whether printf or even write is an I/O function expnkx/fast_io#20 (comment)
In practice, you have to use memory barriers etc to avoid UB since yes, the compiler is free to optimize anything away.
https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
Linus is technically right.
The *fact* is that gcc documents type punning through unions as the "right way". You may disagree with that, but putting some theoretical standards language over the *explicit* and long-time documentation of the main compiler we use is pure and utter bullshit. I've said this before, and I'll say it again: a standards paper is just so much toilet paper when it conflicts with reality. It has absolutely _zero_ relevance. In fact, I'll take real toilet paper over standards any day, because at least that way I won't have splinters and ink up my arse.
The item "I/O function" was mentioned by you, but not me. Again, not guaranteed by the standard (e.g. something described as "implementation-defined" or "unspecified") doesn't mean undefined. Whether Linus is right depends on personal opinions. And it's not an appropriate place for such discussion here.
The item "I/O function" was mentioned by you, but not me. Again, not guaranteed by the standard (e.g. something described as "implementation-defined" or "unspecified") doesn't mean undefined. Whether Linus is right depends on personal opinions. And it's not an appropriate place for such discussion here.
Linus is right does not depend on personal opinions. He had experiences and made much more code than you. His word is more reliable than you or even the standard.
There are facts. "Personal opinions" is just the excuse for you who disagree with facts.
"(e.g. something described as "implementation-defined" or "unspecified")"
How do these "implementation-defined" or "unspecified" what you said get implemented without undefined behavior? No, they cannot. They must contain undefined behavior.
Then why a third party library is not implementation-defined behavior anymore?
No, those are not facts, just your opinions. Linus is not so reliable because it has not been shown that he has written enough code according the strict aliasing rule, and hence it has not been shown that he is capable to capable to compare the standard sematic and -fno-strict-aliasing sematic. I'm convinced that Linus's word is more reliable than me about the his code relied on -fwrapv, -fno-strict-aliasing etc., but not about "which sematic is better". I don't care whether "they" can or cannot, they are permitted to be implemented without "behavior in C/C++". Again we should discuss this elsewhere.
Because it is "may be" undefined. LOL
Such word is not said by the standard. It's concluded by Cubbi (an administrator of cppreference) and modified by me.
P0593R6 (implicit object creation) partially resolved this question. However, the well-definedness of modifying objects via unsigned char
/std::byte
lvalue is underspecified in C++ standard, so there's unresolved defects IMO.
However, the well-definedness of modifying objects via
unsigned char
/std::byte
lvalue is underspecified in C++ standard, so there's unresolved defects IMO.You admit the standard library itself is UB. Great!
Nope. Defects (CWG issues) are treated differently IMO.
Linus is not so reliable because it has not been shown that he has written enough code according the strict aliasing rule, and hence it has not been shown that he is capable to capable to compare the standard sematic and -fno-strict-aliasing sematic.
These are facts. Calling POSIX apis is UB.
It's unrelated. Can you show that the implementation of POSIX APIs requires UB?
Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.
It is undefined behavior. Calling POSIX APIs is undefined behavior since standard only treats stdio and iostream as I/O. Any other functions, the compiler is free to optimize it away since it is not a part of standard. That is UB. Implementation defined behavior. Then yes, [[gnu::may_alias]] is implementation-defined behavior, still does not change the fact that is UB.
The standard doesn't say anything like "Any other functions, the compiler is free to optimize it away since it is not a part of standard". Don't misuse items "unspecified" and "undefined". You should read [intro.compliance]/(6.3) carefully.
What constitutes an interactive device is implementation-defined.
By the way, [[gnu::may_alias]] modifies types and adds non-standard properties, so it may result well-defined behavior in implementation-defined manners. I think [[gnu::may_alias]] is acceptable, partially.
I am sorry. You are wrong. Read the discussion we had before. It is even hard to say whether printf or even write is an I/O function expnkx/fast_io#20 (comment) In practice, you have to use memory barriers etc to avoid UB since yes, the compiler is free to optimize anything away. https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/ Linus is technically right.
The *fact* is that gcc documents type punning through unions as the "right way". You may disagree with that, but putting some theoretical standards language over the *explicit* and long-time documentation of the main compiler we use is pure and utter bullshit. I've said this before, and I'll say it again: a standards paper is just so much toilet paper when it conflicts with reality. It has absolutely _zero_ relevance. In fact, I'll take real toilet paper over standards any day, because at least that way I won't have splinters and ink up my arse.
The item "I/O function" was mentioned by you, but not me. Again, not guaranteed by the standard (e.g. something described as "implementation-defined" or "unspecified") doesn't mean undefined. Whether Linus is right depends on personal opinions. And it's not an appropriate place for such discussion here.
Disclaimer The referenced discussions on I/O functions does not imply there is any UB. OTOH, it effectively claims that the standard has a series of (potentially more serious) defects on the basic definitions of I/O. I have submitted it to CWG chair, but there is still no reply. I don't know why, but anyway this is none of the business here.
As of Linus Torvalds' theory: full of BS. Nobody is obligated to take care on any implementation details, which is effectively nothing. The standard implies the conformance as the consensus to refine the things being standardized, which is logically strictly more than nothing.
Please do not use our issues for off-topic discussion.
I appreciate that
<span>
is not required to have standard layout, however every other<span>
implementation does have standard layout: libstdc++'s, libc++'s, and all the third party library implementations that I am aware of. Except for Microsoft's.I assume that as usual the ABI ship has sailed, so this will be a wontfix. However, fixing this would be great. Lots of code passes spans to non-C++ code, and not having avoidable UB here would be nice.