isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
Other
42.77k stars 5.44k forks source link

Conversion of spans - love or hate? #1203

Open annagrin opened 6 years ago

annagrin commented 6 years ago

There is a helper function that some GSL users created for parsing byte streams. This function is currently is not in GSL due to us having problems implementing it efficiently, safely, and in a platform-independent way.

Is general conversion of spans going against the core guidelines? If yes, is there a limited conversion that is possible (i.e. from std::byte to T)? What are the checks that need to be done to make it (or a restricted variant) safe? Are there any platform-specific type layout problems that we need to be aware of? Are there better ways of achieving the same?

template <class T, class U>
gsl::span<T> convert_span(span<U> s)
{
    auto data = s.data();
    if (!data)
    {
        return {};
    }
    auto bytes = s.size_bytes();
    Expects(bytes % sizeof(T) == 0);

    return { reinterpret_cast<T*>(data), bytes / narrow_cast<int>(sizeof(T)) };
}

void DoesNotCompile(span<gsl::byte> bytes)
{
    gsl::span<int> s(bytes); 
}

void Compiles(span<gsl::byte> bytes)
{
    gsl::span<int> s = convert_span<int, gsl::byte>(bytes);
}
scraimer commented 6 years ago

You'll run into undefined behaviour. I think. For example, endianness and byte order. Different architectures will cause the expression s[4] in Compiles to return different values, such as only the highest bits of the integer, or the lowest bits of the integer.

When I look at the specified convert_span, I see that the added value is in error handling and the use of Expects for assertion. If it weren't for those, I think that a plain reinterpret_cast would give similar results, relying on the same sort of undefined behavior (de-referencing a pointer that has been reinterpred_cast-ed), and relying on the implementation details of gsl::span.

void Compiles(span<gsl::byte> bytes)
{
    gsl::span<int> & s = reinterpret_cast<gsl::span<int> &>(bytes);
}
MikeGitb commented 6 years ago

Imho, conversion to std::byte is ok and I'd like to have it (essentially a safer reinterpret_cast<std::byte*>(&my_variable)).

Almost anything else results in undefined behavior or at least makes it far too easy to make mistakes, so it should not be supported by a type whose purpose is to make c++ programming safer.

moretromain commented 6 years ago

I agree with Mike, conversion to std::byte could be ok, because ultimately any data is bytes, but anything else would defy the purpose of the span, which is bringing a view of something that is. An easy reinterpretation to something that could be completely breaks the core concept behind the span. Span is a life-jacket, start playing with it, you’ll drawn...

gdr-at-ms commented 6 years ago

Suggestion: if you limit convert_span to only "to" or "from" span of std::byte, would that be sufficient for your use cases

MikeGitb commented 6 years ago

Sorry for repeating myself, but I really don't think a cast from span of std::byte is a good Idea. As far as I can tell, there is no way for span to check at compile or runtime if that cast is valid (valid according to the c++ standard).

gdr-at-ms commented 6 years ago

@MikeGitb I read your original comment. I believe there is a confusion as to what the GSL is. The GSL isn't there to provide only operations that are checked at runtime or compile time. It is to support the Core Guidelines. We rather have a common notation to search for, than to have people invent their own wheels square and making it harder to contain or hunt for bugs due to everybody unleashing their creativities for something that they have to write anyway.

neilmacintosh commented 6 years ago

I'm with @gdr-at-ms here. I believe that the point of the OP by @annagrin is that there are users who already go from byte streams to object streams. This sort of operation is pretty common in systems programming. The question is about how to express that operation in a way that could be considered the "guideline" approach to the problem. It may not be the most portable or safe operation (maybe it never can be), but it could at least be consistent (and a little more safety and reliability would be nice if possible).

MikeGitb commented 6 years ago

@neilmacintosh : At least for trivially copyable types, there is a (relatively) safe and portable way to get from a byte stream to a T: Default construct a T and then copy the bytes from the byte stream over. If you want to encapsulate that technique behind a searchable function name, I'm all for it. Reinterpret casting a byte pattern into an object is currently - at least in theory - UB (I think there is a proposal to change that for c++20) and I don't think hiding UB behind a function instead of leaving it out in the open and endorsing a safe alternative is helping anyone.

@gdr-at-ms:

It is to support the Core Guidelines

so which part or rule of the Core Guidelines would that function support? Certainly not the note in P.2:

Avoid dependence on undefined behavior

or

ES.48: Avoid casts

And the common notation to search for is imo reinterpret_cast

JosephBialekMsft commented 6 years ago

I don't have a strong opinion on the exactly mechanics of how the spirit of this issue is accomplished, but it does need to be accomplished IMO.

As noted, it is extremely common in systems programming / binary protocol implementation to make zero-copy conversions between char/uchar arrays and POD structures or other simple types.

This is often accomplished by simply using a C-style or reinterpret_cast. BYTE networkBuffer; PACKET_HEADER header = (PACKET_HEADER)networkBuffer; PACKET_BODY body = (PACKET_BODY*)(header+1);

Developers commonly forget to do proper bounds checking before all casts and can easily end up with OOB reads and writes.

It is not ideal to default construct an object and then copy the bytes on top of it since this requires making a copy of the data where there previously was none.

This is an area that absolutely needs GSL support since code like this is extremely common in mission critical attack surface. Allowing span to cover these sorts of scenarios makes it substantially more useful, especially when developers are forced to do non-trivial parsing.

hsutter commented 6 years ago

When we originally designed gsl::span, we did provide conversion from span<T> to span<gsl::byte> for this reason. It seems to be still there in Microsoft's GSL implementation, as the functions as_bytes and as_writeable_bytes. I don't think they check that T is a POD, but probably could/should.

It sounds like you need that, and also something like a a function in the other direction (which ideally we should somehow make an error to call unless the user suppresses the type safety rules). You could use make_span where one of the arguments is a pointer cast and so the user already has to write the suppression, but it might be nice to have a specific function for this cast so it can still ensure bounds safety (if not type safety which it has to trust the programmer for) and can require that the type be a POD.

Correct?

JosephBialekMsft commented 6 years ago

You are correct Herb. And I'd be fine with disabling a type safety rule to access the functionality.

Today in Windows we have the following functions created to make span more useful for systems programming:

*template <class T, class SpanType> T get_struct(SpanType s, size_t offset = 0)** Return a pointer to a POD structure that is guaranteed to fit within the span. Offset is an optional offset (in elements) in to the span at which the structure begins. Fast fail if the structure doesn't fit.

*template <class T, class SpanType> T try_get_struct(SpanType s, size_t offset = 0)** Same as above except return nullptr if the structure doesn't fit instead of fast fail.

*template <class T, class SpanType>T try_remove_struct(SpanType& s, size_t offset = 0)** Same as above (nullptr return if structure doesn't fit), except the span position is also advanced to the next element after the structure that was returned.

*template <class T, class SpanType> T get_partial_struct(SpanType s, size_t extraction_size)** This function should be avoided at all costs. It is for parsing variable length structures (i.e. a structure where the last member is the first element of an array, and the next element of the array is 1 past the end of the structure). Unfortunately we have these structures and need to deal with them. The thing that makes this function super dangerous is that even bounds safety cannot necessarily be guaranteed since the compiler cannot prevent you from accessing members of the structure beyond the length that this function verifies to be valid.

template <class NewClass, class CurrentSpan> gsl::span convert_span(CurrentSpan s) Convert from a span of one POD type to another POD type. Ensures the span can be converted cleanly (i.e. if you convert from a span of byte to a span of uint32_t, the span size must be a multiple of 4).

template <class T, class ByteType = gsl::byte>gsl::span struct_as_bytes(const T& structure) Take a pointer to a structure, return a span of bytes (const).

template <class T, class ByteType = gsl::byte>gsl::span struct_as_writeable_bytes(T& structure) Take a pointer to a structure, return a span of bytes.