microsoft / GSL

Guidelines Support Library
Other
6.18k stars 740 forks source link

array_view not possible with incomplete Classes #135

Closed nolange closed 9 years ago

nolange commented 9 years ago

Hello,

I use a small simple class that takes a pointer and a size for wrapping normal arrays, quite the same what array_view is designed to solve in a much more generic fashion. I found a drawback compared to using a simple class, and that is that array_view does not work with incomplete classes. I dont know if its possible to fix this via templates specialization and sfinae voodoo, but would be really nice if this use-case is covered aswell

#include <array_view.h>
class CIncomplete;

void foo(gsl::array_view<CIncomplete> s); // OK
gsl::array_view<CIncomplete> s; // Not OK (says clang 3.6 and gcc 5.1 )
nolange commented 9 years ago

@gdr-at-ms How is that weakening the semantics, one bit? I want to pass an portion of the array around, not just a pointer, and ideally I want to just have a nice class that can then directly iterate. I get that array_view from a class that knows the type, and feed it into another that does so too. But unless I want to modify that array_view, I dont need any information of the type itself.

#include <logprinter.h>
#include <myLog.h>
CLogprinter print;
CLog::_instance.logEntry("Donuts exhausted");
auto activelogs = CLog::_instance.getEntries(); // returns array_view<CLogEntry> of messages, ptr to first + count
print.printLogs(activelogs); // prints the entries
mattnewport commented 9 years ago

@gdr-at-ms Physical Design is John Lakos term for how code is organized into actual .h and .cpp files in C++. It's very important on large projects for keeping build times down (don't expose code in headers that isn't absolutely required by clients) and to minimize the rebuild impact of code changes (don't force everybody to recompile if you change the definition of a type if they don't need to).

nolange commented 9 years ago

@neilmacintosh: I havent coded logEntry(), as I assumed it would be clear that I use the first free slot in array view, using the iterators, operator[] or both..... Similary I havent cared for access modifiers, its just for illustrative purposes, the array_view in CLog would be private so no part of the interface but necessary so the size of the class is known. (I am not entirely happy about that, but I cant use pimpl idiom or something else ).

CLog::getEntries() example I thought is more clear too, the LogManager and Printing are 2 seperate entities, and code above ties them together, not necessarely knows much about it. You could do a similar example with connecting those 2 through a queue. In many parts of system design you have layers which need to be seperated (no callbacks for example) and passing information back-and-forth happens per pull-push or queue. Ie. state has to be moved around with other words, and the mover doesnt need to know much about it.

And I hapened to watch Stroustrup's talk, and remember him asking for input from many different users from C++, well I am voicing my concerns/needs regarding array_view here. Modules sure are nice, but they wont solve all issues and definitly not right now.

As soon as you want to iterate/access an array_view without the definition of the type, the compiler will very nastily complain. So I really dont see the impact on the semantics/functionalily if you allow array_view to be a member in a class (even private, so no part of the interface) and beeing copied around

gdr-at-ms commented 9 years ago

We have decades of experience with delaying diagnostics; we've learned enough that we decided to go in a difference direction with the concepts effort being a concrete materialization of that learning.

Delaying semantics requirement checking isn't necessarily always a good design. In this specific, I've yet to see a convincing case, not just a case.

If you want to use pointers to incomplete type as a way to implement "compiler firewall", that is fine; please do.

neilmacintosh commented 9 years ago

@nolange so my point about coding logEntry() is you can create an array_view inside that function around a pointer + length member and get the same effect. No need to have the array_view member which is where your requirement for supporting an incomplete type stems from here.

I still don't see anything compelling here. There are plenty of other valid, safe alternatives to what you propose.

I'm glad you are voicing your thoughts and concerns regarding array_view here, and they are welcomed.

If you think modules won't address issues you have around physical source code organization and "compiler firewalling", then modifying the interface to array_view is not the right fix. Instead, please contribute to the discussion around modules - which is where we can really make progress on these issues - and make sure the proposal is as useful as it can be.

With that said, you can pass pointers to incomplete types today (as @gdr-at-ms observes), and I argue that you can still use array_view at the points in your implementation where you are prepared to know the complete definition of the sequence type.

mattnewport commented 9 years ago

@neilmacintosh If a class has a {pointer, size} pair as a member that is conceptually a view onto an array, it would be desirable to express that intent clearly by just having a member array_view, would you agree? Or do you not envision array_view as a type intended to be used as a member of a class?

Given that, it seems like it is obviously desirable to use an array_view anywhere where you might currently use a {pointer, size} pair unless there are very compelling technical or design considerations why this is not possible. In the case where you have a pointer to an incomplete type, saying "just don't use array_view in that case" doesn't seem like a very satisfactory answer. To me it seems like the obvious desirable thing is for array_view to support this, just as unique_ptr is specified to support an incomplete type. So far, I haven't seen any compelling technical or design considerations why this should not be supported. I think it's agreed that having as_bytes() and friends be non-members is a better design anyway and they are currently the only obstacles to supporting this.

Modules are great but I'm not sure they solve the problem of needing to recompile code merely because it can see the definition of a type through some included header that shouldn't be necessary to make visible.

nolange commented 9 years ago

Delaying semantics requirement checking isn't necessarily always a good design. In this specific, I've yet to see a convincing case, not just a case.

Doesnt compute. What semantic checking is performed if array_view gets a concrete type, what requirements of the type does array_view depend on?

@neilmacintosh: casting around and creating instances of array_view from is more code that can go wrong. If array_view is not fit for beein passed around, ok. I still dont feel I got an answer to why this has to be, or what is lost if it can - in your refence implementation thats just moving the as_bytes function, does this violate the intended use now? That I can do various worse stuff is also clear, but I thought the mere point of GSL is to promote safer, simpler code.

What I tested on our ~350K Codebase was mapping my own "pointer + size" class to your array_view. Thats the exact use-case for me and this fails for what in my opinion is a petty reason I dont get explained?

You werent happy with my generalized arguments, but I havent found a single concrete arrgument against it. What exactly is the problem, if you support passing the array-view from one "safe user" to the next, and why is it that needing conversions or splitting the components and recombining them is no concern for a package that tries to help code quality? I could use another class or pointer + size, sure, I could also cast those to floats if I am at it. But why, if the only operations I am using are available through array_view,

Btw. There is no usefull class for this in the standard, array_ref seems dead. Maybe ranges will fit, but I have no experience with those.

neilmacintosh commented 9 years ago

@nolange, @mattnewport array_view is certainly fine to use as a member. Where we disagree is that I don't see any reason to explicitly constrain its design to support use as a member where the element type is incomplete.

The only reasons I have heard for that scenario so far are because you want the element type incomplete to help with compile times of source code. I understand that scenario, but once we say "yes, element type can be incomplete" that constrains the interface of array_view permanently. I would not want to adopt that constraint unless I saw a compelling reason.

If the type is incomplete, then the compiler cannot ensure the element type meets all the needs of the array_view interface. So you end up deferring potential diagnostics until you finally have some code that manipulates the array_view with the full element type definition available. As @gdr-at-ms pointed out, delaying diagnostics is not something we want to encourage.

I find the argument of passing sequences of incomplete types as parameters "through" a system particularly unconvincing. If you are passing that sequence around, then it is also available for access to each function that accepts such a parameter and passes it along. So having it incomplete is of no design value...it is just an optimization for build time.

mattnewport commented 9 years ago

@neilmacintosh

If the type is incomplete, then the compiler cannot ensure the element type meets all the needs of the array_view interface.

What needs are those? What requirements are there on a type that you can have a pointer to it? That seems to be the only requirement for a type you can have an array_view over and to my knowledge it is not possible to create a type in C++ that doesn't meet that requirement.

mattnewport commented 9 years ago

@neilmacintosh You quite reasonably asked for concrete examples of code for why this should be supported and we provided them. I think it's only reasonable to hold you to the same standard when you raise non-specific concerns about constraining the interface of array_view permanently and the compiler being able to ensure the element type meets the needs of the interface. What actual extensions to array_view or alternative implementation approaches to you envision that supporting incomplete types would make difficult?

I also don't buy the delayed diagnostics argument here. If you try to access an element of an array_view via operator[]() while the type is still incomplete, you will get a diagnostic at exactly the point of use which is what you want here, not a diagnostic where you declare the member, just as you would if you tried to access an element via a pointer to incomplete type.

gdr-at-ms commented 9 years ago

Folks, please take a step back. Let's look at the larger picture.
I understand the format of the comments make it hard to avoid what might look like twitter-style design discussion. I am going to make one last effort here, then take the rest to the Core Guidelines (which array_view exists to support.)

The first thing to consider is why did we design an array_view type in the first place? The primary goal of the project (of which GSL is part) is promote C++ programming practices that yield correctness and safety by construction. A key component of that is to ensure memory and bound safety. Ideally, we would like questionable constructs caught at compile time, where possible; with fallback to runtime checks. But runtime checks should be minimized. See also this paper we released earlier in the week to give a general overview of the project: https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Introduction%20to%20type%20and%20resource%20safety.pdf )

We designed array_view to support and help enforce bound safety and memory safety, with assistance from analysis tools -- see Neil's presentations at CppCon 2015 (all the slides available from here: https://github.com/isocpp/CppCoreGuidelines/tree/master/talks). To achieve this, it is important that we have abstractions and mechanisms in place to check that memory accessed through array_views are correct. If you look at all the interesting constructors, you will see that almost all of them are from contiguous containers (built-in arrays, std::vector, etc.), all of which already require complete types. That allows simple check for validity of storage used to construct an array_view objects. The only one that is under discussion here is also, interestingly, the one that is the most unsafe by default: just a pointer and a length. We acknowledged this very early in the design and have been considering various mechanisms (not all of them are made public yet) to recover safety. A simple mechanism is a form of annotation that indicates to an analysis tool to generate codes for runtime validation a la "memory sanitizer" (e.g. http://clang.llvm.org/docs/MemorySanitizer.html and more). For example, we expect an analysis tool to generate a check for memory validity at the point where an array_view object is constructed from a pair of pointer and length. Such an extensive check isn't needed when the array_view is constructed from a std::vector because it is known that the base of the storage is valid (only extent needs validation). Requiring the element type to be complete is only natural, there is nothing earth shattering here.

Indexing is an operation that happens later, after the array_view object has been constructed and deemed valid, so that it can minimize the checks to the index being in range. It builds on the invariant (established by the constructors) that either an array_view is empty of it points to valid memory.

We didn't design array_view as a replacement for arbitrary pairs of pointer+length. We did it with specific goals in mind. It may very well be the case that you don't need memory and bound safety, and arbitrary pointer is all you need. That is fine. Use the right tool.

The feedback expressed here is good; one important take away is that we need to clarify the Core Guidelines with respect to the aims and why we designed the type in the GSL.

mattnewport commented 9 years ago

@gdr-at-ms I think this does get at something that has been bothering me about the GSL array_view - I think it is currently doing two jobs and is misleadingly named as a result.

The older array_view proposal gave a certain amount of static safety and put together as one type two things that should conceptually be a single unit for this use case (a pointer and length). It also provided a number of convenience benefits:

I very much want such a type in the standard. Because it is a vocabulary type there is real value in it being a standard one and because it is so convenient and useful, being able to rely on it being available without needing any extra dependencies beyond the standard library is hugely valuable. This means it can be used and spread as a best practice through code snippets and live examples on sites like Stack Overflow and ideone, used even in tiny standalone programs with no external dependencies, etc.

Unfortunately the GSL array_view is not that type. It has elements of that type but it adds mandatory bounds checking which I am skeptical I can afford for my use cases (I have a separate discussion around that going on on the sg14 mailing list) and currently it seems certain design decisions which I strongly disagree with for my desired array_view vocabulary type (this and the no construction from temporaries discussion for example) are being driven by another set of design goals for GSL array_view.

My concern is that the GSL array_view becomes the standard array_view and I lose the chance to see the usable array_view type I want in the standard. The unusable-in-its-current-form for my purposes array_view proposed here may mean we never get a standard usable array_view.

One possible way forward is a standard array_view more like the older proposal and a separate checked_array_view that is part of GSL and perhaps the standard too. I can't see myself using the GSL array_view in its current form but I see it has some valid use cases. Something like the older array_view proposal seems to me much more broadly useful and I absolutely think there is a place for it in the standard. Since it has a different design purpose than the GSL array_view there is room for both to be in the standard.

I understand this isn't the ideal place for these discussions. Are you suggesting moving to issues against the Core Guidelines or is there a better place to take them?

nolange commented 9 years ago

OK, maybe the takeaway is that I loked at array-view in isolation, but isnt this class also proposed for the standard? For gsl as a package I could accept this, for a utility class in the standard I wouldn't.

nolange commented 9 years ago

Matt explained my concerns quite well, I also asked about a better place to discuss this early.

neilmacintosh commented 9 years ago

@nolange and @mattnewport I don't think Gaby is saying this is not the place to discuss array_view, I think he was trying to point to the context for array_view here: which is the GSL and the Core Guidelines.

I get that you might not like the array_view that fits with the overall purpose and direction of the GSL and want something similar but different.

If you want to discuss the proposed standardization of array_view, I would suggest the LEWG is the right place to do that (which is where my proposal is published). Or you can write to me directly as author of that paper, of course.

If you want to discuss the design of GSL array_view, then here or at the Core Guidelines repo is a reasonable place to do so...but expect us at either of those locations to frame the discussion in terms of the aims @gdr-at-ms pointed out and not in terms of a type that is just a generalized replacement for pointer+length.

nolange commented 9 years ago

@neilmacintosh : Don`t know, but closing the issue means the discussion is shielded from the public. Is there a mailing list for LEWG, or does the discussion happen at the github page?

I red through the thread, and am trying to clear things up a bit, hope we dont go around in circles.

First, what I want: I dont need just a pointer + size type, I need an abstraction of a view of storage owned elsewhere. I want to access this with range-for, iterators and operator[]. Ideally I want to use a good standard type thats available everywhere and might even have some special compiler magic in the future. The only times I related to pointer + size I meant that the "state" of an array_view will be pretty much identical and everything that could be done to a class based on a pointer + size struct should be doable to an array_view - unless there are good reasons not to (versatility for as much use-cases as possible). Extracting pointers and building back array_views is exactly what shouldnt be done if you want to statically analyse your code

I have red various proposals including the GSL, I think I have a good understanding about the scope of array_view. What severely lacking is that a "view" should be possibly to pass around easily without the type information. I will correct my former possition and will now say that these requirements of array_view should be met:

Rationale: The propsed change does not infere with the invariants or semantic checks of an array_view, it does not interfere with "semantics requirement checking" since copying an existing instance does not change state. Unsafe Constructs can only arise at construction (aside from other instances) and access.

Explaination: The invariants of array_view have to be valid at all times, as far as I understood the checks are local and tools arent expected to do whole program analysis (would be hard with binary libraries you dont have all informations). What we know is that array_view has to start its life at some point and either the compiler can prove this is valid, or it cant and might issue a warning/error. As soon as array_view is alive and has its regular C++ state and compiler annotations you can do a few things:

  1. Pass it around
  2. Access it.
  3. Destroy it.

1) Passing around doesnt modify its state, so the invariants arent touched at all. If the compiler sees an array_view without its initialisation then it will have to assume its correct. Passing around to functions (per reference or value) should not yield warnings, and copying shouldnt either. Even if the compiler should be cautious enough to consider all array_view as "bad until proven good", passing them around makes no difference.

Consider follwing 2 files:

// acess.cpp
#include <mytype.h>
array_view<CMyType> accessIt(array_view<CMyType> val)
{
    for (auto : val)
        ...; // do some modifications
    return val;
}
// pass.cpp
class CMyType;
array_view<CMyType> accessIt(array_view<CMyType> val); 

array_view<CMyType> foo(array_view<CMyType> val)
{
    return accessIt(val);
}

This example can be played with a good and bad array_view instance:

foo is called with a good val -> accessIt is called with a good val -> accessIt returns a good val -> foo returns a good val

foo is called with a bad val -> accessIt is called with a bad val -> accessIt returns a bad val -> foo returns a bad val

Note that none of the functions is for its own "unsafe", as you cant find a case where those functions start with valid preconditions and does something unsafe. If the functions are called with a bad val then the error happened elsewhere, to detect the error you need full type information - and at that place you will have it. My point is that copying and referencing are per definiton safe, and should be possible with incomplete types - per interface - to allow broader adoption of array_view. Unless this is conflict with other design criteria of array_view, if it is, please explain and be explicit and explain the exact intention and scope of array_view and where I am wrong about it. I cant think of an issue with the intentions of array_view`s functionality, for those operations copying previously established and validated state is enough.

2) Accessing often (usually?) means you dont see where the instance got initialized and have to assume its valid (the view points to valid objects). The proposed change does not interfere with this, and full type information is required.

3) Destroying should cause no issue at all, unless I really misunderstood the intention of array_view.

And here is why I am a little unhappy - that this proposed change seems to get shot down without giving a precise answer which criteria is endangered. I can assure you that there is code out there using such schemes.

Some more explaining, about why this is done with prototypes... I received training in regard to SIL (https://en.wikipedia.org/wiki/Safety_integrity_level) and use static analysis for years. One key point is that you have to do an impact analysis of every change, this means that you have to look at every piece of code that could be affected. A change in a Header automatically means you have to do this impact analysis everywhere its included.

Hence our scheme is not really designed for faster builds (these are always nice though), but for seperating code as much as possible and beeing able to reasonable argue about that separation. If an internal header changes, you can recompile that module and leave out everything else - which otherwise would have to be tested again Adding to that some other requirements, its hard to come up with a solution that wont be similar.

I dont think modules will solve this completely, in our case I would still have to deduce what information "leaks out" to other modules, I really wont discuss modules in regards to this though - array_view should stand on its own and be fit for current projects.

nolange commented 9 years ago

One more example regarding use as a member variable:

// ------- mystuffmanager.h
#include <array_view>
class CStuffEntry; // Implementation detail.

class CMyStuffManager
{
public: 
    // Needs array_view default constructor
    constexpr CMyStuffManager()
    : _entries()
    {}
    // Needs array_view copy
    constexpr CMyStuffManager(array_view<CStuffEntry> resources)
    : _entries(resources)
    {}
    // Needs array_view destructor
    ~CMyStuffManager() = default;

    // array_view not part of the public interface, can be used without CStuffEntry definiton
    void doSomethingOnEntries(int x, int y);

    // can be still be copied, or ignored without CStuffEntry definiton 
    array_view<CStuffEntry> getInnerView(int);

private: // Not part of the interface, but necessary for definition of the class
    array_view<CStuffEntry> _entries
};

// ------- mystuffmanager.cpp
#include <mystuffmanager.h>
// Definition of CStuffEntry
#include <mystuffmanager_entry.h>
void CMyStuffManager::doSomethingOnEntries(int x, int y)
{
    // checked access _entries, full definition of CStuffEntry available
}

array_view<CStuffEntry> CMyStuffManager::getInnerView(int c)
{
    // checked access _entries, full definition of CStuffEntry available
    return array_view<CStuffEntry>(_entries.begin() + c, _entries.end());
}

// ------- api.h
class CMyStuffManager;
class CApi
{
public:
    static CMyStuffManager &getStuffManager();
private:
    CMyStuffManager* _stuffman;

static CApi _s_Instance;
};

// ------- api.cpp
#include <mystuffmanager.h>
// private implementation detail for construction 
#include <mystuffmanager_entry.h> 
#include <api.h>

static CStuffEntry s_Entries[128]; // For example in a special memory range for DMA via gcc section attribute 

/* Decisively simple */
CMyStuffManager &CApi::getStuffManager()
{
    CMyStuffManager* pMan = _s_Instance._stuffman;
    if (!pMan)
    {
        // construct array_view, full definition of CStuffEntry available
        // It doesnt matter to MyStuffManager`s how the array_view was constructed,
        // but it was checked at that time.
        // The constructor only copies a previously established state!
        pMan = new MyStuffManager(s_Entries);
        _s_Instance._stuffman = pMan;
    }
    return *pMan;
}

Note that the use is exactly the same as other prototypes and it has the upside that you dont include what you dont need for a functional public interface. The times where the definition is needed, the header is included (it wont even compile otherwise) - but only there. This has apart from faster build time the upside that you dont have to recompile tested modules. In this example, a change in mystuffmanager_entry.h will need the module to be retested/verified and the part in api.cpp where it is constructed. All the other points (the vast majority) where CMyStuffManager`s methods are accessed can easily be proven to not be affected (for example they dont have even to be recompiled). For maintanance of safety critical code this is a very big (and costly) issue.

So for our appliance, where static checks, formal and practical tests are incredibly important, an implementation/specification of array_view that cant be copied without the type fully defined wont be acceptable.