Summary of ABQ comments on `span`

dhollman commented 6 years ago

I'm going through the comments on span from Albuquerque to both understand where span stands relative to incorporation into the IS (and as it relates to P0546, P0856, and P0860) and to make sure we haven't made the same mistakes in mdspan. I'll enumerate summaries of the issues here (some with bulleted responses about how we've treated that in mdspan)

Comments from a thread titled "Questions raised while trying to implement " on the lib-ext reflector:

The comparison operators should work with differing extents, particularly if the extent of one of them is dynamic
- mdspan doesn't define comparison operators, not even operator== (could this be a problem/inconsistency for some people? Probably it's okay to just tell people to convert to span first, especially with the potential for deduction guides on the conversion operators)
The comparison operators should be able to compare spans of unequal extents, since std::equal and std::lexigraphical_compare do this
span<int> and span<long> should be comparable
The comparison operators take their parameters by const reference rather than by value, even though the paper itself recommends passing span by value
- We should change the free function subspan to take an mdspan by value for the same reason
Contiguous iterators and contiguous containers: span currently has (or had?) a constructor that takes a ContiguousContainer (i.e., a Container whose iterators meet the requirements of contiguous iterators [27.2.1]). This isn't implementable via SFINAE (or any other compile-time technique) since contiguous iterator is a runtime property.
- mdspan doesn't have this constructor. It doesn't look like we're using the term "contiguous" in any contexts that would imply the need for SFINAE, but we should double-check this
If the size of the span (in bytes) is larger than numeric_limits<ptrdiff_t>::max() (e.g., as could be the case if sizeof(value_type) is much larger than 1), then as_bytes() and as_writable_bytes() could introduce undefined behavior.
- This applies to us also. We should consider using size_t here instead. (Also, seems like a good motivation for the T[][][] syntax, so that we don't have to use -1 for a magic extent value)
- A later note says that LEWG has already discussed this and decided to stick with ptrdiff_t, so maybe we shouldn't change anything.
No reason for the nullptr_t constructor, since it's the same as the default constructor
- Not sure I agree with this one. Implicit convertibility from a nullptr_t is a reasonable trait to have. Either way, we omitted discussion/description of this constructor in the paper, even though it's in the wording.

Other Pre-meeting comments

span<int, 42>.subspan<0>() shouldn't return a span<dynamic_extent>
- This is another artifact of the dynamic_extent magic number. It doesn't look like we have that problem because our subspan extents are always given as normal function arguments rather than template arguments (though this seems like a bit of a shortcoming since it means we can't have a subspan with static extents other than the entirety of the parent's extent for a given rank)
all iterator functions shoudl be constexpr
- mdspan is not iterable (?!?) so this doesn't apply.
Shouldn't have both length and size

Small Group discussion comments

Lots of discussion about constructors taking shared_ptr/unique_ptr (presumably the array partial specializations)
Something indecipherable about propagating noexcept (presumably relating to comparison operators?)
The static version should work with std::get, since it is like std::array (though then it would have to participate in structured binding?)
- Probably not that applicable to mdspan. If you really need this, just convert to a span

Full group discussion

Maybe considering switching back to array_view, now that the term view does not imply immutable (presumably with some change to string_view?)
Poll to remove unique_ptr and shared_ptr constructors: 6-8-0-0-0
Poll to remove nullptr_t constructor: 4-5-3-0-0
Poll to remove span(pointer, pointer) constructor: 0-0-2-10-0

Other notes (not from `span` discussion, but encountered while working on it):

On the conversion constructors and assignment operators:

template< typename UType , typename ... UProperties >
constexpr mdspan( mdspan< UType , UProperties ... > const & ) noexcept;
template< typename UType , typename ... UProperties >
mdspan & operator = ( mdspan< UType , UProperties ... > const & ) noexcept;

One of the requirements is that "V::static_extent(r) == U::static_extent(r) or V::static_extent(r) == std::dynamic_extent for 0 < r < V::rank()". This should probably also include the possibility that U::static_extent(r) == std::dynamic_extent.

mdspan defines the constraints here in terms of std::is_assignable<U::pointer, V::pointer>, but span defines it in terms of accessing a U::value_type through a V::pointer meeting the rules for well-defined object access defined in [basic.lval]/8. Are these definitions exactly equivalent?

dhollman commented 6 years ago

Next question: where does this leave us on P0546

There are some issues that still need to be addressed here, but it seems reasonable to leave this as a separate paper. There are a lot of rules you might want to discuss for AccessProperties in general. For instance, can you unconditionally convert between span<int[], Property1> and span<int[], Property2>? What about span<int[]> and span<int[], Property> and back? I don't think we want to say "only one property", but this implies that the presence or absence of all (exponentially many) combinations of properties needs to be addressed. Moreover, every time someone introduces a new property, there is a precedent that implies that you'll have to implement (or prohibit) exponentially many types (though templates help with this some). We're dealing with this right now in the Executors paper, and it's not an easy problem. I think this discussion alone justifies a "generic" AccessProperties paper that sets out some ground rules.

Some of the major insights from looking at the analogous discussion with executors:

We should choose between either "all properties allowed with all properties" or "no properties allowed together". I've already said that I think the second one is a no-go. For instance, there's no reason someone shouldn't be able to make a span<double, atomic_access, bound_checking>. The lesson from executors so far has been that it is a huge mess to allow some combinations and prohibit others. Properties should be orthogonal. If they aren't naturally so, your interface should essentially require them to "fake it" (more on this below)
Wherever possible, attributes of spans with a combination of properties should be expressible in terms of attributes of the spans with each of the properties individually. For instance, the noexcept clause of span<T, Props...> should be expressible as noexcept(noexcept(span<T,Props>{}[index_type{}]) && ...) (if fold expressions worked on template parameters like this, which — side note — they should but don't). Places where this sort of expression was possible in the executors proposal have been way easier to deal with than places where it wasn't. I might even go so far as to say that parts of the general class template that can't be expressed this way shouldn't be allowed to be dependent on the AccessProperties.... We should at least explore this option and weigh the costs of potentially overconstraining some properties with the benefits of simplicity.
- An example where this could get sticky that immediately comes to mind is the combination of atomic_access and something like bounds_checking. The single specialization versions of both would want to override operator[], so which one do you call? Does this require every property to know about every other property, and for you to implement all of the combinatorial possibilities? I suggest that we consider a different route on this. We should explore the implications of the following design principle: for each method or free function overload that a property cares about, it should be implementable as some combination of a pre-method hook and a post-method hook. For instance, span<int[42], atomic_access, bounds_checking>::operator[] could be implemented as access_property_traits<atomic_access>::pre_subscript_hook(...), then access_property_traits<bounds_checking>::pre_subscript_hook(...), then span<int[42]>::operator[], then access_property_traits<atomic_access>::post_subscript_hook(...), then access_property_traits<bounds_checking>::post_subscript_hook(...), some of which could be no-ops. (This could also be implemented using the ADL-like traits idiom that seems to be favored now over the monolithic traits-class approach). Importantly, if this sort of thing gets people most of what they need, I don't think we should stall the proposal waiting for all of what we need.
For similar reasons, interconversion between spans of different properties is sticky. I suggest at least considering an "all-to-all" approach, in a spans with any set of properties is always convertible to another span with different properties, as long as their property-less versions are interconvertible. Again, this may be too constraining, but if it gets everyone most of what they need, we should probably run with it.
Another concern is property ordering. For instance, it might be surprising that binding an instance of span<int[73],UProp,VProp,WProp> to a const reference to span<int[73],VProp,WProp,UProp> causes the invocation of the copy constructor. We mostly can get away with this because we're working with a value type, but it's something to keep in mind.

More on this later, but the amount of text here seems to me to justify the existence of a "generic" properties paper to explore the customization point design itself.

We do still need P0856 and P0860 to be presented along with this, though, keeping in mind the following feedback from the end of the notes at the ABQ meeting:

we have precedent for poor design of customization points when they are not used (allocators).

dhollman commented 6 years ago

Oh, one more "executors lesson" I forgot: it pays to be very careful and deliberate early on about the difference between properties and their values. For instance, you shouldn't have a property called gpu_memory and one called cpu_memory because these are mutually exclusive (well, UVM aside...). Instead, there should be one property called something like memory_space with different values. Then you don't have to worry about all of this orthogonality stuff with respect to gpu_memory and cpu_memory; you just specify that a property can't have multiple values. (Note that "values" can still be types; they just have a specific internal mutual exclusion that properties don't have with other properties). This is probably obvious to those of us who have worked on Kokkos, but it still seems to be a point of confusion from time to time (even for me) on the executors proposal.

mhoemmen commented 6 years ago

mdspan is not iterable (?!?)

I'm OK with that. Iteration is a 1-D thing. Multidimensional iteration usually wants to know the indices.

Lots of discussion about constructors taking shared_ptr/unique_ptr (presumably the array partial specializations)

I've been prototyping stuff with {shared, weak, unique}_ptr lately. In particular, I wanted to model a memory pool: single owner, with multiple viewers. It's not really right to use unique_ptr for the owner, because of the non-unique viewers. I ended up using shared_ptr for the owner and weak_ptr for the viewers, so that viewers would have an option to test whether the owner still exists.

This is relevant because one important use case for an mdspan would be viewing or slicing some independently managed memory allocation. If we want constructors that take shared_ptr or unique_ptr, we beg the question about the ownership relationship between the input *_ptr and the mdspan. (Contrast that with the weak_ptr constructor that takes a shared_ptr. There, the relationship is clear.) If the mdspan just takes a raw pointer, we don't have to worry about defining that relationship.

mhoemmen commented 6 years ago

for each method or free function overload that a property cares about, it should be implementable as some combination of a pre-method hook and a post-method hook.

Hm, how do we define the order in which the hooks get applied? What if they don't commute? I guess they are supposed to be orthogonal; does that really just mean that properties always need to behave as if these hooks commute?

(Also, wouldn't you want to apply the post hooks in reverse order of the pre hooks?)

mhoemmen commented 6 years ago

for each method or free function overload that a property cares about, it should be implementable as some combination of a pre-method hook and a post-method hook.

Wouldn't that mean that atomic_access has to let non-atomic operator[] do its thing? I don't see how this would work.

mhoemmen commented 6 years ago

Oh wait, you could use C++17 constexpr if for this, right? Just protect the inner non-atomic operator[] with the constexpr if thing based on some combination of the properties.

mhoemmen commented 6 years ago

That constexpr if thing would work like this:

// class ... Props
if constexpr (... && invoke_default_implementation<Props>::value) {
  span<int[42]>::operator[] (whatever_arguments_go_here);
}

dhollman commented 6 years ago

Hm, how do we define the order in which the hooks get applied? What if they don't commute? I guess they are supposed to be orthogonal; does that really just mean that properties always need to behave as if these hooks commute?

I think the easy answer is that you would specify that the hooks are invoked in the order that the properties are given, and if they don't commute then that's the property implementer's fault (but it at least means that the user can figure out what's going on). It would mean that the properties are order dependent, but if the property implementers are being orthogonal enough this order should almost never matter (a lot of these operations should be compiler-reorderable anyway). Another option is to say that the invocation order is implementation-defined, meaning that any code that relies on an invocation order for these hooks would be disallowed (or something like that).

dhollman commented 6 years ago

Wouldn't that mean that atomic_access has to let non-atomic operator[] do its thing? I don't see how this would work

At least as I understand it, the atomic_access implementation we have uses a hash table of sharded locks that is dependent on the address of the mdspan itself (and maybe on the offset?). It's not actually using std::atomic objects. This sharded lock version could easily be implemented in terms of hooks. The constexpr if version isn't ideal because it lacks the extensibility of the hook version.

dhollman commented 6 years ago

The implementation of restrict_access in terms of hooks is a lot trickier, though. One option is to expose the type of the underlying pointer via a customization point. For instance (using the old, monolithic attributes class paradigm):

template <>
struct mdspan_property_attributes<restrict_access> {
  template <typename MDSpanType>
  struct pointer_type {
    using type = typename MDSpanType::pointer_type __restrict;
  };
  // ...
};

It's a bit difficult to see how this composes with other properties, but you could specify that it works as a chain just like the other hooks, for instance.

dhollman commented 6 years ago

Example of the hook I suggested above for restrict_access (seems to work with gcc, but nothing else)

https://godbolt.org/g/tHNBtj

mhoemmen commented 6 years ago

@dhollman wrote:

I think the easy answer is that you would specify that the hooks are invoked in the order that the properties are given, and if they don't commute then that's the property implementer's fault (but it at least means that the user can figure out what's going on).

I like this better.

mhoemmen commented 6 years ago

(seems to work with gcc, but nothing else)

Huh, works for recent enough Clang and Intel too, for me at least

dhollman commented 6 years ago

Huh, works for recent enough Clang and Intel too, for me at least

By "works", I don't mean that it just compiles. If you look at the instruction output, gcc "correctly" ends the two restrict overloads with movl $11, %eax (that is, return 11 as a constant), while clang and intel both do the actual addition in every case except for with the raw pointers (which they both get "correct" as well). I say "correctly" in quotes because they're allowed to completely ignore restrict, but the "performance correct" thing to do here is to return a constant.

mhoemmen commented 6 years ago

@dhollman Ah, I see, thanks!

dhollman commented 6 years ago

At least as I understand it, the atomic_access implementation we have uses a hash table of sharded locks that is dependent on the address of the mdspan itself (and maybe on the offset?).

Looking at the Kokkos code, I think I might be wrong about this — I must have been thinking of the atomics for arbitrary types thing from elsewhere. @crtrott and @hcedwar can correct me on this, but it looks like Kokkos implements the Kokkos::Atomic memory trait for, e.g., operator() and operator[] by modifying the return values of these methods to return something that essentially acts like (an extended version of) std::atomic. I'm still trying to figure out how this fits in to the property ontology. I have serious concerns about the usefulness of these properties in generic code if the the return type of operator() can be changed to something that doesn't support all of the same operations as the return type in the absence of that property, which would be the case if std::atomic were used. On the other hand, if we added something that wraps std::atomic and adds operator overloads, etc., I highly doubt LEWG and EWG would be okay with this kind of thing (i.e., if they thought we should have those operators on std::atomic, they would have put them there). @dsunder how are you addressing this in P0860?

hcedwar commented 6 years ago

span<T[],atomic_access>::operator[] returns atomic_ref<T> That is all that is required. The intent is that construction / setup of the span more efficient than construction of each individual atomic_ref.

kokkos / array_ref