Open jturner314 opened 3 years ago
Cool, I think it's a good idea.
The core of this idea has been attempted - I'm 90% sure we've talked about it but I can't see it in the issues, and I've played around with this as a code change. But never so fleshed out as this (To be clear - now it's a much better time to do it too, it feels like we know what we are allowed to do in unsafe Rust to a bigger extent now). I would be tempted to start with having struct ArrayBase<S, D> { core: ArrayRef<A, D>, data: S }
to start with, but there could be many ways to start. Something that doesn't touch the public interface and then migrates it piece by piece would probably be a good way to go.
I would like to reduce the monomorphization bloat. Do you have any example in particular where you have experienced this?
And yes, traitification has been the solution I have favoured but it seems like it would be good to do both.
I agree with everything you've written.
I believe the .deref_mut()
needs a call to the unshare method, it's probably the right place to put it. 🙂
With this - Borrow and all - can we then deprecate CowArray
? It has not developed so much.
I'm 90% sure we've talked about it but I can't see it in the issues
Now that you mention it, I vaguely recall this too.
I would be tempted to start with having
struct ArrayBase<S, D> { core: ArrayRef<A, D>, data: S }
to start with
I thought about this, but it would require ArrayRef
to be a different struct from ArrayBase
instead of a type alias of ArrayBase
, which may be less convenient for generic code. (If ArrayRef
was a field of ArrayBase
but wasn't a different struct, then ArrayBase
would be an infinitely recursive type.)
Something that doesn't touch the public interface and then migrates it piece by piece would probably be a good way to go.
A fairly minimal starting point would be to implement all of the RFC except for the "moving most methods to be implemented only on ArrayRef
" part. In other words, start by leaving them implemented for generic ArrayBase
. That way, the only breaking change would be adding the S: MetaMut
bounds to the methods that modify the shape/strides/pointer.
I would like to reduce the monomorphization bloat. Do you have any example in particular where you have experienced this?
It's not something I've investigated or noticed in practice (other than maybe compile times?); it's just something I'm aware of. Basically, by changing function parameter types to be &ArrayRef
or &mut ArrayRef
instead of being generic over the storage types, most monomorphization bloat due to storage types would be eliminated. A quick look through impl_methods.rs
shows that most methods are fairly small or call a less-generic inner function, so the RFC may not have as large an impact on ndarray
itself as I originally thought. However, there are some larger functions and extension methods in ndarray-stats
, ndarray-linalg
, ndarray-npy
, and my own private code that are generic over the storage type and haven't been modified to use the "less-generic inner function" trick. A couple of examples are partition_mut
in ndarray-stats
and write_npy
in ndarray-npy
.
And yes, traitification has been the solution I have favoured but it seems like it would be good to do both.
Yeah, long-term, I think a trait-based approach is the way to go.
I believe the
.deref_mut()
needs a call to the unshare method, it's probably the right place to put it. slightly_smiling_face
I assume you mean .ensure_unique()
? Good catch. I've edited the post to add it.
With this - Borrow and all - can we then deprecate
CowArray
? It has not developed so much.
This RFC would make it possible to use std::borrow::Cow<'a, ArrayRef<A, D>>
. However, with this Cow
type, it wouldn't be possible to modify the shape/strides/pointer without cloning the entire array. (Of course, you could always create a view, but sometimes that's less convenient than modifying the shape/strides/pointer directly.) In contrast, CowArray
allows the shape/strides/pointer to be modified without cloning the data, so it's still useful. (Although, you could argue whether or not it would still be useful enough to keep.)
Oh thanks for spelling that out, I guess once I was "set" on the separate ArrayRef
struct, I didn't think of RefRepr
as a proper data storage in ArrayBase
of its own. It makes sense now.
It seems like as defined ArrayBase<RefRepr<_>, _>
will have a self-referential Deref
impl and Rust won't like this (it will compile but cause errors on deref). Some trait magic might need to be used to make sure we don't have a self-loop there.
And yes, I have been trying to reduce impl bloat but this sounds like a good idea anyway! (like #819 etc)
It seems like as defined
ArrayBase<RefRepr<_>, _>
will have a self-referentialDeref
impl and Rust won't like this (it will compile but cause errors on deref). Some trait magic might need to be used to make sure we don't have a self-loop there.
Oh, good point. I didn't think of that. I tried a simplified example, and you're right:
error[E0055]: reached the recursion limit while auto-dereferencing `ArrayBase<RefRepr<{integer}>>`
--> src/main.rs:70:24
|
70 | println!("{:?}", a.first());
| ^^^^^ deref recursion limit reached
|
= help: consider adding a `#![recursion_limit="256"]` attribute to your crate (`playground`)
A workaround is to add a DataDeref
trait implemented by all the storage types other than RefRepr
and which is used as a bound for Deref
and DerefMut
. Here are quick-and-dirty playground examples for methods implemented on ArrayBase
and methods implemented on ArrayRef
. I've updated the RFC.
This might solve the long standing += operator papercut. The thing that says that we can't do the operation *array.column_mut(0) += 1.
fluently (In general: applying assign-ops to mutable views that are temporaries). With this, we can, just use the dereference there and *ArrayViewMut1
will have type ArrayRef1
which will implement the assign ops.
The separate structs are needed sooner or later I think, otherwise the ArrayBase
system is going to be a growing confusion factor. Even with the gains in convenience here.
(I'm just adding pros and cons as I find them)
This might solve the long standing += operator papercut.
This does, indeed, appear to be the case. This is a simplified, quick-and-dirty example in the playground.
The separate structs are needed sooner or later I think, otherwise the
ArrayBase
system is going to be a growing confusion factor. Even with the gains in convenience here.
I could go either way on this. I've added my thoughts on this to RFC above.
More details about the "easing the transition":
If ArrayRef
is not a separate struct, then existing functions that accept &ArrayBase
would still be compatible with &ArrayRef
. Ideally, they would eventually be modified to take &ArrayRef
instead, but it wouldn't be as urgent.
If ArrayRef
is a separate struct, then existing functions that accept &ArrayBase
would not be compatible with &ArrayRef
. So, until libraries update to the new &ArrayRef
type, users may find ArrayRef
to be inconvenient. That said, this would strongly encourage libraries to update, and since ArrayRef
will have to be introduced in a breaking release anyway, we could strongly encourage libraries to switch to ArrayRef
when they bump the ndarray
dependency.
Do we have a reference for the fact that casting between pointers to these two structs (const ArrayBase to const ArrayRef) that have identical common prefix, is legal?
I've updated RefRepr
to be repr(C)
in the RFC to make sure that it has size 0 and alignment 1.
As far as I can tell, the cast is legal. The Rust reference documents the C representation in detail.
Fields lining up: With repr(C)
, the dim
, strides
, and ptr
are guaranteed to line up between the two types. After casting to ArrayRef
, the offset to the ArrayRef
's data
field may end up inside padding between the original instance's ptr
and data
fields, but, AFAIK, that's fine since RefRepr
is zero-sized. (In fact, the reference says that it's okay to read padding bytes, which I didn't realize.)
Sizes: The size of ArrayRef<A, D>
will not extend past the end of the ptr
field, except for possibly padding, and it's not possible to construct an ArrayBase<S, D>
with less padding after ptr
than ArrayRef<A, D>
(since RefRepr
has size 0 and alignment 1).
Alignment: The alignment of ArrayRef<A, D>
will always be less than or equal to the alignment of ArrayBase<S, D>
for any S
, since the alignment of RefRepr
is 1.
That said, I may be missing something.
The guaranteed-safe option would be to make ArrayRef
a separate struct and replace the dim
, strides
, and ptr
fields of ArrayBase
with a single struct field. While I originally somewhat preferred making ArrayRef
an instance of ArrayBase
, the idea of making it a separate struct is growing on me. If we did make ArrayRef
a separate struct, I'd do it like this:
struct ArrayMeta<A, D> {
dim: D,
strides: D,
ptr: NonNull<A>,
}
pub struct ArrayBase<S, D>
where
S: RawData,
{
meta: ArrayMeta<S::Elem, D>,
data: S,
}
mod array_ref {
#[repr(transparent)]
pub struct ArrayRef<A, D> {
meta: ArrayMeta<A, D>,
}
impl<A, D> ArrayRef<A, D> {
pub(crate) fn meta(&self) -> &ArrayMeta<A, D> {
&self.meta
}
}
impl<A, S, D> Deref for ArrayBase<S, D>
where
S: Data<Elem = A>,
{
type Target = ArrayRef<A, D>;
fn deref(&self) -> &ArrayRef<A, D> {
let meta_ptr = &self.meta as *const ArrayMeta<A, D>;
unsafe {
&*meta_ptr.cast::<ArrayRef<A, D>>()
}
}
}
impl<A, S, D> DerefMut for ArrayBase<S, D>
where
S: DataMut<Elem = A>,
{
fn deref_mut(&mut self) -> &mut ArrayRef<A, D> {
self.ensure_unique();
let meta_ptr = &mut self.meta as *mut ArrayMeta<A, D>;
unsafe {
&mut *meta_ptr.cast::<ArrayRef<A, D>>()
}
}
}
}
pub use array_ref::ArrayRef;
The reason for keeping ArrayRef
separate from ArrayMeta
and in its own array_ref
module (and not putting anything else in array_ref
) is so that, outside of the small array_ref
module, we can say things like, "if you have a &mut ArrayRef<A, D>
, then it's safe to mutably access the elements it references". It also prevents accidental modification of an ArrayRef
's meta
field.
About the reference to docs - that's neat. I was hoping we'd find it spelled out when exactly it's legal to cast between pointers to two different struct types. In C we can't do this pointer cast even if the types are in theory representation identical - but that's due to type-based alias analysis which we don't have in Rust.
The following RFC seems to be a very current strain of the representation compatiblity discussion https://github.com/rust-lang/rfcs/pull/2981
About separate struct or not - there seems to be downsides with either alternative. if ArrayRef
is just another parameterization of ArrayBase
then it seems like that's the smoothest change for existing code, existing users should barely notice.
I was hoping we'd find it spelled out when exactly it's legal to cast between pointers to two different struct types. In C we can't do this pointer cast even if the types are in theory representation identical - but that's due to type-based alias analysis which we don't have in Rust.
Okay, I understand. I didn't know that about C; that sounds really limiting. I thought manipulations like this were somewhat common in C. Do people work around the alias analysis by converting pointers to integers and back again (to break pointer provenance), use unions, or something else?
I think the cast to ArrayRef
would be well-defined according to RFC 2981, but I'm not sure much of RFC 2981 describes current Rust and how much is just a proposal. Another reference is the Transmutes page of the Rustonomicon, but it's pretty vague on what's allowed, although it does imply that transmutes between different types with the same layout can be okay. I would feel more comfortable if someone with more expertise than me could provide guidance. The part of the cast that makes me the most nervous is the zero-sized data
field in the ArrayRef
, which may technically be located in padding of the original struct. Maybe we could reach out to the Unsafe Code Guidelines Working Group.
As far as implementing the RFC goes, I'd be happy for someone else to implement it. I wrote up this RFC now because I think that it's a good idea that'll solve problems with the current API, and I didn't want to forget about it. If no one else implements it, I'll probably get around to it eventually, but it'll be a while. (I'd guess at least half a year.)
C is tricky. The most useful rule is that one can cast a struct pointer to and from the first member's pointer https://stackoverflow.com/questions/9747010/does-accessing-the-first-field-of-a-struct-via-a-c-cast-violate-strict-aliasing
And then type punning through a union is ambiguous if it's really a rule, https://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified
With that said, I think the casts in this RFC feel right but it makes sense to make sure every step in the reasoning here has support.
I would be remiss not to mention the compiler option (in C) "-fno-strict-aliasing" which turns off the strict type-based aliasing rules. Many projects use that so that they can have more fun. That's pragmatic but it's basically turning off one of the language rules.
As far as implementing the RFC goes, I'd be happy for someone else to implement it.
Hi! I hope this isn't presumptuous, but I've started an implementation of this RFC; I figure if nothing else, it will let me really understand ndarray
's internals; I also just spent a while developing a library at work that was operating on ArrayView
s and personally experienced the annoyance that this RFC would relieve.
I've opted for the "separate struct" implementation as prototyped by @jturner314's comment above, and just wanted to mention it and address one note that I've found so far.
I think that the various debug_assert!(self.pointer_is_inbounds())
that verify safety in debug builds are no longer possible with the separate ArrayRef
. I don't think this is a huge deal - after all, it's exclusively used for debug assertions anyway, so it doesn't change release safety - but it's worth mentioning. I believe this could be remedied by relying on the len()
method (which uses dim.size()
instead of the OwnedRepr
len
), but I'm not sure that quite gets the desired safety check.
I'm also wondering whether Arc
could join Cow
and get eliminated? But that's a discussion for a bit farther down the road, I think - Is should get my code to compile first 🙃
If you're curious, the implementation (which doesn't even build yet) is located here.
Ok, I've made some progress on the implementation and (in doing so) gotten a much stronger understanding of ndarray
's guts. I've run into a challenge, however, that I'm curious to get some feedback on. I personally really like the "separate struct" implementation of ArrayRef
; I found ndarray
's storage types to be challenging to understand when I got started, and liked the idea of reducing the cognitive load. However, the S: Data
and S: DataMut
bounds on ArrayRef - which create that safety guarantee outside of the array_ref
module - also mean that RawArrayView
s and RawArrayViewMut
s cannot be dereferenced into ArrayRef
s. So if you start moving methods away from ArrayBase
(such as the widely-used ndim
) and into ArrayRef
, those methods are either a) non-existent for Raw...
or b) must be re-implemented with identical code for Raw...
.
First off, if I've misunderstood an issue, please stop here and let me know. I've been thinking this over and I would like to discuss a few solutions. The first is to essentially re-implement all methods that are appropriate for raw arrays, but I strongly dislike that much copied code. It could potentially be done by using a macro that implements a given behavior for both, but this feels a little like using macros to paste over poor design. Either way, implementing functions for both ArrayBase<S: RawData, D>
and ArrayRef<A, D>
essentially means that all the existing code (and any bloat) are still there, just with even more code for ArrayRef
.
The second idea was to move closer to a design suggested at the end of the RFC: rely on traits a bit more, but also push the idea of Raw
down to the Ref
level. As far as I can tell, there are three types of behavior that should be separated:
ndim
, or others that only read metadata) and should therefore be available both for Raw
variants and normal onesRaw
variantsArrayBase
variantsFurthermore, we'd like to push as much code as possible down to the Ref
level to provide a cleaner function interface and to reduce monomorphization bloat. So, here's what I was thinking:
RawArrayRef
and ArrayRef
Deref
so that S: RawData
derefs to RawArrayRef
and S: Data
derefs to ArrayRef
RawArrayRef
and ArrayRef
with the behavior described by (1) above, imbue ArrayRef
with the behavior described by (2) above, and imbue ArrayBase
with the remaining behavior described by (3) above.The problem is, I really cannot find a way to implement Deref
twice for the same type. Trait specialization (RFC 1210) might be able to do it, but that seems unlikely to land any time soon. I tried toying with having ArrayRef
be a const generic-parameterized type (ArrayRefBase<const R: bool>
) and adding an associated type const RAW: bool
to the RawData
trait (playground here); but apparently you can't use type parameters in const expressions. I even tried autoref specialization, but the blanket Deref
implementation messes this up.
So I circle back and am left with using either a macro (if we want to hide meta
) or a common trait (if we're ok with a fn meta(&self) -> &ArrayMeta
) to implement the same behavior for ArrayRef
and ArrayBase<S: RawData, D>
.
Any ideas are greatly welcome; pointing out that I missed something and this really is possible would be even more welcome 😁 thanks!
Please post your thoughts, or let me know if something is unclear/confusing!
Summary
This RFC proposes the addition of a new storage type
RefRepr
and associated type aliasesArrayRef<A, D>
,ArrayRef1<A>
,ArrayRef2<A>
, etc., such that we can implement the following traits:Borrow<ArrayRef<A, D>>
forArrayBase<S, D> where S: Data
Deref<Target = ArrayRef<A, D>>
forArrayBase<S, D> where S: Data
BorrowMut<ArrayRef<A, D>>
forArrayBase<S, D> where S: DataMut
DerefMut<Target = ArrayRef<A, D>>
forArrayBase<S, D> where S: DataMut
ToOwned<Owned = Array<A, D>>
forArrayRef<A, D>
&'a ArrayRef<A, D>
would be analogous toArrayView<'a, A, D>
, and&'a mut ArrayRef<A, D>
would be analogous toArrayViewMut<'a, A, D>
.Many of the methods on
ArrayBase
would be moved to be implemented only onArrayRef
instead, and be available through deref coercion, similar to how much of the functionality ofVec<T>
is actually implemented on[T]
.Motivation
Good way to express function parameters
Consider a function that needs a view of a 2-D array of
f64
elements. With currentndarray
, we have the following ways to express this:Each of these has disadvantages:
Options 1 and 2 are verbose, and become particularly bad for functions taking many array parameters.
Option 4 is unnatural for callers of the function. (In most cases, they have to call
.view()
on the argument, which is verbose and is different from most Rust code, which typically uses references for this use-case.)Options 1, 2, and 3 are generic, so they suffer from monomorphization bloat, as discussed in the next section. For functions taking multiple array parameters, it's even worse—the number of possible type combinations is exponential in the number of array parameters. (To minimize the bloat, it's necessary to create an inner function like option 4, and make the outer function convert the parameter(s) to
ArrayView
and call the inner function.)With this RFC, it will be possible to express the function like this:
This is more concise than the other options, and it's not generic, so there is no monomorphization bloat.
Mutable views would be handled similarly, just with
&mut
instead of&
.Substantially reduce monomorphization bloat
Most of the functionality of
ndarray
is currently implemented as methods onArrayBase<S, D>
. As a result, the methods are monomorphized for each new combination ofS
andD
types, which means that there's a lot of nearly-identical generated code, i.e. "monomorphization bloat". The same issue occurs for crates which provide extension traits, such asndarray-linalg
,ndarray-stats
, andndarray-npy
. (It's possible to minimize monomorphization bloat for a method by changing it to convert everything toArrayView/Mut
and then call an inner function which takesArrayView/Mut
parameters. However, that's very verbose and tedious.)If, instead, we move most functionality to be implemented only on
ArrayRef<A, D>
, analogous to how much of the functionality ofVec<T>
is actually implemented on[T]
, then the monomorphization bloat due to differing storage types would be eliminated. (Callers would rely on deref coercion, like they do for methods actually implemented on[T]
forVec<T>
.)Enable in-place arithmetic operator syntax for the result of a slicing operation
Currently, it's not possible to use in-place arithmetic operator syntax for the results of slicing operations; instead, it's necessary to call the appropriate method:
The error message is
By making
ArrayBase
dereference toArrayRef
and implementing the in-place arithmetic operators onArrayRef
, this will compile without errors.Better integration into the Rust ecosystem
The existing Rust ecosystem provides functionality dependent on traits like
Borrow
. Implementing these traits will allowndarray
to interact more cleanly with the rest of the ecosystem. See #878 for one example.Guide-level explanation
In many ways, an owned array
Array<A, D>
is similar toVec<A>
. It is an allocated container for a collection of elements.The new
ArrayRef<A, D>
type for arrays is analogous to the[A]
(i.e. slice) type for vectors. Arrays dereference toArrayRef<A, D>
, likeVec<A>
derefs to[A]
, so functionality available forArrayRef<A, D>
is available for other array types through deref coercion, like how functionality available for[A]
is available forVec<A>
through deref coercion.When writing a function, you should use
&ArrayRef<A, D>
or&mut ArrayRef<A, D>
as the parameter types when possible, similar to how you'd use&[A]
or&mut [A]
instead of&Vec<A>
or&mut Vec<A>
.In general, handle
ArrayRef<A, D>
analogously to how you'd handle[A]
. The primary ways in which they are not analogous are the following:ArrayRef<A, D>
isSized
, unlike[A]
. As a result,&ArrayRef<A, D>
is a thin pointer, while&[A]
is a thick pointer.Due to technical limitations, given an array it's not possible to directly create an
&ArrayRef
that is a view of a portion of the array. In other words, you can create a slice (i.e.&[A]
) of a portion of a&[A]
/Vec<A>
with indexing, e.g.&var[3..12]
, but this is not possible for&ArrayRef
. In order to create an&ArrayRef
that references a portion of an array, you have to first create anArrayView
(by slicing, etc.), and then coerce thatArrayView
toArrayRef
.Also note that
&'a ArrayRef<A, D>
is similar toArrayView<'a, A, D>
, and&'a mut ArrayRef<A, D>
is similar toArrayViewMut<'a, A, D>
, with the following exceptions:You don't have call a method like
.reborrow()
(forArrayView/Mut
) to manually shorten the lifetime of the borrow. The lifetimes of references toArrayRef
will be automatically adjusted as necessary, just like other references in Rust.The shape, strides, and pointer of an
&ArrayRef<A, D>
or&mut ArrayRef<A, D>
cannot be modified. For example, you can't call.slice_axis_inplace()
. To get an array with different shape/strides/pointer, you have to create a view. For example, you could call.slice_axis()
, or you could create a view with.view()
and then call.slice_axis_inplace()
on the view.Implementation
Implementing
Deref
andDerefMut
withTarget = ArrayRef<A, D>
The new
RefRepr
storage type will be:The
ArrayBase
struct will be changed to the following:The important parts of this change are:
The
dim
,strides
, andptr
fields are the first fields in the struct.The struct is declared
repr(C)
.These changes are necessary so that it's possible to implement
Deref
andDerefMut
by casting references toArrayBase
into references toArrayRef
.@bluss pointed out below that it's necessary to avoid infinite recursion in deref coercion. To do this, a
DataDeref
trait will be added:This trait will be implemented by all storage types except for
RefRepr
and will be used as a bound for theDeref
andDerefMut
implementations.Deref
andDerefMut
will be implemented like this:Moving most existing methods to be implemented only for
ArrayRef<A, D>
Most of the existing methods which don't modify the shape/strides/pointer (such as
slice
,mapv
, etc.) will no longer be implemented for allArrayBase<S, D>
, but rather will be implemented only onArrayRef<A, D>
. This change isn't necessary in order to introduce theArrayRef
type, but it is necessary in order to reduce the existing monomorphization bloat, which is one of the primary benefits of theArrayRef
type.Data traits for
RefRepr
The
RefRepr
type will implement the following traits:RawData
RawDataMut
Data
(but make theinto_owned
implementation panic withunreachable!()
)DataMut
RawDataSubs
In other words, the
&'a
or&'a mut
in a reference to anArrayRef
will control the lifetime and mutability of the data.We need to audit all of the available methods to make sure that:
&'a ArrayRef<A, D>
only for a base array withS: Data
that lives at least as long as'a
,&'a mut ArrayRef<A, D>
only for a base array withS: DataMut
that lives at least as long as'a
, andArrayRef<A, D>
(I believe this is already true if we just add the
Deref
,DerefMut
,Borrow
, andBorrowMut
implementations, but we should check.)Preventing modification of shape/strides/pointer via
&mut ArrayRef<A, D>
To minimize confusion and behave analogously to
&mut [A]
, it should not be possible to modify the underlying shape/strides/pointer through a&mut ArrayRef<A, D>
. For example,To achieve this:
A new
MetaMut
trait will be added to control whether the shape/strides/pointer can be modified.MetaMut
will be implemented for all of the existing storage types (OwnedRepr
,ViewRepr
, etc.), but not the newRefRepr
type.A
S: MetaMut
bound will be added to all methods which modify the shape/strides/pointer in-place (such asslice_collapse
,slice_axis_inplace
,collapse_axis
,swap_axes
,invert_axis
,merge_axes
,insert_axis_inplace
, andindex_axis_inplace
) .Drawbacks
This proposal introduces yet another storage type, when
ndarray
already has quite a few.The similarity of
&'a ArrayRef<A, D>
toArrayView<'a, A, D>
and&'a mut ArrayRef<A, D>
toArrayViewMut<'a, A, D>
could be confusing to new users.The requirement that it must be possible to create
ArrayRef
references by reinterpreting the bits in existingArrayBase
instances limits the possible field layouts ofArrayBase
.Rationale and alternatives
Allow mutation of the shape/strides/pointer for
&mut ArrayRef<A, D>
It would be possible to avoid the new
MetaMut
trait by allowing modification of the shape/strides/pointer through&mut ArrayRef<A, D>
instances. However, this would likely be confusing to users who are familiar with how&mut [T]
/Vec<T>
behave, and it would increase the potential for bugs.Keep most method implementations on all
ArrayBase<S, D>
It would be possible to keep the existing method implementations for all
ArrayBase<S, D>
instead of moving them to be implemented only onArrayRef<A, D>
. However, this would not bring the reduced monomorphization bloat which is one of the advantages of the newArrayRef
type.Make
ArrayRef
be a separate struct fromArrayBase
It would be possible for
ArrayRef
to be a separate struct fromArrayBase
, instead of an alias ofArrayBase
. This would have the following advantages:Deref
,DerefMut
,Borrow
, andBorrowMut
could be implemented forArrayBase
withoutunsafe
code.MetaMut
andDataDeref
traits wouldn't be necessary.The disadvantages of a separate struct type are:
The transition to
ArrayRef
would be more disruptive. Until libraries update their functions to take&ArrayRef
or&mut ArrayRef
parameters instead of&ArrayBase
or&mut ArrayBase
parameters,ArrayRef
would not be compatible with those functions. (Users of those libraries would likely have to call.view()
/.view_mut()
in some places.)It may be less convenient for generic code which wants to operate on all array types.
For example, if
ArrayRef
is not a separate struct, then these impls are sufficient forAddAssign
:(Note that, unfortunately, we can't replace
impl<'b, A, B, S2, D1, D2> AddAssign<&'b ArrayBase<S2, D2>> for ArrayRef<A, D1>
withimpl<'b, A, B, D1, D2> AddAssign<&'b ArrayRef<B, D2>> for ArrayRef<A, D1>
. For some reason, the compiler has trouble figuring out which impl to use whenimpl<A, B, D> AddAssign<B> for ArrayRef<A, D>
also exists.)However, if
ArrayRef
is a separate struct, then it would be necessary to add another impl:Custom DST (i.e. custom thick pointer types)
Instead of using thin pointers to
ArrayRef
, we could wait for custom DSTs (e.g. RFC 2594) to be added to Rust, and then create a custom DST similar toArrayRef
. This would have the following advantages:Since the shape/strides/pointer would be stored within the thick pointer, it would be possible to modify the shape/strides/pointer in the thick pointer in-place, unlike for a reference to an
ArrayRef
.Slicing methods could return a thick pointer instead of an
ArrayView/Mut
except in theIxDyn
case.Raw thick pointers (i.e.
*const
/*mut
) would be more useful. In other words, since the shape/strides would be stored within the thick pointer itself, it should be possible to access that information without dereferencing the pointer. So, except in theIxDyn
case, thick pointers could replaceRawArrayView/Mut
. In contrast, given a*const ArrayRef<A, D>
or*mut ArrayRef<A, D>
, you have to dereference the pointer to access the shape/strides, soArrayRef
cannot replace most uses ofRawArrayView/Mut
.However, I'm not aware of any Rust RFC for custom DSTs which would make it possible to create a custom DST with support for the dynamic-dimensional (
IxDyn
) case. So, there would be this weird dichotomy between the convenience of thick pointers for the const-dimensional case and the complexity of the dynamic-dimensional case. In contrast, theArrayRef
type proposed in this RFC would support the const-dimensional and dynamic-dimensional cases in a uniform way.Future possibilities
One thing I wanted to make sure when writing this proposal was that it would still allow replacing the
ArrayBase
struct with a trait-based API in the future. In other words, if we hadinstead of having all arrays be instances of
ArrayBase
, it would be nice for this proposal to still apply. This proposal would, in fact, be compatible with a trait-based API, and the implementation could be somewhat cleaner because it wouldn't require anyunsafe
code. I'd suggest implementing the types like this: