Open Robbepop opened 7 years ago
That sounds good, and I love a good illustration to go together with expressing an idea.
One concern I have here is that it's probably still not an efficient way to implement convolution, to go through this.
That sounds good, and I love a good illustration to go together with expressing an idea.
Thanks! :)
One concern I have here is that it's probably still not an efficient way to implement convolution, to go through this.
True! For performance the most promising implementation for convolution is to use a GPGPU approach. But what if one doesn't want to go that far? The main question for this issue for me is, if it is possible to provide such iterators with equal performance as extern definitions. But as far as I can imagine such iterators would perform much better if implemented internally in this crate with hidden data of ArrayBase
.
Also: What implementation would fit best? We could implement such iterator for example by using an item-sized local Array
within the iterator to reuse the memory for copying the data out of the nd-array.
Or we could also use iterators over iterators over the nd-array and thus avoid copying of elements completely but I don't know if this approach is faster since I do not know As far as I can think the strike-throughed idea is not feasable if we want ndarray
internals too well.ArrayView
as items for the new iterators.
Between as_ptr, as_mut_ptr, and the .axes()
iterator (each axis' length and stride), all the information of the raw representation is available, except for issues related to ownership, and they don't matter here. So I think it's possible to do it outside of the crate, but not so natural.
Between as_ptr, as_mut_ptr, and the .axes() iterator (each axis' length and stride), all the information of the raw representation is available, except for issues related to ownership, and they don't matter here. So I think it's possible to do it outside of the crate, but not so natural.
Okay, good to know about the currently possible "hack", but yeah it wouldn't be a very natural way to encode this behavious and would require unsafe rust to deref the pointers which is unnecessary.
For me both proposed iterators would be great and would be of great use for my projects!
Sure, the window iterator is a good idea. If I understand correctly, chunks has a chunk size in each dimension? .axis_chunks_iter() currently covers it if you just want to chunk it along one dimension, (using the full width of every other).
I also want to underline that using the unsafe
methods or lower level code is not a hack but a feature, we need to allow it so that people can use it to improve their programs if needed.
For my proposed idea I cannot use the axis_chunks_iter
since it has a different behaviour. The items of the chunks
iterator I proposed has the same dimension as the underlying Array
. It just acts like the 1-dimensional chunks iterator for slices but for n-dimensions.
For 2-dimensions I could really use this for the pooling function within the convolution.
And for a 3 dimensional array of size 10x10x10
I could for example chunk it with 5x5x5
cubes and would iterate over (10³ / 5³)
chunks in total.
For both iterators it should be decided whether we also are in need of full-iterator that (as described) above iterate over the full area, instead of only the valid area. This is possible since in ndarray
elements are all NDFloat
which is Copy
and has a default value 0.0
which we could simply use for invalid elements.
axis_chunks_iter gives views that have the same dimensionality (1d → 1d, 2d → 2d) though.
axis_chunks_iter gives views that have the same dimensionality (1d → 1d, 2d → 2d) though.
Okay I will have to take another look into the docs. Maybe I have overseen things and it could really be used to simulate the chunks iterator that I proposed here.
I have found a semi-nice visualization of the proposed chunks iterator for 2 dimensions here.
Right, the current axis_chunks_iter cuts stripes, not squares. So they are different in that way.
How do you feel about the importance of a chunks
iterator facility for ndarray
?
Not important, but we can try to do it
How do you feel about me trying to do it via PR? (This would probably take a while since I have to wrap my head around the libraries internals.)
Go ahead.
Remember how windows
for slice has no mutable variant? That needs to be true for us too.
Maybe I should tell about the trick that we usually use for iterators: We reuse the basic element iterator (Baseiter) . How to construct a chunks iterator? Make an iterator that visits exactly the top right corner of each chunk (This can be done using slicing with a stride, which we can also emulate manually). For each element, you map it to an ArrayView that starts in that element (its pointer is the arrayview's pointer!)
Okay I will try! :)
@bluss
Today I looked into the ndarray code to wrap my head around the iterator and especially the BaseIterator
and ArrayBase
code.
It is not yet very clear to me but could it be possible to design a windows or chunks iterator without any copy operations into an iterator-internal owning ArrayBase
object by exploiting the strides of the ArrayBase
and just giving out cheap instantiations of ArrayView
s that have a proper ptr
and strides?
I need feedback on this.
I recognize it's not super easy to implement a new iterator, since it's so much about ndarray implementation details.
I think you would want do do something like InnerIter: have a Baseiter and also extra information. However for chunks/windows it needs to have a whole dimension/stride set, not just a single number (Basically D
instead of an Ix
). https://github.com/bluss/rust-ndarray/blob/master/src/iterators/mod.rs#L400
Yep I got that point with the D: Dimension
strides for the ArrayView
but I was just curious if the whole thing is really implemented in that way as I thought that we have to store an internal copy of all the floats within the current iteration window or chunk but with this architecture all that copying is not needed and thus the whole iteration should be a lot more efficient than I thought in the first place. ^.^
Let's see if I can get it working. :S
If I think of the chunks iterator and am not mistaken I think that
We use sliced dimension and sliced stride to initialize Baseiter (One can slice in place using the array method islice
, but it can be done with lower level functions as well). Each yielded ArrayView will have the dimensions of a chunk (clipped to the full array's boundaries) and strides that are equal to those of the original array(!)
This code could also be read to understand how to use a slice / subview of an array as a “map” of array views.
Imagine we have an array of N dimensions and want to cut it into separate one-dimensional “sticks” (or “lanes”).
That's what these methods do. Only difference is they use a callback instead of an iterator.
I think I understood how I can utilize islice
to create the iterators.
On iterator instantiation I create an ArrayView
of the n-dimensional Array
under iteration and with islice
I mutate the internal ArrayView
and give out references to it on a call to next
.
I could also use slice
method to create a new instance of the ArrayView
instead of mutating an old one. The advantage here would be that I could easily move the newly created ArrayView
on a call to next
and would yield values instead of references on iteration which would be nice in case of non-owning ArrayView
s.
The part that is currently unclear to me is, how I can perform the n-dimensional iteration of all possible Slice
s of the chunked or windowed Array
. Is there built-initerator functionality for that to yield all possible indices of the array? - Is the Indices
iterator the right utility for this approach?
We need an efficient way for iterating over all possible slices or indices. I saw that the iteration Item
for Indices
is a Vec<usize>
which implies tons of potential malloc
invokations. We should avoid that I think.
The iteration Item
for Indices
is D::Pattern
where D
is a dimension.
That means if D is Ix2
, the item is (usize, usize)
. If D
is IxDyn, dynamic number of axes, the item is indeed Vec<usize>
. It's now IxDyn
works, it uses a lot of dynamic allocations, but it's not the usual way you'd use ndarray. Ix1, Ix2 etc are the common ones.
This is maybe not so straightforward, you could leave this to me if you want.
That means if D is Ix2, the item is (usize, usize).
That's awesome!
This is maybe not so straightforward, you could leave this to me if you want.
I would like to try it out now if this is okay for you.
So I think I can design Windows
like so:
struct Windows<'a, A: 'a, D: 'a + Dimension> {
/// view of the entire array under iteration
view : ArrayView<'a, A, D>,
/// iterator for all possible valid window indices
indices: Indices<D>,
/// the shape of the windows
shape : Shape<D>
}
pub fn windows<'a, A: 'a, D: 'a + Dimension>(array: ArrayView<'a, A, D>, shape: Shape<D>) -> Windows<'a, A, D> {
Windows{
view: array.view(),
indices: indices_of(&array), // restrict array to area for valid windows shapes:
// e.g. array.shape() == [10, 8], window shape is [5, 4], then indices, area
// should be (0..5, 0..4).
shape: shape
}
}
impl<'a, A: 'a, D: 'a + Dimension> Iterator for Windows<'a, A, D> {
type Item = ArrayView<'a, A, D>;
fn next(&mut self) -> Option<Self::Item> {
match self.indices.next() {
None => None,
Some(idx) => {
let slice_arg = slice_at_idx_of_shape(self.shape, idx); // function that converts a shape and a given index into a SliceArg
Some(self.view.slice(slice_arg)) // return window with given shape at current index
}
}
}
}
Questions for me to solve:
indices
iteration area to valid windows for given shape s
.idx
in next
into a shape of the given shape `s.How close or far am I from your theory behind the codes for the windows iterator?
To create a windows-slice for slice
method from a given shape Dim
and an index (offset) Dim
without dynamic memory allocation in the special case of 2 dimensions I came to this code:
fn slice_at_idx_of_shape_2d<D: Dimension>(shape: D, idx: D) -> [Si; 2] {
[
Si(idx[0] as isize, Some((idx[0] + shape[0]) as isize), 1),
Si(idx[1] as isize, Some((idx[1] + shape[1]) as isize), 1)
]
}
Provided that this implementation is correct it would be best imo to create a 1D, 2D, ... 6D version for it and a generalize dynamic allocation requiring method to avoid dynamic allocation up to 5 dimensional arrays. With strides we might be able to easily implement chunks-iterators.
I have created a prototype implementation of a generic, malloc avoiding into_si_with_offset
trait implementation which can be found here.
This makes it possible to map a pair of shape: Dim
(windows/chunks size) and offset: Dim
(from indices) into a slice argument for slicing.
Making SliceArg available like that is good
I have sort of gone into this territory with Zip, a chunk iterator might show up there.
I haven't had quite enough time to work on this the last few days.
It is nice that you also worked on another path there which looks really promising!
Maybe I can actually use this zip a lot in order to improve already existing Prophet
code.
I have sort of gone into this territory with Zip, a chunk iterator might show up there.
Does that mean that you are able to implement a Chunks
iterator using this new facility?
I came to the conclusion that a Windows
iterator is a lot easier to implement than a Chunks
iterator.
Basically the template you just gave me needs only little changes for it to be a valid Windows
iterator.
What I want now with Zip is that one can zip together chunk producers and arrays.
If an array is dim (10, 10) and we get a chunks producer with (2, 2) chunks, the chunks producer has dim (5, 5) and we can zip it with other (5, 5) arrays.
Iterators are stateful, and handling the halfway-iterated state is troublesome. So I'm going the route (like rayon) of having “Producers” (Think of that as IntoIterator for the array zip). A producer can be split etc for parallelization. Some objects like Chunks is a Producer and IntoIterator, other objects like AxisIter can implement Producer and Iterator directly (because the axis iter is one dimensional, this is not a problem).
Sounds very nice so far - especially the improved integration with rayon!
The question that I am asking myself is: Do you still need a Chunks
and Windows
iterator implementation after this huge commit or is it self-contained in it?
If you no longer need it, I can at least serve you with the test cases implementation I came up with. There are many for chunks: (just for 2 dimensional arrays so far)
Odd chunks are shaped so fit into the remaining valid region, as in the standard chunks iterator.
We could maybe also implement an EvenChunks
iterator besides this one, that would simply skip odd chunks or panic on misusage (what you like better) which should give us some performance when we do not need to cover these many edge cases.
What do you think?
I've only implemented the "EvenChunks" case so far (I called it WholeChunks to be less ambiguous). Basically it was there because I needed it to ensure that the trait for n-ary Zip could support it.
I think that yes we want at least the windows iterator (and windows producer!)
chunks of size greater than the underlying array -> what is the expected behaviour for this?
For whole chunks, the result must be 0 chunks (all chunks produced have the requested size), for ragged chunks it would be as many ragged chunks that fit. If the chunk is larger than the array in all dimensions, the result is 1 chunk, surely? Not sure what to do if the source array is size 0.
How do you like working with the Result
type in such cases?
Could it be worth returning a Result<Chunks>
instead of Chunks
iterator directly in cases where it doesn't make sense to create such an iterator, like in the case of a zero sized array?
Or should we just return a ragged chunk of size zero? Or panic on zero sized arrays?
I personally thing the cleanest solution would be to simply return a zero sized ragged chunk, however, this is not the best solution to point a user to a logic error if this scenario is the result of one.
If it's a logic error and programmer's error we have a convention to just panic; for example [].chunks(0)
panics and [1, 2, 3][10] panics. Result is preferred for situations when errors is part of the regular use of the API.
I'm not sure zero sized arrays are common.
I think I have prototyped an implementation for the Windows
iterator based on the IterBase
- was actually easy compared to the Chunks
iterator with all its edge cases.
The implementations for the Windows
and Chunks
iterators will be splitted into two different PRs and the more difficult Chunks
iterator will follow soon, hopefully.
I am going to write some tests for the Windows
iterator now to verify that it is working as far as the tests are verifying it.
Great! I wrote some chunks tests for whole chunks (the easy version), maybe those can inspire. tests/iterator_chunks.rs
I'm wishing that we make an NdProducer out of Windows, so that we can use it in azip.
I have inspected your WholeChunks
iterator implementation. Maybe with your code as template I can get NdProducer
working with my Windows
implementation.
I also looked at your WholeChunks
test - looks kinda complicated to me.
My tests would look more like that:
#[test]
fn test_windows_iterator_simple() {
let a = Array::from_iter(10..30).into_shape((5, 4)).unwrap();
let mut iter = windows(a.view(), Dim((3, 2)));
assert_eq!( iter.next().unwrap(), arr2(&[ [10, 11], [14, 15], [18, 19] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [11, 12], [15, 16], [19, 20] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [12, 13], [16, 17], [20, 21] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [14, 15], [18, 19], [22, 23] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [15, 16], [19, 20], [23, 24] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [16, 17], [20, 21], [24, 25] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [18, 19], [22, 23], [26, 27] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [19, 20], [23, 24], [27, 28] ]) );
assert_eq!( iter.next().unwrap(), arr2(&[ [20, 21], [24, 25], [28, 29] ]) );
assert_eq!( iter.next(), None );
}
I wanted to (KISS) keep it simple stupid. :S
Looks pretty good. I'm trying to learn to write more “exhaustive” style tests than just inputting examples. Your tests look good as they are, I'd probably use itertools::assert_equal to test the iterator's sequence though.
So for example I'd prefer a test that tested all possible window sizes, since that's more exhaustive.
Looks pretty good.
Okay thanks!
I'd probably use itertools::assert_equal to test the iterator's sequence though.
Awesome! Didn't know about its existance. :)
So for example I'd prefer a test that tested all possible window sizes, since that's more exhaustive.
True! I will first implement some naive implementations and later add some exhaustive tests. I would also like to test other dimensionalities, such as 3D.
#[test]
fn test_windows_iterator_simple() {
let a = Array::from_iter(10..30).into_shape((5, 4)).unwrap();
itertools::assert_equal(
windows(a.view(), Dim((3, 2))),
vec![
arr2(&[ [10, 11], [14, 15], [18, 19] ]),
arr2(&[ [11, 12], [15, 16], [19, 20] ]),
arr2(&[ [12, 13], [16, 17], [20, 21] ]),
arr2(&[ [14, 15], [18, 19], [22, 23] ]),
arr2(&[ [15, 16], [19, 20], [23, 24] ]),
arr2(&[ [16, 17], [20, 21], [24, 25] ]),
arr2(&[ [18, 19], [22, 23], [26, 27] ]),
arr2(&[ [19, 20], [23, 24], [27, 28] ]),
arr2(&[ [20, 21], [24, 25], [28, 29] ])
]);
}
All tests are passing now but this one:
/// Test that verifites that no windows are yielded on oversized window sizes.
#[test]
fn windows_iterator_oversized() {
let a = Array::from_iter(10..37)
.into_shape((3, 3, 3))
.unwrap();
let mut iter = windows(a.view(), Dim((4, 3, 2))); // (4,3,2) doesn't fit into (3,3,3) => oversized!
assert_eq!(iter.next(), None);
}
How is this functionality achieved in itertors using BaseIter
in general?
I do not want to implement it via a hack ...
Just make sure the dimension of the iterator is computed to be (0, 1, 2)
which means 0 elements
Ah nice, I got screwed by this because of another bug with underflowed subtraction. All tests are working now.
Sorry for all the churn around NdProducer and so on. It's a big project to make it work.. and a lot needs to change around it. Let me know, I'm fine with merging a windows iterator and fixing it to be ndproducer.
Sorry for not notifying you on the status of the current windows implementation and tests. I haven't had that much time in the past days ... I will make a PR today if I can get it working. =)
PR completed #306 - awaiting feedback
I have added a task-list for the todos until this issue can finally be closed.
We should discuss if we still need and/or want the RaggedChunks
iterator.
Good. Lists are good. Checking things off is the next best thing to closing issues :wink: Thanks for working on Windows and working with ndarray in general.
And thank you for maintaining all these awesome, high-quality libraries that are critical for the well-being of the rust ecosystem!
Already implemented features and todos:
ExactChunks
andExactChunksMut
aswell as theirNdProducer
implRaggedChunks
or simplyChunks
aswell as theirNdProducer
implWindows
NdProducer
impl forWindows
Implement n-dimensional
windows
andchunks
iterators.Windows-iterators are especially useful for convolutional neural networks while chunk iterations may prove useful for paralell algorithms.
N-Dimensionality for both iterators use N-dimensionally shaped items.
For example for a 1-dimensional Array the
windows
andchunks
items are also 1-dimensional. For a 2-dimensional Array thewindows
andchunks
items are 2-dimensional, and so on.For
windows
iterator it could be useful to support two different kinds ofwindows
iterators. One for "valid" windows only that iterate only over windows within a valid region. For example in a valid-windows iterator over a10x10
shaped 2d-array and window sizes of4x4
the iterator would iterate totally over(10-4+1)x(10-4+1)
items of size4x4
. A full-windows iterator would also iterate over the windows with invalid elements (on the edges of the array) and would simulate missing elements with zero elements.Semi-nice visualization of n-dimensional chunks and windows iterators.
A similar approach could be used for n-dimensional chunks that iterate beyond the array sizes and replace missing/invalid array elements with zero elements.