xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License

Create `xt::xvector` #2020

Open tdegeus opened 4 years ago

tdegeus commented 4 years ago

In some cases it would be great to have features like `push_back` and `insert` for a 1-d container (like `std::vector`).

JohanMabille commented 4 years ago

Let's create `xt::xvector`, since it will be an expression with broadcasting abilities. We first need to update `svector` so it provides `push_front` / `pop_front` methods.

JohanMabille commented 4 years ago

Actually, `push_front` and `pop_front` are not part of the `std::vector` API. This does not prevent us from adding them to `svector`; however, the implementation will not be optimal. Do you need these methods, or can we omit them for now?

tdegeus commented 4 years ago

Let's omit them for now. We can always add them later if they are requested.

tdegeus commented 4 years ago

I wanted to start implementing `xt::xvector`, but in fact I need some guidance.

JohanMabille commented 4 years ago

Ha I started yesterday! Let me know if you want to do it anyway, that can be a good way to dive into the core of xtensor.

> I should probably follow https://xtensor.readthedocs.io/en/latest/external-structures.html#embedding-a-full-tensor-structure ?

So basically, the `xvector` class will be very similar to the `xtensor` one, with some specific features.

> In fact I realised that I cannot find where `xtensor` and `xarray` are defined as classes.

`xtensor` and `xarray` are specializations of `xtensor_container` and `xarray_container` with a default data container, which is `uvector`. You can find their definitions in `xtensor_forward.hpp`.

> I thought it should be completely independent of `xt::svector`; how should I interpret your earlier remark?

Just as `xarray` is a specialization of `xarray_container`, `xvector` should be a specialization of `xvector_container` for a default data container. This default data container should be `xt::svector` instead of `std::vector`, because the latter always initializes its content, even when it is about to be assigned the result of a computation.

tdegeus commented 4 years ago

That was part of the reason that I started on it, to learn the internals. At the same time, if you already have it, or allocated time for it, I don't want to stop you.

JohanMabille commented 4 years ago

Well I don't have that much, so I let you do it ;). Happy to give some guidance if you need, do not hesitate to open a PR early so we can both comment on it.

spectre-ns commented 2 years ago

I could really use this functionality. It feels weird jumping back and forth between xtensor and std when trying to do essentially `numpy.append()`. Love xtensor!

tdegeus commented 2 years ago

@spectre-ns would you be willing to help with this contribution?

spectre-ns commented 2 years ago

@tdegeus I saw that there was a pull request in draft. How much work remains?

tdegeus commented 2 years ago

Honestly, I don't remember. Normally, writing this class should not be very hard, as it subclasses classes that provide almost everything.

I can look in the next two weeks, but please do not feel slowed down by this, we welcome a new PR!

spectre-ns commented 2 years ago

@tdegeus I'm thinking for the functionality I'm looking for I would be better served to implement this part of the numpy API. https://numpy.org/doc/stable/reference/generated/numpy.append.html

Is there something about the implementation of `xarray` that prevents it from being appended to with another `xexpression`?

tdegeus commented 2 years ago

I think it should be perfectly implementable. Note, however, that according to the documentation you link, there is copying involved on each call, so I guess it is quite a costly function.

spectre-ns commented 2 years ago

Yeah, I agree. There is an upper bound on the problem I'm trying to solve, so I think the best approach would be to allocate the upper bound and then trim the remainder away at the end. Always appreciate your insight!

tdegeus commented 2 years ago

A problem is that I think there is no way to trim away memory in xtensor:

https://xtensor.readthedocs.io/en/latest/quickref/basic.html?highlight=resize#resize

I actually forgot why that is. Do you remember @JohanMabille ?

JohanMabille commented 2 years ago

@tdegeus because we didn't take the time to implement it (preserving the elements is not trivial).

@spectre-ns I think you could reuse the `concatenate` and `flatten` functions to provide a first implementation of `append`. It would be neither optimal nor lazy, but at least it would do the job.

The optimized solution would be to create a new kind of expression that internally maintains the different arrays and/or appended values and makes them appear as if they were in a single array. That requires more work (though still less than doing the same for the `concatenate` function).

tdegeus commented 2 years ago

@JohanMabille That is what I don't get. If storage is contiguous and data is stored as a `std::vector<T>`, then shrinking should preserve the data (and, I think, even pointers to it).

JohanMabille commented 2 years ago

The problem is not in 1D, but in higher dimensions. Consider the following tensor:

`{{ 0, 1, 2 }, { 3, 4, 5 }}`

Its internal buffer is `[ 0, 1, 2, 3, 4, 5 ]`. Now say you want to resize it to shape `(2, 2)`, so that it becomes:

`{{ 0, 1 }, { 3, 4 }}`

Its internal buffer is now `[ 0, 1, 3, 4 ]`. So you have to move the second row within the internal buffer so that it stays contiguous with the (shortened) first one.

tdegeus commented 2 years ago

That is indeed perfectly OK, but my question arose from the docs: I got the impression that even in 1D, items are not guaranteed to be preserved, which I do think should be the case. But now it seems that they might be. It would be good to add a note to the docs.