yesint opened this issue 7 years ago
`py::array` will automatically copy your data if you don't give it a `base` argument in the constructor (though maybe that's indeed not very well documented). If you don't want to copy, one solution would be to move the `std::vector` into a `py::capsule` and use that capsule as the `base` for a new `py::array`, then just continue using the `v.data()` of that moved vector to construct. If I'm not mistaken, the returned `py::array` will then keep that capsule alive and delete the vector once the capsule can be and is garbage collected.

Untested code, but this should be the implementation of the non-copy approach:

```cpp
auto v = new std::vector<int>(some_func());
auto capsule = py::capsule(v, [](void *v) { delete reinterpret_cast<std::vector<int>*>(v); });
return py::array(v->size(), v->data(), capsule);
```

Yes, probably more black magic than you might expect. But then again, you're not doing something simple either. You are keeping a C++ object alive to make sure you can access its internal data safely, without leaking the memory.

But if you don't mind the copy, just go:

```cpp
auto v = some_func();
return py::array(v.size(), v.data());
```
@YannickJadoul, thank you very much, it really works! I don't mind doing black magic (and the magic is in fact quite logical), but currently the user is not even aware that this kind of magic exists. Are there any plans to document the usage of `py::array` and `py::capsule`? The constructors of these types are non-trivial, and the usage of the `base` argument is, well, a bit arcane.

Another suggestion: it probably makes sense to provide an easy non-copying conversion from any contiguous buffer to `py::array`. Something like:

```cpp
auto v = new std::vector<int>(some_func());
py::array array = array_from_buffer<int>(v, ndim, shape, strides);
```

which would create the corresponding `py::buffer_info` and capsule internally. It could be a great addition in the cases when numerical data have to be returned, especially if one needs to wrap a function like:

```cpp
void some_func(vector<int>& val1, vector<vector<float>>& val2);
```

Manually wrapping each argument with `py::buffer` and `py::capsule` into a `py::array` becomes tedious in such cases.
> but currently the user is not even aware that this kind of magic exists.
Agreed; I had to look into the actual headers to check the exact constructors, etc. But I don't know about planned documentation updates. If you feel like it, I'm sure a PR with more documentation on this would be gladly accepted ;-) Then again, I'm not always sure what's stable API and what are implementation details.
> Probably it makes sense to provide an easy non-copying conversion from any contiguous buffer to py::array?
Not sure how easy that is to do (and how much more confusing it would make the whole situation). Maybe some kind of static function, as a 'named constructor', could make sense, though?
By the way, `std::vector<std::vector<int>>` is not a contiguous structure. And I don't think this technique works when (un)wrapping the arguments of a function. What I just described was a way of not copying a returned `std::vector`.
Sure, it won't work with "input" function parameters, but it works for "output" ones, when one transforms the C++ signature into a Python function returning a tuple of NumPy arrays instead of a bunch of ref parameters (that's exactly my case). In any case, such a thing should not be automatic; the user has to make it explicit in a lambda.
@YannickJadoul your code works, thanks for the reference. I just wanted to point out that you were missing a parenthesis at the end of the line:

```cpp
auto cap = py::capsule(v, [](void *v) { delete reinterpret_cast<std::vector<int>*>(v); });
```
By the way, does anybody know how to get a `py::array_t` from a `std::shared_ptr<std::vector<T>>` without a copy (and without using `new`/`delete`)?

I tried this:

```cpp
std::shared_ptr<std::vector<float>> ptr = get_data();
return py::array_t<float>{
    ptr->size(),
    ptr->data(),
    py::capsule(ptr.get(), [](void* p){ reinterpret_cast<decltype(ptr)*>(p)->reset(); }),
};
```

Obviously, this will never work, because when the return happens, `ptr` will be deallocated from the stack.

Using a capture also does not help, because `py::capsule` can't accept one:

```cpp
std::shared_ptr<std::vector<float>> ptr = get_data();
return py::array_t<float>{
    ptr->size(),
    ptr->data(),
    py::capsule([ptr](){ }), // using lambda-capture to increase lifetime of ptr
};
```
This solution worked (though it seems very dirty):

```cpp
std::shared_ptr<std::vector<float>> ptr = get_data();
return py::array_t<float>{
    ptr->size(),
    ptr->data(),
    py::capsule(
        new auto(ptr),  // <- can leak
        [](void* p){ delete reinterpret_cast<decltype(ptr)*>(p); }
    )
};
```
@arquolo Indeed, the only data that can be stored in a `py::capsule` is a single `void *` and a simple function pointer (this is a Python C API thing, by the way; pybind11 just made a C++ wrapper around it). So if you want the capsule to be a (co-)owner of the `shared_ptr`, I would think that the last solution is the only one that works, since it stores the actual `shared_ptr` object.

Is it that dirty, though? In the end, a `capsule` taking a `std::function` (or any kind of lambda/functor object) would incur this same allocation (inside the `std::function`) because of the variable size of the capture.

The one thing to note, though, is that the object doesn't need to be a `capsule`. It can just as well be any other object (though hopefully one that keeps the data alive), so if your `shared_ptr` were stored as a member in a C++ class that is exposed to Python, you could also use that `py::object`.
We define the following utility functions, which have proven to be life savers :)

```cpp
template <typename Sequence>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence&& seq) {
    // Move entire object to heap (ensure it is moveable!). Memory handled via Python capsule
    Sequence* seq_ptr = new Sequence(std::move(seq));
    auto capsule = py::capsule(seq_ptr, [](void* p) { delete reinterpret_cast<Sequence*>(p); });
    return py::array(seq_ptr->size(),  // shape of array
                     seq_ptr->data(),  // c-style contiguous strides for Sequence
                     capsule           // numpy array references this parent
    );
}
```

and the copy version:

```cpp
template <typename Sequence>
inline py::array_t<typename Sequence::value_type> to_pyarray(const Sequence& seq) {
    return py::array(seq.size(), seq.data());
}
```
Thanks @ferdonline. However, the move-helper needs its signature changed to:

```cpp
template <typename Sequence,
          typename = std::enable_if_t<std::is_rvalue_reference_v<Sequence&&>>>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence&& seq)
```

With such a fix, the compiler will warn you if you call it without `std::move`.
> With such a fix, the compiler will warn you if you call it without std::move.
@arquolo If you call it without `std::move`, it will bind as an lvalue reference and then inside it does the `std::move` anyway. IMHO that's fine behavior.
> @arquolo If you call it without std::move, it will bind as an lvalue reference and then inside it does the std::move anyway. IMHO that's fine behavior.
You will destroy the original container then, though. That's quite unexpected if you didn't pass the container as an rvalue reference.

Isn't the standard solution to use `std::forward<Sequence>(seq)`? In that case you'll copy if you pass an lvalue reference, and you'll move if you pass an rvalue or rvalue reference.

The function is called `as_pyarray` and the "docs" say it will move, so I think it's fine, but you choose.

It's standard to use `std::forward` in case you want to pass on the same reference type. Here we don't care; we just want to turn whatever reference type into an rvalue reference.
> By the way, does anybody know how to get a py::array_t from a std::shared_ptr<std::vector<T>> without a copy (and without using new/delete)?
@arquolo , you might be interested in what I have found: https://github.com/pybind/pybind11/issues/323#issuecomment-575717041
If anyone's interested in a version of @ferdonline's utility function without explicit/manual `new` and `delete`:

```cpp
template <typename Sequence>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
    auto size = seq.size();
    auto data = seq.data();
    std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
    auto capsule = py::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence*>(p)); });
    seq_ptr.release();
    return py::array(size, data, capsule);
}
```

Apart from avoiding `new` and `delete`, this also does not leak if for some reason `py::capsule` would throw.
@YannickJadoul

> ```cpp
> template <typename Sequence>
> inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
>     auto size = seq.size();
>     auto data = seq.data();
>     std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
>     auto capsule = py::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence*>(p)); });
>     seq_ptr.release();
>     return py::array(size, data, capsule);
> }
> ```
>
> Apart from avoiding `new` and `delete`, this also does not leak if for some reason `py::capsule` would throw.

I'm not sure this would work? The memory would be freed early, as there is nothing left to hold onto the heap allocation after the `unique_ptr` goes out of scope. Then another heap allocation could grab the same memory, and new writes could corrupt what is already there (i.e. the NumPy buffer we just returned). See https://www.cplusplus.com/reference/memory/unique_ptr/get/.
@YannickJadoul This is what I am using:

```cpp
/**
 * \brief Returns py::array_t<T> from vector<T>. Efficient as zero-copy.
 * - Uses std::move to obtain ownership of said vector and transfer everything to the heap.
 * - Only accepts the parameter via std::move(...), or else the vector metadata on the stack will go out of scope (the heap data will always be fine).
 * \tparam T Type.
 * \param passthrough std::vector to move from.
 * \return py::array_t<T> with a clean and safe reference to the contents of the vector.
 */
template<typename T>
inline py::array_t<T> toPyArray(std::vector<T>&& passthrough)
{
    // Pass result back to Python.
    // Ref: https://stackoverflow.com/questions/54876346/pybind11-and-stdvector-how-to-free-data-using-capsules
    auto* transferToHeapGetRawPtr = new std::vector<T>(std::move(passthrough));
    // At this point, transferToHeapGetRawPtr is a raw pointer to an object on the heap. No unique_ptr or shared_ptr; it will have to be freed with delete to avoid a memory leak.
    // Alternate implementation: use a shared_ptr or unique_ptr, but this appears to be more difficult to reason about as a raw pointer (void *) is involved - how does C++ know which destructor to call?
    const py::capsule freeWhenDone(transferToHeapGetRawPtr, [](void *toFree) {
        delete static_cast<std::vector<T> *>(toFree);
        //fmt::print("Free memory."); // Within Python, clear memory to check free: sys.modules[__name__].__dict__.clear()
    });
    auto passthroughNumpy = py::array_t<T>(/*shape=*/{transferToHeapGetRawPtr->size()}, /*strides=*/{sizeof(T)}, /*ptr=*/transferToHeapGetRawPtr->data(), freeWhenDone);
    return passthroughNumpy;
}
```
@sharpe5

> The memory would be freed early as there is nothing left to hold onto the heap allocation after the `unique_ptr` goes out of scope.

That's why you call `seq_ptr.release()`: to release ownership of the pointer, right? (But only after you're certain the creation of the `py::capsule` worked.) See https://en.cppreference.com/w/cpp/memory/unique_ptr/release
> @YannickJadoul This is what I am using:

This seems quite similar (or identical?) to @ferdonline's utility function. As far as I can see, it will still leak memory when `py::capsule` throws, because there's nothing holding on to that raw pointer. But yes, it probably won't throw, and if it does, something else is probably wrong, so it's fine enough to use.

Also, it uses raw `new`/`delete`, which is what I tried and managed to avoid with my fragment.
@YannickJadoul You are right; your code is absolutely correct.

I can't help but think that the content of the capsule function is just a very complicated way of calling `delete`. I greatly prefer modern C++ and smart pointers, but if there is a `(void *)` in the middle it becomes more difficult to reason about the data flow (for me at least!). Either smart pointers up and down the entire stack, or not at all? It is tricky to choose the right level of abstraction, and sometimes if one abstracts too much the intent gets obscured.

I did not see @ferdonline's utility function initially (see above); the one I quoted was written from first principles. It's somewhat interesting that they are virtually identical :)
> I can't help but think that the content of the capsule function is just a very complicated way of calling delete.

Yes, it definitely is, but it does have the advantage of covering the corner case of exceptions in `py::capsule`'s constructor and applying the good practice of avoiding `new` and `delete`. I don't think it's that much more complicated, so I just threw out that addition, if people want to use it. But do of course use what is most comfortable to you.
This issue has been resolved. @YannickJadoul has done a great job answering questions here. Further questions are better suited for Gitter.
I'm thinking. Maybe we can/should add a convenience function for this to pybind11, since it seems to be such a popular issue. I'll reopen to remind ourselves.
This seems to be a good place to use a memoryview for holding onto the buffer instead of a capsule? #2307 is useful for invalidating the buffer once it has been released.
Actually, I think I misunderstood the problem, never mind. A memoryview might be useful in some of these cases however.
For the record, I have a large Python module that has zero-copy communication between Python and C++ when working with columns in a DataFrame. It is zero-copy both ways, i.e. Python >> C++ and C++ >> Python. It is blazingly fast. I usually combine it with OpenMP or TBB to do multi-threaded calculations on the column data.

It is all in pybind11 and modern C++ (except for one raw pointer reference, which is wrapped in a function; see above). It's easily testable: when the function is called from C++ it accepts a templated vector, and when it is called from Python it accepts a templated span.

The zero-copy C++ >> Python adapter is in my post above. This is the zero-copy Python >> C++ adapter:

```cpp
/**
 * \brief Returns span<T> from py::array_t<T>. Efficient as zero-copy.
 * \tparam T Type.
 * \param passthrough Numpy array.
 * \return span<T> with a clean and safe reference to the contents of the Numpy array.
 */
template<class T = float32_t>
inline std::span<T> toSpan(const py::array_t<T>& passthrough)
{
    py::buffer_info passthroughBuf = passthrough.request();
    if (passthroughBuf.ndim != 1) {
        throw std::runtime_error("Error. Number of dimensions must be one");
    }
    size_t length = passthroughBuf.shape[0];
    T* passthroughPtr = static_cast<T*>(passthroughBuf.ptr);
    std::span<T> passthroughSpan(passthroughPtr, length);
    return passthroughSpan;
}
```
Hi, I would like to check whether the cleanup function is really called, so I wrote the following code:

```cpp
auto v = new std::vector<int>(some_func());
auto capsule = py::capsule(v, [](void *v) {
    py::scoped_ostream_redirect output;
    std::cout << "deleting int vector\n";
    delete reinterpret_cast<std::vector<int>*>(v);
});
return py::array(v->size(), v->data(), capsule);
```

However, "deleting int vector" is not printed out when I run a Python script. I even added the following Python code at the end of the script, but it didn't help:

```python
import gc
gc.collect(2)
gc.collect(1)
gc.collect(0)
```

Could you help me make the cleanup function get called explicitly? Thank you.
@tlsdmstn56-2 You need to delete the variable returned by the pybind11 module on the Python side, or else the memory will not be freed. `py::array` returns a zero-copy reference to the data, so the memory will be held on the C++ side until it is no longer needed on the Python side:

```python
del my_variable
```
@sharpe5

> For the record, I have a large Python module that has zero-copy communication between Python and C++ when working with columns in a DataFrame. It is zero-copy both ways, i.e. Python >> C++ and C++ >> Python. […]
This is great for sharing the raw data, but how does it handle ownership? It looks like the short answer is that it doesn't, but maybe I'm missing something. Thanks!
@cchriste mentioned:

> This is great for sharing the raw data, but how does it handle ownership? It looks like the short answer is that it doesn't, but maybe I'm missing something. Thanks!

Short answer: it doesn't, but that's fine, as the parent Python function caller holds ownership for the duration of the call.

Remember, this is the "zero-copy Python >> C++ adapter", so Python creates the Numpy array, C++ modifies the array contents, then returns.

Here is an example scenario:

1. Python creates a Numpy array; it is the owner.
2. Python calls a method written in C++/pybind11.
3. The C++ uses the `toSpan` method above to obtain a reference to this array.
4. The C++ can then safely edit the contents of this array.
5. The C++ returns. The Numpy array is now modified, without the overhead of copying the array's contents back and forth from Python to C++ to Python.

This is really useful when modifying columns in a DataFrame.

It would be possible to break this if we really wanted to. The C++ side could create another thread, and that thread could start modifying the array behind Python's back, even after the original function call had returned and the Python side had deallocated it. But we assume that once the C++ function returns it does not touch that array again.
> Short answer: it doesn't, but that's fine, as the parent Python function caller holds ownership for the duration of the call. […]
I appreciate the quick reply, and agree this is very useful. For our use case, we do in fact want to take ownership of the data.

Going from C++ to Python seems safe: memory buffers are tagged with an ownership flag and, after the last reference to that memory is removed, won't be freed unless owned. Thanks for your other example demonstrating a trick to claim ownership when creating arrays, for which pybind11 should simply provide a more straightforward argument.

The other way around does not seem as straightforward. Even if some clever combination of `PyObject_GetBuffer`/`PyBuffer_Release` can be used to ensure Python doesn't delete memory out from under C++, if it's deleted by C++ then any existing Python objects will suddenly be pointing to deallocated space. Maybe if ownership transfer is achieved using a move (a `py::array&` can be passed to C++, so it's possible to modify the object directly), and only if the reference count is exactly one, the desired goal can be achieved.
@cchriste For Python to C++, I imagine that if the C++ wanted to take ownership of the data, the easiest and safest way would be to make a copy. I imagine that's the only way to prevent Python garbage-collecting that data once `del variable` is executed on the Python side. Get it working first, then optimise it later.

You also mentioned:

> if it's deleted by C++ then any existing Python objects will suddenly be pointing to deallocated space

... but the method above exposes the Numpy array as a `span`, which is read-only as far as memory allocation/deallocation goes, and can be range-checked, which goes a long way towards making any subsequent C++ code more robust. The span container is actually quite nice like that; see comments on StackOverflow. I'd also recommend putting some comments in the code as insurance against other developers making changes without a clear understanding of the limitations.
> This seems to be a good place to use a memoryview for holding onto the buffer instead of a capsule? #2307 is useful for invalidating the buffer once it has been released.
>
> Actually, I think I misunderstood the problem, never mind. A memoryview might be useful in some of these cases however.
@virtuald I am also encountering this problem. As far as I understand, returning a `memoryview` means "lending" my memory to a `memoryview`, while returning an `array` with a `capsule` as described in this thread means "moving" my memory to an `array`. I would prefer lending (or borrowing), because there is less black magic. I can keep my owner object alive using `keep_alive`, which is equivalent to "moving", if the owner object is also exposed to pybind11.

However, a `memoryview` is not a NumPy object. It does not support NumPy's arithmetic operations. Can I lend my memory to an `array` instead of a `memoryview`? ~I found that some of `array`'s constructors support a `borrowed` or `stolen` parameter, but I did not find any documentation.~

I have figured it out. I can "lend" my data to an `array` by passing it a capsule with an empty destructor.
Not necessarily a pybind11 solution, but you could allocate the `std::vector` on the heap with `new`; this way it won't get freed until you call `delete`. Given that, it should be safe to use the `.data()` pointer as the pointer for the NumPy array.
> For the record, I have a large Python module that has zero-copy communication between Python and C++ when working with columns in a DataFrame. It is zero-copy both ways, i.e. Python >> C++ and C++ >> Python. […] This is the zero-copy Python >> C++ adapter: (`toSpan`, quoted above)
Thanks for sharing the code. One thing to note is that if T is a struct and not packed (i.e., `std::is_class_v<T> && alignof(T) > 1`), this might lead to a core dump on some machines. The reason is that when registering T as a NumPy dtype, the alignment requirement of the dtype is lost. One can simply check that with the assertion `assert(py::dtype::of<T>().attr("alignment") == 1);`.

In this case, the alignment of the input buffer `passthroughBuf.ptr` would be 1, which violates the alignment of T and triggers errors on some platforms.
> If anyone's interested in a version of @ferdonline's utility function without explicit/manual `new` and `delete`:
>
> ```cpp
> template <typename Sequence>
> inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
>     auto size = seq.size();
>     auto data = seq.data();
>     std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
>     auto capsule = py::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence*>(p)); });
>     seq_ptr.release();
>     return py::array(size, data, capsule);
> }
> ```
>
> Apart from avoiding `new` and `delete`, this also does not leak if for some reason `py::capsule` would throw.
I was using this version for a while in a library, but recently I noticed it did not work anymore. It must be something related to the compiler, because I did not change the pybind11 version I was using (its commit is pinned as a git submodule in my library). But the version from @sharpe5 works. The main difference seems to come from the constructor of `py::array`, so a fix for @YannickJadoul's version seems to be:

```cpp
template <typename Sequence>
inline pybind11::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
    auto size = seq.size();
    auto data = seq.data();
    std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
    auto capsule = pybind11::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence *>(p)); });
    seq_ptr.release();
    return pybind11::array({size}, {sizeof(typename Sequence::value_type)}, data, capsule);
}
```
This is a question of documentation rather than an issue. I can't find any example of the following very common scenario:

I don't know the answer. It would be very nice to have this explained in the docs, since this scenario is rather common.