Closed issue. nbecker closed this 7 years ago.
I seem to have broken something; now it segfaults.
Correct me if I'm wrong, but I guess that at line 35 you want to take all elements along the i-th axis if it is not present in the axes parameter? In that case, the correct code would be:

```cpp
if (std::find(axes.begin(), axes.end(), i) != axes.end())
    sv.push_back(xt::newaxis());
else
    sv.push_back(xt::all());
```
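To make the corrected logic concrete, here is a hypothetical Python/numpy analogue of that slice-vector construction (`keepdims_index` is an invented name for illustration, not part of xtensor): axes that were reduced over become `np.newaxis` entries, and every other axis becomes a full slice.

```python
import numpy as np

def keepdims_index(ndim, axes):
    # For each axis i of the original array: np.newaxis if i was reduced
    # over (restores a size-1 dimension), slice(None) (i.e. "all") otherwise.
    return tuple(np.newaxis if i in axes else slice(None) for i in range(ndim))

a = np.ones((10, 100000))
m = a.max(axis=1)                      # shape (10,) after reducing axis 1
m2 = m[keepdims_index(a.ndim, (1,))]   # shape (10, 1), broadcastable with a
```

Indexing the reduced array with this tuple plays the same role as applying `xt::dynamic_view` with the slice vector `sv`.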
After fixing that (thanks!) and retesting with:

```python
from timeit import timeit
import numpy as np

u = np.ones((10, 100000))
v = np.ones((100000, 10))
for x in ('u', 'v'):
    for l in ('logsumexp', 'logsumexp2'):
        print(l + '(%s, 1,))' % x, timeit(l + '(%s, (1,))' % x, 'from __main__ import %s, %s' % (l, x), number=100))
        print(l + '(%s, 0,))' % x, timeit(l + '(%s, (0,))' % x, 'from __main__ import %s, %s' % (l, x), number=100))
```
I get:

```
logsumexp(u, 1,)) 16.155551936000023
logsumexp(u, 0,)) 2.8450079509999853
logsumexp2(u, 1,)) 1.2417155439999874
logsumexp2(u, 0,)) 1.5459593610000013
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  SystemError: <built-in method __contains__ of dict object at 0x7f205b71af48> returned a result with an error set
Aborted (core dumped)
```
So there are 2 issues here (the crash and the performance gap). The error is due to a huge memory allocation attempt (100k x 100k) when calling logsumexp(v, 1). Here is a decomposition of the last lines of logsumexp, with temporaries and their shapes:
```cpp
auto max2 = xt::dynamic_view(max, sv);               // shape = { 100000, 1 }
xt::pyarray<value_type> tmp1 = xt::exp(e1 - max2);   // shape = { 100000, 10 }
xt::pyarray<value_type> tmp2 = xt::sum(tmp1, axes);  // shape = { 100000 }
xt::pyarray<value_type> tmp3 = xt::log(tmp2);        // shape = { 100000 }
xt::pyarray<value_type> res = max2 + tmp3;           // shape = { 100000, 100000 }
```
The shape of res is the result of broadcasting { 100000, 1 } with { 100000 }. Replacing max2 with max in the last line should fix this problem.
I'm confused. The purpose of introducing max2 was that max can't be used: in general it won't broadcast against e1. numpy's amax has a keepdims keyword, which is what I'd need here but xt::amax lacks, so I had to simulate it by introducing max2, which puts back the missing dimensions (with size == 1).
That large allocation isn't supposed to happen. Am I doing this all wrong? The whole objective here was to write the equivalent of scipy.misc.logsumexp https://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.misc.logsumexp.html
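For reference, the behavior being targeted can be sketched in numpy alone (`logsumexp_np` is an illustrative name, not the scipy function; numpy's `keepdims=True` plays the role that max2 simulates in the xtensor version):

```python
import numpy as np

def logsumexp_np(a, axis):
    # Subtract the per-slice max before exponentiating for numerical
    # stability, then add it back after the log. keepdims=True keeps the
    # reduced dimension as size 1, so the subtraction broadcasts against a.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

v = np.ones((5, 3))
print(logsumexp_np(v, axis=1))   # each entry is log(3 * e) = 1 + log(3)
```

Note that the max is added back in its squeezed (reduced) shape, matching the shape of the log-sum term, which is exactly the point of the fix above.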
Your usage of max2 to compute exp(e1 - max2) is correct, and you're right on this: in the general case, e1 and max won't broadcast. Indeed, max is a reduction of e1 across the specified axes, so max2 is still required to compute tmp1. At this stage of the computation, tmp1 and e1 have the same shape, that is, { 100000, 10 }.
Now, when you compute tmp2, you perform the same type of reduction (i.e. across the same axes) on tmp1 that you did on e1 to get max. That means tmp2 and max will have the same shape, i.e. { 100000 }.
Then tmp3 has the same shape as tmp2, since no broadcasting operation is performed. At this stage, we have:

- tmp3, a 1-D expression with shape { 100000 }
- max2, a 2-D expression with shape { 100000, 1 }
- max, a 1-D expression with shape { 100000 }

Both tmp3 + max2 and tmp3 + max are valid expressions; however, the first one ends up as a 2-D expression with shape { 100000, 100000 }, while the last one ends up as a 1-D expression with shape { 100000 }, which is what you want.
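This shape arithmetic is easy to check with numpy (smaller sizes stand in for 100000 and 10; the variable names mirror the temporaries discussed above):

```python
import numpy as np

n, k = 1000, 10                     # stand-ins for 100000 and 10
e1 = np.random.rand(n, k)
mx = e1.max(axis=1)                 # "max":  1-D, shape (n,)
mx2 = mx[:, np.newaxis]             # "max2": 2-D, shape (n, 1)
tmp3 = np.log(np.exp(e1 - mx2).sum(axis=1))   # shape (n,)

# Broadcasting (n, 1) with (n,) yields (n, n): the runaway allocation.
print((tmp3 + mx2).shape)   # (1000, 1000)
# Broadcasting (n,) with (n,) stays 1-D: the intended result.
print((tmp3 + mx).shape)    # (1000,)
```

With n = 100000 the first addition would try to materialize a 100000 x 100000 result, which is the allocation reported in the error.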
Here are the results of the benchmarks (not the same number of iterations) on my laptop:
logsumexp(u, 1,)) 2.8467595061728397
logsumexp(u, 0,)) 3.709872592592592
logsumexp2(u, 1,)) 1.3747942716049382
logsumexp2(u, 0,)) 1.5532693333333336
logsumexp(v, 1,)) 3.6790692345679012
logsumexp(v, 0,)) 3.1909791604938285
logsumexp2(v, 1,)) 2.092781827160497
logsumexp2(v, 0,)) 1.6343920987654315
I'll start investigating the performance issue.
That's much better! Here are my results:

logsumexp(u, 0,)) 2.7423938450001515
logsumexp2(u, 0,)) 1.568801411000095
logsumexp(u, 1,)) 2.0876657210001213
logsumexp2(u, 1,)) 1.3879509569997026
logsumexp(v, 0,)) 2.3853387870003644
logsumexp2(v, 0,)) 1.608310051999979
logsumexp(v, 1,)) 2.6268803089997164
logsumexp2(v, 1,)) 2.058452197999941
I reordered the tests for easier comparison:

```python
from timeit import timeit
import numpy as np

u = np.ones((10, 100000))
v = np.ones((100000, 10))
for x in ('u', 'v'):
    for axis in (0, 1):
        for l in ('logsumexp', 'logsumexp2'):
            print(l + '(%s, %s,))' % (x, axis), timeit(l + '(%s, (%s,))' % (x, axis), 'from __main__ import %s, %s' % (l, x), number=100))
```
I've also been trying to learn how to do this in Julia, and have some interesting results: https://discourse.julialang.org/t/apply-reduction-along-specific-axes/3301/15?u
Indeed the implementation in Julia seems really performant.
I've worked on the reducers; could you try the performance branch of my fork? On my side I have the following numbers:
master branch:
logsumexp(u, 0,)) 3.709872592592592
logsumexp2(u, 0,)) 1.5532693333333336
logsumexp(u, 1,)) 2.8467595061728397
logsumexp2(u, 1,)) 1.3747942716049382
logsumexp(v, 0,)) 3.1909791604938285
logsumexp2(v, 0,)) 1.6343920987654315
logsumexp(v, 1,)) 3.6790692345679012
logsumexp2(v, 1,)) 2.092781827160497
performance branch:
logsumexp(u, 0,)) 3.015181827160494
logsumexp2(u, 0,)) 1.4681777777777776
logsumexp(u, 1,)) 2.451084641975309
logsumexp2(u, 1,)) 1.3026453333333334
logsumexp(v, 0,)) 2.7342716049382716
logsumexp2(v, 0,)) 1.7015956543209878
logsumexp(v, 1,)) 3.1623810370370364
logsumexp2(v, 1,)) 2.248043851851852
We're still slower, but this seems to go in the right direction.
Sorry, my git-fu isn't up to it. I tried:

```
$ git clone https://github.com/QuantStack/xtensor.git xtensor.jm
Cloning into 'xtensor.jm'...
remote: Counting objects: 4504, done.
remote: Compressing objects: 100% (47/47), done.
remote: Total 4504 (delta 19), reused 0 (delta 0), pack-reused 4451
Receiving objects: 100% (4504/4504), 1.58 MiB | 0 bytes/s, done.
Resolving deltas: 100% (3096/3096), done.
Checking connectivity... done.
[nbecker@nbecker2 ~]$ cd xtensor.jm
[nbecker@nbecker2 xtensor.jm]$ git fetch https://github.com/JohanMabille/xtensor.git
From https://github.com/JohanMabille/xtensor
```

But I don't see any sign of a performance branch, so I guess this wasn't correct.
If you already cloned quantstack/xtensor, you don't have to clone my fork; you can just add my repo as a new remote and check out my branch:

```
cd xtensor
git remote add jm https://github.com/JohanMabille/xtensor.git
git fetch jm
git checkout performance
```

After that, you can run git branch or gitk to check that the current branch is correct.
I did a rebase of your performance change to xtensor onto master, which applied cleanly. But my code doesn't compile:

```
/home/nbecker/.local/include/xtensor-python/pycontainer.hpp:80:78: error:
'typename xt::pycontainer<xt::pyarray<double> >::iterable_base {aka
xt::xiterable<xt::pyarray<double> >}::broadcast_iterator' names
'template<enum class xt::layout_type L> using broadcast_iterator =
xt::xconst_iterable<xt::pyarray<double> >::broadcast_iterator<L>', which is
not a type
    using broadcast_iterator = typename iterable_base::broadcast_iterator;
```
You need to update xtensor-python; it looks like your version is not 0.11.1.
OK, but I needed:

```
*** /home/nbecker/xtensor-python.hg/include/xtensor-python/pycontainer.hpp  2017-05-05 07:47:15.893865148 -0400
--- /home/nbecker/.local/include/xtensor-python/pycontainer.hpp             2017-05-05 07:54:52.351386557 -0400
***************
*** 161,167 ****
      template <class D>
      inline pycontainer<D>::pycontainer(const pybind11::object& o)
!         : pybind11::object(raw_array_t(o.ptr()), pybind11::object::stolen)
      {
          if (!this->m_ptr)
          {
--- 161,167 ----
      template <class D>
      inline pycontainer<D>::pycontainer(const pybind11::object& o)
!         : pybind11::object(raw_array_t(o.ptr()), stolen_t{})
      {
          if (!this->m_ptr)
          {
```
And I can't run the logsumexp test; I get a segfault on the simplest test:

```
$ python3
Python 3.5.3 (default, Apr 24 2017, 13:32:13)
[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from xtensor_test.logsumexp import logsumexp
>>> import numpy as np
>>> u = np.ones(4)
>>> v = logsumexp(u)
Segmentation fault (core dumped)
```
Problem fixed on the performance branch; the fix will be merged into master soon.
Performance has been improved in xtensor 0.10.1. Thanks again for reporting!
Test files are here:
https://gist.github.com/nbecker/13c6ca2869d8b5266b6a1e5ae64e9806
This is not much different from the previous benchmark. The actual benchmark is at the end of test_logsumexp.py.