pygae / clifford

Geometric Algebra for Python
http://clifford.rtfd.io
BSD 3-Clause "New" or "Revised" License
754 stars 72 forks source link

Can we change `MultiVector(layout, arr)` to no longer copy `arr`? #282

Open eric-wieser opened 4 years ago

eric-wieser commented 4 years ago

Almost all of our operations create an extra copy (see #280). The fix proposed there is to opt-in for copy-free construction (spelt layout.MultiVector(value, copy=False)). However, it seems that every single construction within clifford would want this optin, which would make it seem better as a default.


If we made not copying the default, there are two types of code that would start doing the wrong thing:

Both of these are easy to fix by using buffer.copy() instead of buffer, but:

Does the benefit of not having copy=False sprinked throughout tools justify the cost of the potential breakage here?

arsenovic commented 4 years ago

what are the performance benefits?

eric-wieser commented 4 years ago

20% less time spent on multiplication, 30% less time on addition. Possibly a little less, since the benchmark in that PR folds in another change too.

But remember, the question isn't whether we want these benefits- it's whether we want user (or clifford.tools) code to automatically get the same benefit, or if we want to force that code to opt in by making this change everywhere:

-layout.MultiVector(my_optimized_function(...))
+layout.MultiVector(my_optimized_function(...), copy=False)
arsenovic commented 4 years ago

right. but the motivation behind this change is performance, correct?

eric-wieser commented 4 years ago

That's one way of looking at it, We can get the performance boost either by:

  1. Adding copy=False on 75 lines of our code, and telling users to do the same to their code. If they don't listen, all that happens is that user code is slower.
  2. Changing the behavior to not copy by default, and warning users via our docs that the behavior has changed. If they don't listen, then their code might not do the right thing any more.

My hunch is that the number of users using layout.MultiVector(some_array) is approximately zero anyway, so the only point that matters is that choosing 1 over 2 saves us some code.

arsenovic commented 4 years ago

right. users who are constructing MV's manually like this probably care about performance, whould you agree? in this case i think breaking things is ok, and putting a warning in docs is good.

hugohadfield commented 4 years ago

Yeah I agree, by far the main reason to be manually constructing multivectors in this way is for performance and in that case making the copy = false default makes sense

hugohadfield commented 4 years ago

Here are some reasons why I (and hence maybe users) build multivectors manually:

I don't know if any of those scenarios are likely to cause trouble

hugohadfield commented 4 years ago

Having thought about this, I think the copy=True default makes more sense. I think it is important to remember our position in the GA software ecosystem and goals of the project. In most of the places I have seen Clifford in use our users are students (eg. in Brno) or mathematicians/applied scientists who are getting started with GA and want everything to just work and want to play around with a few algebras etc, not write super performant code (at first anyway). If we change it to not copy by default we introduce a higher probability of writing bugs, especially for inexperienced users.

Of course we would in general love to have fast code. Users who want higher performance for some bits of their code can change to using copy=False for those sections. In doing this they make a conscious decision and, hopefully, will be aware of the safety tradeoff inherent in their choice.

So overall on reflection I think we should just add the sprinklings of copy=False into the tools, its not a big pain for us at all, but a new user getting behaviour they don't expect is definitely something that would sit badly with them.