allefeld opened this issue 4 years ago
There are mgrid and ogrid, which create arrays using "slice-like" syntax, i.e. [start:end:step], though they still exclude the endpoint, so they are not quite the same:
>>> np.mgrid[0:1:0.1]
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
AFAIK there is no option to change whether the end point is included nor any particular special attention paid to floating point error, so it's not exactly the same, but it exists just FYI.
> There are mgrid and ogrid

Yes, they are basically multidimensional equivalents of arange, so they suffer from the same problem.
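For integer slices that equivalence is easy to check against meshgrid over arange (with 'ij' indexing, which is the ordering mgrid uses); a quick illustration:

```python
import numpy as np

# mgrid's dense grid matches arange fed through meshgrid(indexing='ij')
g0, g1 = np.mgrid[0:3, 0:2]
a0, a1 = np.meshgrid(np.arange(3), np.arange(2), indexing='ij')
```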
Just to clarify - the problem you're referring to in this case is the fact that the endpoint is not included, correct? Or is it related to the syntax?
Yes, exactly.
There is a fun, rarely known, and maybe a bit ugly hack: np.ogrid[1:2:10j] will give you linspace behaviour. Of course using that without a hardcoded step would be evil IMO, but if you must use shorthands...
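That is, a complex "step" is interpreted as a point count with the endpoint included, matching linspace:

```python
import numpy as np

# 10j means "10 points", endpoint included, rather than a step of 10
g = np.ogrid[1:2:10j]
ls = np.linspace(1, 2, 10)
```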
The other option is using np.ix_ as a shorthand for adding dimensions like ogrid, etc.
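For context, np.ix_ builds an open (broadcastable) mesh from 1-D index sequences; a quick illustration:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# select the subgrid at rows 0, 2 and columns 1, 3;
# ix_ inserts the extra dimensions needed for broadcasting
sub = a[np.ix_([0, 2], [1, 3])]
```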
To be honest, I am not sure I like any of these options (i.e. whether I would want to teach them to new users mainly). I think discussing such an addition or proliferation of such a syntax would be nice to start on the mailing list though.
I did once have a PR to add np.linspace(..., step=0.1), but... Either that ends up super unsafe, or you need to be super strict about getting the step right, at which point it is only reasonable to use with a hardcoded step, not a calculated one. So I buried it, because I was not convinced it was worth the trouble and that enough need exists...
Another option might be to add an endpoint kwarg to arange (there is one in linspace) to achieve the desired behavior, though I seem to remember comments from other discussions expressing dislike for using arange for this purpose anyway.
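For reference, this is what linspace's existing endpoint kwarg looks like:

```python
import numpy as np

closed = np.linspace(0, 1, 11)                     # includes the stop value 1.0
# endpoint=False drops the stop value, mimicking arange's half-open behaviour
half_open = np.linspace(0, 1, 10, endpoint=False)  # 0.0, 0.1, ..., 0.9
```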
@seberg

> but if you must use shorthands
I would like to quote the Zen of Python: "explicit is better than implicit". It's not about a shorthand, but about writing what I mean. I don't really want 101 points, I want steps of 0.01.
> Either that ends up super unsafe, or you need to be super strict about getting the step right, at which point it is only reasonable to use with a hardcoded step, not a calculated one.
I tried to address that in my first comment. I agree it is hard to get right, but the good folks at Mathworks seem to have successfully done so.
The ogrid trick is interesting to know, but doesn't really address my request, because it is still "number of points"-based.
I don't like to subscribe to mailing lists because they are hard to manage (filters never work 100%) in my experience, the issues interface is so much nicer. But if that is the way things proceed in NumPy, ok.
@rossbar, I would prefer not to add this as an option to arange or some other existing function, because internally it is not an extension of its functionality (even though it looks like that). The values from arange will not in general be numerically identical to the first n – 1 values from the quoted algorithm.
@allefeld, I am not too worried about getting the points right, once you have figured out the number of steps (but then I assume linspace does so). But what do you do if I use 1:2:0.13, which just does not work from a linspace perspective where the end-point should be included (or things should be symmetric, for -1:1:0.13)?
@seberg the intended behavior of the algorithm – in a world of infinite precision – is to include all 1 + 0.13 * i which are smaller than or equal to 2. Comment from the code:
% Ideally, in exact arithmetic, for b > a and d > 0,
% v = a:d:b should be the vector of length n+1 generated by
% v = a + (0:n)*d where n = floor((b-a)/d).
In this case, the intended result is
1, 1.13, 1.26, 1.39, 1.52, 1.65, 1.78, 1.91
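For comparison, np.arange already produces this sequence (up to rounding) when the step does not divide the span:

```python
import numpy as np

# 0.13 does not divide the span of [1, 2], so the endpoint is simply absent
v = np.arange(1, 2, 0.13)
```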
That means that if the step is an integer fraction of the interval length, the endpoint is one of the values, and otherwise it isn't.
With floating point arithmetic the problem is that, if the step is (intended as) an integer fraction of the interval length, e.g. 0.1, the value closest to the 2 might be slightly larger or slightly smaller than 2 due to rounding error. That is fixed by transforming the step into a number of points n = round((b-a)/d).
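The difference between floor and round here is easy to demonstrate (a standalone Python illustration, not taken from the Mathworks code):

```python
import math

# (b - a) / d lands just below the exact integer due to rounding
q = (0.3 - 0.0) / 0.1
print(q)              # 2.9999999999999996
print(math.floor(q))  # 2 -- one interval short
print(round(q))       # 3 -- recovers the intended count
```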
The tricky bit is to figure out whether "the step is intended as an integer fraction of the interval length". The Mathworks algorithm decides that on the basis of whether the error at the end is smaller than
tol = 2.0*eps*max(abs(a),abs(b))
where eps is the difference between 1 and the next larger number that can be represented as a double precision floating point number.
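The NumPy spelling of that tolerance would be something like the following (a small illustration; np.finfo gives the same value as Matlab's eps):

```python
import numpy as np

eps = np.finfo(np.float64).eps  # 2**-52, same value as Matlab's eps
# tolerance for a = 1, b = 2, as in the running example
tol = 2.0 * eps * max(abs(1.0), abs(2.0))
```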
We then still have the issue that the last value, in this case 1 + 0.1 * n, will not be exactly 2. The Mathworks algorithm fixes that by creating the first half of the values upwards from 1, and the second half downwards from 2. This makes the floating point error about half as large on average, and ensures that the values are symmetrically distributed over the interval.
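A rough sketch of this construction, assuming a positive step (irange is just a hypothetical name here; the details are a best-effort reading of the description above, not a verified port of the Mathworks code):

```python
import numpy as np

def irange(start, stop, step=1.0):
    """Hypothetical endpoint-inclusive range (assumes step > 0)."""
    n = round((stop - start) / step)
    # tolerance for deciding whether step is *intended* as an
    # integer fraction of the interval length
    tol = 2.0 * np.finfo(np.float64).eps * max(abs(start), abs(stop))
    if abs(start + n * step - stop) <= tol:
        # endpoint intended: build the first half up from start and
        # the second half down from stop, so rounding error is
        # distributed symmetrically over the interval
        i = np.arange(n + 1)
        return np.where(i <= n // 2,
                        start + i * step,
                        stop - (n - i) * step)
    # otherwise plain a + (0:n)*d with n = floor((b-a)/d)
    n = int(np.floor((stop - start) / step))
    return start + np.arange(n + 1) * step
```

With this sketch, irange(-1, 1, 0.1) hits both endpoints exactly, while irange(1, 2, 0.13) stops at 1.91 as in the example above.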
Actually I used the "non-integer fraction" behavior as a hack in Matlab in case I didn't want to include the endpoint, e.g. 1:0.1:1.99. That is kind of the reverse ugliness, and something that is already as straightforward as it should be in NumPy. In my mind, having arange and my proposed irange would be the best of both worlds.
Well maybe @eric-wieser had a nice proposal (I think he was arguing this stuff before), this must have been discussed many times over the years. I just do not think we can fix arange, so yes, irange is maybe the best of both worlds. I am not immediately clear that the best of both worlds is actually very good, but I tend to be pessimistic :). I.e. it likely has worse properties than linspace?

Long story short: likely others will be convinced, and then I will be happy even if I am not, but I expect someone will have to flesh out the proposal a bit more and in either case write to the mailing list to get some support and get it rolling...
I think we need to at least spend some effort collecting together old bug reports on this, because we keep repeating ourselves in addressing them.
@allefeld, I think a lot of the responses you got here ended up being about the syntactic sugar of the matlab : operator, when really all you cared about was the behavior of the function colon(a, b, c). Sorry about that.
Coming to Python/NumPy from Matlab, I'm missing something like Matlab's three-argument colon operator.
In Matlab, I used this operator routinely. It is especially useful when creating a range of values for plotting a function, e.g. x = -1:0.1:1; plot(x, x.^2).

NumPy's alternatives for this are linspace and arange. However, the former makes it hard to hit specific values (in the plotting example, I have to figure out that there are 21 values), and the latter excludes the stop value. I get that arange follows the behavior of range, but for the mentioned application, which has nothing to do with indexing, it is just impractical.

I also understand that a naive implementation of the colon operator runs into floating point problems. However, in many years of using Matlab I never ran into a situation where it didn't do the expected thing. I now managed to find an archived blog post that explains part of their implementation, with an attached complete implementation. I tested it, and as far as I can see it does exactly what the colon operator does.
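To spell out the plotting example with NumPy's current alternatives (the 21 has to be worked out by hand as (1 - (-1)) / 0.1 + 1):

```python
import numpy as np

# linspace needs the point count rather than the step
x_lin = np.linspace(-1, 1, 21)   # includes the stop value 1.0
# arange takes the step directly but excludes the stop value
x_ar = np.arange(-1, 1, 0.1)     # ends near 0.9
y = x_lin ** 2                   # the values one would plot
```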
Would you be open to including an implementation of this algorithm in NumPy? It could be called colon to make it easy for people switching, or irange where the i stands for "inclusive". For example with a signature irange(start, stop, step=1).