Open scott-griffiths opened 11 months ago
Probably should be called astype to copy numpy.
The numpy method has a casting parameter which can be one of:
‘no’ means the data types should not be cast at all. [Not sure what the point of this option is!]
‘equiv’ means only byte-order changes are allowed. [Reasonable I guess]
‘safe’ means only casts which can preserve values are allowed. [Only widening casts or unsigned to signed?]
‘same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
‘unsafe’ means any data conversions may be done.
From experimentation, if it doesn't have room to store the full value it simply truncates the binary representation, so for example an int of 2000 becomes a uint8 of 208, which is not exactly obvious or helpful (but admittedly will be fast!)
If you ask for safe casting it just exits with a TypeError.
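The 2000 to 208 result is just the low 8 bits of the original value. A minimal pure-Python sketch of that arithmetic (the helper name truncate_to_uint8 is made up for illustration and is not part of numpy or bitstring):

```python
def truncate_to_uint8(value: int) -> int:
    """Keep only the lowest 8 bits, as an unchecked narrowing cast would."""
    return value & 0xFF


# 2000 needs 11 bits; dropping the top three leaves 0b11010000.
print(format(2000, "b"))        # 11111010000
print(truncate_to_uint8(2000))  # 208
```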
Maybe our options should be:
clip - values that are too large get clipped to the nearest representable value.
safe - If values can't be preserved a ValueError is raised (but it still tries).
The others are more checks on the dtypes, rather than the data, which the user can easily do themselves. If there are only two options, that boils down to a flag:
clip: If True, out-of-range values are clipped to the nearest representable value, otherwise a ValueError will be raised. Defaults to False.
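As a rough sketch of what that flag could mean for integer targets, here is a standalone function working on plain lists. The names cast_ints and int_range are hypothetical and are not the bitstring API; only the clip semantics come from the description above.

```python
def int_range(bits, signed):
    """Return the (lowest, highest) value an integer of this width can hold."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def cast_ints(values, bits, signed=False, clip=False):
    """Cast each value to the target width, clipping or raising on overflow."""
    lo, hi = int_range(bits, signed)
    out = []
    for v in values:
        if lo <= v <= hi:
            out.append(v)
        elif clip:
            # Clamp to the nearest representable value.
            out.append(min(max(v, lo), hi))
        else:
            raise ValueError(f"{v} is out of range [{lo}, {hi}]")
    return out


print(cast_ints([0, 2000, -5], 8, clip=True))  # [0, 255, 0]
```

With the default clip=False the same call raises a ValueError on the first out-of-range value.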
Which is back to where we started.
It might be cool to allow the clip to happen as a function call. This would allow it to be used more widely, for example when performing other ops on Arrays. Right now it's hard to add a flag to a y = x*5 command, and y = Array.multiply(x, 5, clip=True) is pretty ugly. Not sure how it actually works in practice though.
a = b*1000 # Throws a ValueError
a = clip(b*1000) # Magically doesn't and clips instead. Somehow.
Perhaps better would be (b*1000).clip(), but it's not obvious how it can be implemented.
If we could, the astype would be just c = b.astype('u8').clip().
with Array.Clipping:
    a = b*1000
is perhaps more obvious and easier to actually code.
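One way the context-manager idea might be coded is a class-wide mode that the range check consults. This is only a sketch: ToyArray is a hypothetical uint8-only stand-in, not bitstring's Array, and it uses a method call, clipping(), rather than the attribute form Array.Clipping written above.

```python
from contextlib import contextmanager


class ToyArray:
    """Hypothetical unsigned 8-bit array used only to illustrate the idea."""

    _clipping = False  # class-wide mode toggled by the context manager
    LO, HI = 0, 255

    def __init__(self, values):
        self.values = [self._check(v) for v in values]

    @classmethod
    def _check(cls, v):
        if cls.LO <= v <= cls.HI:
            return v
        if cls._clipping:
            return min(max(v, cls.LO), cls.HI)
        raise ValueError(f"{v} does not fit in uint8")

    def __mul__(self, factor):
        return ToyArray(v * factor for v in self.values)

    @classmethod
    @contextmanager
    def clipping(cls):
        previous = cls._clipping
        cls._clipping = True
        try:
            yield
        finally:
            # Restore the earlier mode even if the body raised.
            cls._clipping = previous


b = ToyArray([1, 2, 200])
with ToyArray.clipping():
    a = b * 1000
print(a.values)  # [255, 255, 255]
```

Outside the with block, b * 1000 raises a ValueError as before. A shared class attribute like this is not thread-safe, so a real version would probably need something like contextvars.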
astype method added in 4.1.2. No alternative casting methods yet, so leaving this open.
Changing the dtype of an Array just changes the interpretation of the underlying data. This is fine, and is an O(1) operation which fits with changing a property, but some users might want or expect it to recast the data to the new type.
To cast to a new dtype you need to do this:
which is OK, and explicit, but adding a new method could make it clearer and give more options:
I don't think it's good to do it in place - there's no performance gain. We can now also deal with things like overflows better:
so the user can choose whether to get a ValueError or to clip values or whatever (divide by zero would be another one).