Open JukkaL opened 4 years ago
This seems to be a good implementation order:
Issue tracking always defined attributes: #836 (with more detail)
We should also update the status for other items. For example, fixed-width integers and record types are now supported to some degree.
Yeah, I'll write more about some of the other items. Here's an issue about fixed-width integers with some more detail: #837
I've started prototyping some of these and the implementation of always defined attributes, fixed-width integers and packed arrays doesn't seem very difficult (but it's still quite a lot of work). I'd probably postpone record types until later.
Packed arrays, in particular, seem important if we want to make mypyc faster while adding more optimizations. We can probably make dataflow analyses a lot faster if we have fixed-width integers and packed arrays, in particular. Similarly, if we want to implement a native back end, packed arrays are going to be pretty critical for high compilation speeds, I think.
My thoughts about packed arrays: #840
More details about record types: #841
For certain use cases mypyc is still much worse than an extension written in C (or Cython). Here are some low-level features that I believe could together improve various interesting use cases by a factor of 2 or more, in either runtime performance or memory use. (These are currently not a priority for the core team, but contributors are welcome to work on these.)
Fixed-width integers (implemented). Types such as
i16
,i32
andi64
(for 16-bit, 32-bit, or 64-bit integers, respectively) could be much faster than the arbitrary-precisionint
type we have -- we'd avoid overflow checks and reference count management. Memory use would also be lower for the smaller types. For convenience, integer literals would be implicitly coerced to these types based on type context. The main drawback is that on overflow, compiled semantics would differ, as these would be represented as normal integers in interpreted mode. (We can also support 32-bit and 64-bit floats as value types.)Native floats (implemented). Store
float
values as unboxed C doubles instead of heap-allocated CPythonfloat
objects.Packed arrays (in progress). Currently lists are inefficient when storing many small objects such as integers or floats. We could introduce a packed array type as a primitive type. It would resemble
array.array
and Java arrays. It would be parametrized by the item type at runtime (we can't use type erasure). For example, a packed array ofint16
could be more memory efficient thanList[int]
by up to a factor of 16, I think. (NumPy arrays would also be interesting, but they would be harder to support.)Record types. Native instances with only a few attributes have quite a lot of fixed overhead -- they require a pointer and an object header, and they are heap allocated. In some cases we can avoid this by using value record (or struct) types. A record type with two attributes would only require storage for the two attributes if used for an attribute of a native class or in a packed array. This would improve performance by requiring fewer allocations and less pointer dereferencing (locality would also be better). Memory use could be improved by a big factor in some cases. Consider a packed array of records with two
int32
attributes -- a list of native instances could use up to 6x more memory. Here the main drawback is that these would be immutable, like other value types, to preserve compatibility with interpreted semantics.Always defined attributes (implemented). Perform static analysis to determine attributes that are always defined (and not overridden in subclasses). These can be accessed using direct C struct access without extra checks. For pure value types, such as fixed-width integers and bools, attribute access won't require reference count manipulation either. To make this work, we can restrict the del statement to only allow attribute deletion via
self
. This would allow us to statically determine which attributes can ever be deleted.I'll add more detail about these ideas in separate issues.