Add low-level features to improve speed and reduce memory use

JukkaL commented 4 years ago

For certain use cases mypyc is still much worse than an extension written in C (or Cython). Here are some low-level features that I believe could together improve various interesting use cases by a factor of 2 or more, in either runtime performance or memory use. (These are currently not a priority for the core team, but contributors are welcome to work on these.)

Fixed-width integers (implemented). Types such as i16, i32 and i64 (for 16-bit, 32-bit, or 64-bit integers, respectively) could be much faster than the arbitrary-precision int type we have -- we'd avoid overflow checks and reference count management. Memory use would also be lower for the smaller types. For convenience, integer literals would be implicitly coerced to these types based on type context. The main drawback is that on overflow, compiled semantics would differ, as these would be represented as normal integers in interpreted mode. (We can also support 32-bit and 64-bit floats as value types.)

Native floats (implemented). Store float values as unboxed C doubles instead of heap-allocated CPython float objects.

Packed arrays (in progress). Currently lists are inefficient when storing many small objects such as integers or floats. We could introduce a packed array type as a primitive type. It would resemble array.array and Java arrays. It would be parametrized by the item type at runtime (we can't use type erasure). For example, a packed array of int16 could be more memory efficient than List[int] by up to a factor of 16, I think. (NumPy arrays would also be interesting, but they would be harder to support.)

Record types. Native instances with only a few attributes have quite a lot of fixed overhead -- they require a pointer and an object header, and they are heap allocated. In some cases we can avoid this by using value record (or struct) types. A record type with two attributes would only require storage for the two attributes if used for an attribute of a native class or in a packed array. This would improve performance by requiring fewer allocations and less pointer dereferencing (locality would also be better). Memory use could be improved by a big factor in some cases. Consider a packed array of records with two int32 attributes -- a list of native instances could use up to 6x more memory. Here the main drawback is that these would be immutable, like other value types, to preserve compatibility with interpreted semantics.

Always defined attributes (implemented). Perform static analysis to determine attributes that are always defined (and not overridden in subclasses). These can be accessed using direct C struct access without extra checks. For pure value types, such as fixed-width integers and bools, attribute access won't require reference count manipulation either. To make this work, we can restrict the del statement to only allow attribute deletion via self. This would allow us to statically determine which attributes can ever be deleted.

I'll add more detail about these ideas in separate issues.

JukkaL commented 4 years ago

This seems to be a good implementation order:

Always defined attributes. These are widely useful and also help prevent runtime errors. Fixed-width integer attributes would be a bit pointless if we'd need to store even one extra "defined" bit, as a single bit can round up heap block size by 8 bytes. Record types work much better with always defined attributes.
Fixed-width integers. These are generally helpful alone and also work well with packed arrays and records. C compilers don't seem to deal well with code using arbitrary-precision integers.
Packed arrays. These have a potential for big gains and should be fairly simple to implement, but not all programs will benefit much.
Record types. These are more situationally useful and fairly complex.