Implementing individual, small math functions in C is a no-go; calling into C has a very large per-call overhead. The time saved in C must be greater than the cost of the call itself.
Jumping into C once and performing a bunch of math ops before returning control to Lua is very fast.
Implementing the same thing in pure Lua with ffi-backed data structures is just as fast.
The C API is somewhat clumsy because userdata also has a large overhead.
C is only optimal in the bulk-op case. How would it fit in a world where there are, e.g., many scattered vectors with no data locality?
Around C boundaries, LuaJIT 2.0 aborts traces and 2.1 stitches traces, both of which degrade performance.
A vector2 backed by cdef is the same speed as pure Lua in the trivial case and much faster for many operations over multiple vectors.
cdef also allows users to instantiate dense arrays of vector2, which is useful for data-oriented designs.
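
A minimal sketch of the cdef-backed approach described in these notes, assuming LuaJIT (stock Lua has no ffi library). The struct layout (two doubles), the `len` method, and the `points` array are illustrative assumptions, not the actual library's API. It shows a single vector2 with an operator, then a dense array traversed in one loop that never leaves Lua.

```lua
-- Requires LuaJIT: the ffi library is not available in stock Lua.
local ffi = require("ffi")

-- Assumed layout for illustration: two doubles, 16 bytes per vector.
ffi.cdef[[
typedef struct { double x, y; } vector2;
]]

local vector2
local mt = {
  -- Addition allocates a new vector2; the JIT can often sink this
  -- allocation when the result does not escape the trace.
  __add = function(a, b)
    return vector2(a.x + b.x, a.y + b.y)
  end,
  __index = {
    -- Hypothetical helper: Euclidean length.
    len = function(v)
      return math.sqrt(v.x * v.x + v.y * v.y)
    end,
  },
}
-- Bind the metatable to the C type; the result is also the constructor.
vector2 = ffi.metatype("vector2", mt)

-- Trivial case: individual vectors.
local a = vector2(3, 4)
local b = a + vector2(1, 1)

-- Bulk case: one contiguous, zero-initialized block of n vectors.
local n = 1000
local points = ffi.new("vector2[?]", n)
for i = 0, n - 1 do
  points[i].x = i
  points[i].y = 2 * i
end

-- A single pass over contiguous memory, with no C call boundary.
local sum_x, sum_y = 0, 0
for i = 0, n - 1 do
  sum_x = sum_x + points[i].x
  sum_y = sum_y + points[i].y
end
```

Because `points` is a plain C array of structs, the loop touches memory sequentially and contains no call into C, so it should not trigger the trace aborts or stitching noted above. Whether this is faster than plain Lua tables for a given workload is worth measuring rather than assuming.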