Closed (datnamer closed this issue 6 years ago)
Indeed, new kernels can be dynamically inserted into the lookup table. One of the goals is to add kernel jit compilation + insertion to Numba.
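To make the idea of dynamic insertion concrete, here is a minimal sketch in plain Python. It is not gumath's actual data structure or API: `KERNELS`, `add_kernel` and `call` are hypothetical names, and the lambdas merely stand in for precompiled C kernels or JIT-compiled ones.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: maps (function name, input type names) to a kernel.
# This only illustrates the idea; it is not how gumath stores kernels.
Signature = Tuple[str, Tuple[str, ...]]
KERNELS: Dict[Signature, Callable] = {}


def add_kernel(name: str, in_types: Tuple[str, ...], kernel: Callable) -> None:
    """Insert a kernel at runtime; an existing entry is replaced."""
    KERNELS[(name, in_types)] = kernel


def call(name: str, in_types: Tuple[str, ...], *args):
    """Look up the kernel for the exact signature and apply it."""
    try:
        kernel = KERNELS[(name, in_types)]
    except KeyError:
        raise TypeError(f"no kernel for {name}{in_types}") from None
    return kernel(*args)


# Registered up front, standing in for a precompiled C kernel.
add_kernel("add", ("int64", "int64"), lambda a, b: a + b)

# Inserted later, standing in for a kernel produced by a JIT compiler.
add_kernel("add", ("float64", "float64"), lambda a, b: float(a) + float(b))

result_int = call("add", ("int64", "int64"), 2, 3)
result_float = call("add", ("float64", "float64"), 1.5, 2)
```

The point of the sketch is only that lookup is keyed on the full signature, so adding a kernel never requires touching existing entries.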
Function composition is a bit of an orthogonal issue: For jit compiled functions it seems less useful (just write a new function), for precompiled C kernels it could be added, but making it fast without temporary xnd containers is not trivial.
A couple of points about PEP 484:
1) Mypy is more of a powerful linter than a real type checker. We need 100% accuracy, which mypy does not provide.
2) Datashape is far more suitable for low-level C types than PEP 484, which does not mention them at all.
3) Datashape types include array sizes, which makes them dependent types. Static array bounds checking in general is undecidable, so mypy cannot help, even if PEP 484 did include C types.
More points about PEP 484:
4) Datashape types (ndt_t) contain full memory access information, including alignment. They are used to traverse memory much in the way a compiler generates code based on the types of data structures. Thus, the types need to be a) in C to be fast and b) not bloated so the code is readable.
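As a rough illustration of what "types are used to traverse memory" means, the following sketch walks a raw buffer using nothing but a type descriptor. `FixedArrayType` and `read_all` are hypothetical and far simpler than the real ndt_t struct (no alignment, no nested records); they are assumptions made for this example only.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FixedArrayType:
    """Hypothetical, simplified type descriptor (not the real ndt_t layout)."""
    shape: Tuple[int, ...]    # number of elements per dimension
    strides: Tuple[int, ...]  # byte step per dimension
    fmt: str                  # struct format of one element, e.g. "<i"


def read_all(t: FixedArrayType, buf: bytes, offset: int = 0, dim: int = 0):
    """Walk the buffer guided only by the type descriptor."""
    if dim == len(t.shape):
        # Innermost level: decode one element at the computed byte offset.
        return struct.unpack_from(t.fmt, buf, offset)[0]
    return [
        read_all(t, buf, offset + i * t.strides[dim], dim + 1)
        for i in range(t.shape[dim])
    ]


# Six little-endian int32 values laid out contiguously.
buf = struct.pack("<6i", 1, 2, 3, 4, 5, 6)

row_major = FixedArrayType(shape=(2, 3), strides=(12, 4), fmt="<i")
transposed = FixedArrayType(shape=(3, 2), strides=(4, 12), fmt="<i")

rows = read_all(row_major, buf)
cols = read_all(transposed, buf)
```

The same bytes yield a 2x3 array or its transpose depending solely on the descriptor, which is why the type has to carry the complete access information.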
Thanks for your replies.
Sure. The other problem, aside from standards proliferation, is that many real programs have abstraction and type hierarchy, rather than just a collection of functions.
For example, the distribution hierarchy in PyMC3.
I don't see how one can do fast type hierarchy programming (using dispatch to organize code on a type lattice) with ndtypes and gumath. It seems that somehow interfacing with PEP 484 (and later typing PEPs that include variable annotations, not mypy) is one way to do this.
Regarding the obstacles you mentioned:
Same with a Bayesian package with functions over "abstractparametricdistribution", which I will call with "myconcretedistribution".
I understand you aren't re-implementing an object-oriented type system; I'm just giving feedback on how this would work with my use cases, where I'd want to use it in the host language.
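For readers unfamiliar with the use case, this is roughly what dispatch over a class hierarchy looks like in the host language, sketched with the standard library's `functools.singledispatch`. The class and function names are hypothetical placeholders, and nothing here involves ndtypes or gumath.

```python
from functools import singledispatch


class AbstractParametricDistribution:
    """Hypothetical base class standing in for a distribution hierarchy."""


class MyConcreteDistribution(AbstractParametricDistribution):
    def __init__(self, mu: float):
        self.mu = mu


@singledispatch
def describe(dist) -> str:
    # Fallback for types with no registered implementation.
    raise TypeError(f"unsupported type: {type(dist).__name__}")


@describe.register(AbstractParametricDistribution)
def _(dist) -> str:
    # Registered once for the abstract type; subclasses resolve to it via the MRO.
    return f"parametric distribution ({type(dist).__name__})"


label = describe(MyConcreteDistribution(mu=0.0))
```

The function is written once against the abstract type and is selected for the concrete subclass, which is the behaviour the question is asking about.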
Gufuncs are multimethods. If f() should take {int16, int32, ...}, one has to add kernels for all signatures.
Datashape also had the Signed kind for all signed integers, so if a function has that signature, Numba could possibly generate kernels on the fly.
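A minimal sketch of that idea, under stated assumptions: the `SIGNED` tuple is a hypothetical expansion of the Signed kind, and `make_add_kernel` is a pure-Python stand-in for JIT compilation (no Numba is involved). One kernel is generated per concrete type and stored under its exact signature.

```python
from typing import Callable, Dict, Tuple

# Hypothetical expansion of the "Signed" kind into concrete types.
SIGNED = ("int8", "int16", "int32", "int64")
BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def make_add_kernel(dtype: str) -> Callable[[int, int], int]:
    """Stand-in for JIT compilation: an add kernel that wraps at dtype's width."""
    bits = BITS[dtype]

    def kernel(a: int, b: int) -> int:
        total = (a + b) & ((1 << bits) - 1)  # keep only the low `bits` bits
        # Reinterpret the result as a two's complement signed value.
        return total - (1 << bits) if total >= 1 << (bits - 1) else total

    return kernel


# One kernel per concrete signature, generated from the single kind.
kernels: Dict[Tuple[str, Tuple[str, str]], Callable[[int, int], int]] = {
    ("add", (t, t)): make_add_kernel(t) for t in SIGNED
}

wrapped = kernels[("add", ("int8", "int8"))](127, 1)
plain = kernels[("add", ("int64", "int64"))](2, 3)
```

A single kind-level signature thus fans out into a set of concrete multimethod entries without anyone writing them by hand.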
I'm not sure what you mean by "fast type hierarchy programming". Selecting the multimethod is done purely by switch statements on the C level; using __annotations__ on the Python level should be much slower than that.
If people insist on shoehorning datashape into the (IMO bulky) PEP 484 syntax, this is a start:
from ndtypes import *

class NdtInt(object):
    def __init__(self):
        self.t = ndt("Signed")

class NdtInt64(object):
    def __init__(self):
        self.t = ndt("int64")

class NdtTuple(object):
    def __init__(self, *args):
        s = '(' + ', '.join(str(x.t) for x in args) + ')'
        self.t = ndt(s)
But I don't think that such a syntax is appropriate for huge nested types like the Lahman database:
http://matthewrocklin.com/blog/work/2014/11/19/Blaze-Datasets
It's not, but I'm not talking about a data use case, which is already perfectly defined. How would the user write a custom type? How would you deal with the distribution example I provided, or with defining gufuncs on something like the PyMC type hierarchy: https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/continuous.py
Maybe this sort of thing is not in scope of these packages, in which case this issue can be closed.
@datnamer Here's a concrete example for defining new functions and types: https://github.com/plures/gumath/issues/4
This is great.
We could also use examples of how to define a new primitive type -- like a bfloat8 or fixed width float.
Travis
On Fri, Mar 30, 2018, 3:33 PM Stefan Krah notifications@github.com wrote:
@datnamer Here's a concrete example for defining new functions and types: https://github.com/plures/gumath/issues/4
Can the map function be overloaded with a kernel from a language like numba? It seems that function composition is a more general form of broadcasting and map.
Also, can kernels be selected based on other host type systems besides ndtypes? @sklam talked about translation, but it seems like PEP 484, protocols, etc. have more granularity than ndtypes, so you'd be losing data.
Otherwise we have yet another array type system in Python, when there is some talk of standardizing on mypy stuff: https://github.com/python/typing/issues/516