shashi opened 3 years ago
I cursorily looked at what would be required to do this, it seems rather complicated, could you ramble a bit about what would be a good approach?
Ideally we would not have closures, but the Eval mechanism seems to require it. What are the alternatives?
I'd suggest defining callable struct types, to which you can pass type info to make them construct the correct expressions.
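For concreteness, a minimal sketch of the callable-struct idea (all names here are invented for illustration, not Tullio's actual API):

```julia
# Hypothetical sketch: a callable struct whose type parameter carries the
# information needed to construct the right expression, instead of
# capturing it in a closure.
struct TullioKernel{Spec} end   # `Spec` is a type-level tag for the expression

# The call overload dispatches on `Spec` (and on the argument types) to
# pick the code to run; a real version would build loops from `Spec`.
(k::TullioKernel{:axpy})(a, x, y) = a .* x .+ y

k = TullioKernel{:axpy}()
k(2, [1, 2], [10, 20])  # -> [12, 24]
```

Because everything the body needs is recoverable from the types, this stays compatible with `@generated` functions.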
Yes that seems like a good approach! It wouldn't require much fundamental change to the current code.
I take that back: there's too much information in the intermediate representation, and I am not sure it can go into the type very easily. But it might be possible to use the same trick as RuntimeGeneratedFunctions.jl itself, where you store a dictionary from some kind of function ID to the function body (in this case that could be the intermediate representation of Tullio), and then generate the body when a struct containing only the ID in its type parameter is called.
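As a rough sketch of that ID trick (names invented; RuntimeGeneratedFunctions.jl's real implementation differs in its details):

```julia
# Hypothetical sketch of the RuntimeGeneratedFunctions.jl-style trick:
# keep the rich IR in a global table keyed by an ID, and put only the ID
# in the callable struct's type parameter.
const IR_TABLE = Dict{Symbol, Expr}()

struct GeneratedKernel{id} end

function register_kernel(ir::Expr)
    id = gensym(:kernel)
    IR_TABLE[id] = ir
    return GeneratedKernel{id}()
end

# The @generated call overload looks the body up by the type-level ID,
# so no closure is needed.
@generated function (k::GeneratedKernel{id})(args...) where {id}
    return IR_TABLE[id]   # the stored expression becomes the method body
end

k = register_kernel(:(args[1] + args[2]))
k(1, 2)  # -> 3
```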
So, what if we took some inspiration from the way broadcasting is implemented? After all, it seems to work somewhat nicely, and it is fully compatible with generated functions. E.g. if I write

```julia
@tullio C[i, j] := A[i, k] * B[k, j] + d[j]
```
we could macro-expand this to something like
```julia
begin
    local _1 = contracted(*, +, along(A, Dim{2}), along(B, Dim{1}))
    local _2 = casted(+, along(_1, Dim{2}), along(d, Dim{1}))
    C = materialize(_2)
end
```
and then `materialize` can do all the fusion and threading and whatnot on lazy graphs that are generated by `casted` and `contracted`.
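To make the analogy concrete, here is a toy sketch of what such lazy nodes and `materialize` could look like; only the elementwise (`casted`) node is implemented, and every name is hypothetical:

```julia
struct Dim{N} end

struct Along{A, N}        # an array tagged with the loop dimension it runs along
    arr::A
end
along(A, ::Type{Dim{N}}) where {N} = Along{typeof(A), N}(A)

struct Casted{F, Args}    # a lazy elementwise application node
    f::F
    args::Args
end
casted(f, args...) = Casted(f, args)

# materialize walks the lazy graph; a real version would fuse loops,
# thread, etc. Here it just recurses and maps.
materialize(x) = x
materialize(a::Along) = a.arr
materialize(c::Casted) = map(c.f, map(materialize, c.args)...)

g = casted(+, along([1, 2], Dim{1}), along([10, 20], Dim{1}))
materialize(g)  # -> [11, 22]
```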
It could potentially even share an IR with Transducers.jl and/or LoopVectorization.jl, since all three of these packages are really concerned with the same thing: modelling loops.
And KernelAbstractions.jl
Cool representation. There were `Map`, `Reduce`, and `Idx` types in ArrayMeta which accomplished the same. But I like doing something similar to broadcast.
I think, practically speaking, this may be pretty harsh on compilation, so I think we should support it as an option; I'm sure @yingboma would agree haha.
Implementation-wise, I think it's a lot easier if the generated function had access to the `dd`, which means we need to write something that can construct it from these types (and from `Expr`). That would make the rest of it pretty easy. The problem, though, is that the metadata currently heavily assumes we are dealing with nice little `Symbol`s. It will be major surgery, but not impossible or deadly.
@vchuravy @mcabbott @chriselrod @tkf let me know if any of you have interest in standardizing something like this, or alternatively have better ideas/objections.
There are a lot of options that `@tullio` takes! One way to represent at least the Boolean options is probably a `Union` of types representing the options that are turned on; that would generate the right number of functions.
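A toy sketch of that encoding (option names invented), where the set of enabled options lives in a `Union` type parameter so each combination gets its own specialization:

```julia
struct UseThreads end    # hypothetical Boolean options
struct UseFastMath end

struct Kernel{Opts} end
Kernel(opts::Type...) = Kernel{Union{opts...}}()

# An option is "on" iff its type is a member of the Union parameter.
hasopt(::Kernel{Opts}, opt::Type) where {Opts} = opt <: Opts

k = Kernel(UseThreads)
hasopt(k, UseThreads)   # -> true
hasopt(k, UseFastMath)  # -> false
```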
Also cc @Wimmerer
There was a similar discussion with @Wimmerer and @mcabbott in #114 (although focusing more on sparse arrays side).
I agree with @MasonProtter that a representation like https://github.com/mcabbott/Tullio.jl/issues/129#issuecomment-962525490 would be nice. Kind of like a "more advanced" `Base.Broadcast` facility. I've brought up a somewhat similar idea in #114. @mcabbott pointed out it was a bit trickier:
> The reason Tullio doesn't do that is that it wants to allow e.g. `A[i+1] - A[i-1]`, for which it shifts & intersects these axes, so the iteration space doesn't correspond to any of the originals.

--- https://github.com/mcabbott/Tullio.jl/issues/114#issuecomment-899992392
My hunch is that starting from and extending a representation like the one @MasonProtter suggested is possible and much more composable than generated functions. But I'd guess people working in the "extended index notation" space have more insight into this. Also, at the lowest level, I think it'd be desirable to have a proper IR like the MLIR affine dialect. But a layer that is extensible by packages like this is very useful.
A possible extension to @MasonProtter's suggestion may be to further decompose `C = materialize(_2)` into

```julia
sch = scheduler(_2)
C = materialize(sch, _2)
```

s.t. the user can specify the scheduler/style/executor `sch` to override the default decision by the `scheduler` function. The scheduler `sch` may have parameters for strategies like loop ordering, tiling, parallelization, etc. This is similar to Halide and Tiramisu.
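A minimal sketch of that split, with placeholder schedulers (none of these names come from an existing package):

```julia
abstract type Scheduler end
struct Serial   <: Scheduler end
struct Threaded <: Scheduler end

# Default decision; a real `scheduler` would inspect the lazy graph
# (sizes, nesting, backend) to choose a strategy.
scheduler(graph; threads::Bool = false) = threads ? Threaded() : Serial()

# materialize dispatches on the chosen (or user-supplied) scheduler.
materialize(::Serial, graph)   = collect(graph)
materialize(::Threaded, graph) = collect(graph)  # would partition work across threads

graph = (x^2 for x in 1:4)      # stand-in for a lazy expression graph
sch = scheduler(graph)          # a user may pass Threaded() instead
materialize(sch, graph)         # -> [1, 4, 9, 16]
```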
Do we want to schedule something to talk over the ecosystem and what everyone aims to achieve?
Yeah, I don't think it would be such a big problem to support things like shifted indices. You could lower
```julia
@tullio B[i] := (A[2i] + A[2i+1]) / 2
```
to something like
```julia
begin
    local dim1 = multiples(2, Dim{1})
    local dim2 = shift(+1, multiples(2, Dim{1}))
    local _1 = casted(+, along(A, dim1), along(A, dim2))
    local _2 = casted(/, _1, 2)
    local sch = scheduler(_2, options...)
    B = materialize(sch, _2)
end
```
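The "shifts & intersects" bookkeeping @mcabbott mentioned is mechanical; here is a rough sketch of the range arithmetic for this example (helper names hypothetical, assuming 1-based `A`):

```julia
# For an access pattern i -> step*i + shift, the valid output indices are
# those for which the source index lands inside the source axis.
function valid_range(ax; step = 1, shift = 0)
    lo = cld(first(ax) - shift, step)
    hi = fld(last(ax) - shift, step)
    return lo:hi
end

A = collect(1:10)
# B[i] := (A[2i] + A[2i+1])/2 needs both 2i and 2i+1 inside axes(A, 1):
r = intersect(valid_range(axes(A, 1); step = 2),
              valid_range(axes(A, 1); step = 2, shift = 1))  # -> 1:4
B = [(A[2i] + A[2i+1]) / 2 for i in r]  # -> [2.5, 4.5, 6.5, 8.5]
```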
That said, all of this discussion about moving to something more like a broadcasting-like interface makes me wonder if such an overhaul should be targeting TensorCast.jl rather than Tullio.jl
@shashi How does the Symbolics.jl IR representation for array expressions compare to a hypothetical perfect Julia array IR, or something like the MLIR affine dialect? Maybe we already have an array IR, and Tullio, FLoops, etc. should be targeting it?
By the way, just thought I'd mention that opaque closures are allowed to be created and returned in generated functions, so if the closures made in Tullio were `@opaque`, then this'd work.
That might be neat. Can you have more than one method though?
```julia
julia> using Base.Experimental: @opaque

julia> f = @opaque x::Int -> x^2
(::Int64)::Any->◌

julia> (::typeof(f))(x::Float64) = 1/x
ERROR: cannot add methods to a builtin function
```
Tullio at present dispatches to methods like `Act!(::Type{CuArray}, ...)`. Not sure how easy it would be to remove those.
No, they can't have multiple methods. :(
Instead of dispatch, could you just switch on the type? I.e.

```julia
if T <: CuArray
    ...
elseif T <: Array
    ...
end
```
I think the `@tullio` macro needs to be re-run anyway if someone loads CUDA after defining the methods, so I'm not sure there'd be any loss of generality. Maybe just doing callable structs is the way to go, though.
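A tiny sketch of why the callable struct wins here: unlike an opaque closure, it can have one method per backend (names hypothetical; `AbstractRange` stands in for `CuArray` so the example runs without GPU dependencies):

```julia
struct Actor end   # hypothetical kernel object, standing in for Tullio's `Act!`

(::Actor)(::Type{<:Array}, x) = "cpu path"
(::Actor)(::Type{<:AbstractRange}, x) = "other backend path"

act = Actor()
act(Array, [1, 2])   # -> "cpu path"
act(UnitRange, 1:2)  # -> "other backend path"
```

New methods (e.g. for `CuArray`) can then be added from another package without touching the existing ones.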
Yes this is true, could generate different bodies & sew them together (in the right order). I don't promise it soon though!
As an MWE:
In theory there is no reason this shouldn't work, but in practice, Julia groans. If this is fixed, it should also work with RuntimeGeneratedFunctions.jl, which would allow its use in Symbolics/ModelingToolkit for generating code from array symbolics, since the symbolic array operations actually construct a chain of Tullio-like expressions. (It would also allow the completion of https://github.com/JuliaParallel/DaggerArrays.jl.)