mcabbott / Tullio.jl

Unlock usage inside generated functions #129

Open · shashi opened 3 years ago

shashi commented 3 years ago

As a MWE:

julia> @generated foo(A) = :(@tullio B[i] := A[i])
foo (generic function with 1 method)

julia> foo([1])
ERROR: The function body AST defined by this @generated function is not pure. This likely means it contains a closure or comprehension.
Stacktrace:
 [1] top-level scope
   @ REPL[49]:1

In theory there is no reason this shouldn't work, but in practice, Julia groans. If this is fixed, it should also work with RuntimeGeneratedFunctions.jl, which would allow its use in Symbolics/ModelingToolkit for generating code from array symbolics, since the symbolic array operations actually construct a chain of tullio-like expressions. (It would also allow the completion of https://github.com/JuliaParallel/DaggerArrays.jl.)

shashi commented 3 years ago

I took a cursory look at what would be required to do this; it seems rather complicated. Could you ramble a bit about what a good approach would be?

Ideally we would not have closures, but the Eval mechanism seems to require them. What are the alternatives?

chriselrod commented 3 years ago

I'd suggest defining callable struct types, to which you can pass type info to make them construct the correct expressions.
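
For concreteness, a minimal sketch of that pattern (hypothetical names, nothing here is Tullio's actual internals): the closure is replaced by a callable struct whose call method is itself @generated, so the argument types are in hand when the expression is constructed.

struct MulKernel end

# the call method is generated, so T and N are known while building the body
@generated function (::MulKernel)(A::AbstractArray{T,N}, b) where {T,N}
    :(A .* b)
end

MulKernel()([1, 2, 3], 2)  # == [2, 4, 6]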

shashi commented 3 years ago

Yes that seems like a good approach! It wouldn't require much fundamental change to the current code.

shashi commented 3 years ago

I take that back: there's too much information in the intermediate representation, and I'm not sure it can go into the type very easily. But it might be possible to use the same trick as RuntimeGeneratedFunctions.jl itself, where you store a dictionary mapping some kind of function ID to the function body (in this case that could be Tullio's intermediate representation), and then generate the body when a struct carrying only the ID in its type parameter is called. Something like the sketch below.
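
Hypothetical names; the stored payload would be Tullio's intermediate representation rather than a bare Expr:

const BODY_CACHE = Dict{Symbol, Expr}()

struct CachedKernel{id} end

function make_kernel(body::Expr)
    id = gensym(:kernel)
    BODY_CACHE[id] = body     # the IR lives in a global table ...
    CachedKernel{id}()        # ... and only its ID goes into the type
end

@generated function (::CachedKernel{id})(A) where {id}
    BODY_CACHE[id]            # look up the body when the method is compiled
end

k = make_kernel(:(A .+ 1))
k([1, 2, 3])  # == [2, 3, 4]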

MasonProtter commented 3 years ago

So, what if we took some inspiration from the way broadcasting is implemented? After all, it works somewhat nicely and is fully compatible with generated functions. E.g. if I write

@tullio C[i, j] := A[i, k] * B[k, j] + d[j]

we could macro-expand this to something like

begin
    local _1 = contracted(*, +, along(A, Dim{2}), along(B, Dim{1}))
    local _2 = casted(+, along(_1, Dim{2}), along(d, Dim{1}))
    C = materialize(_2)
end

and then materialize can do all the fusion and threading and whatnot on lazy graphs that are generated by casted and contracted.
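
One possible shape for those lazy nodes, purely as a sketch (none of these types exist yet):

struct Dim{N} end

struct Along{A, D}        # an array tagged with the loop dimension it runs over
    parent::A
end
along(A, ::Type{D}) where {D} = Along{typeof(A), D}(A)

struct Casted{F, Args <: Tuple}         # lazy elementwise application
    f::F
    args::Args
end
casted(f, args...) = Casted(f, args)

struct Contracted{F, G, Args <: Tuple}  # lazy map (f) + reduce (g) over shared dims
    f::F
    g::G
    args::Args
end
contracted(f, g, args...) = Contracted(f, g, args)

materialize would then walk this graph, with all the type information visible to the compiler.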

MasonProtter commented 3 years ago

It could potentially even share an IR with Transducers.jl and/or LoopVectorization.jl, since all three of these packages are really concerned with the same thing: modelling loops.

AriMKatz commented 3 years ago

And KernelAbstractions.jl

shashi commented 3 years ago

Cool representation. There were Map, Reduce and Idx types in ArrayMeta which accomplished the same. But I like doing something similar to broadcast.

Practically speaking, I think this may be pretty harsh on compilation, so we should support it as an option; I'm sure @yingboma would agree haha.

Implementation-wise, I think it's a lot easier if the generated function has access to the dd, which means we need to write something that can construct it from these types (and from Expr). That would make the rest of it pretty easy. The problem, though, is that the metadata currently assumes heavily that we are dealing with nice little Symbols. It will be major surgery, but not impossible or deadly.

shashi commented 3 years ago

@vchuravy @mcabbott @chriselrod @tkf let me know if any of you have interest in standardizing something like this, or alternatively have better ideas/objections.

shashi commented 3 years ago

There are a lot of options that @tullio takes! One way to represent at least the Boolean options is probably a Union of the types representing the options that are turned on; that would generate the right number of functions. Something like the sketch below.
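
Hypothetical names, just to illustrate the shape: each enabled option is an empty struct, and the Union of the enabled ones is carried as a type parameter, so every distinct combination compiles its own method.

struct Threaded end
struct FastMathOn end

struct Kernel{Opts} end
Kernel(opts::Type...) = Kernel{Union{opts...}}()

# a pure function of the types, so a @generated body can branch on it
hasopt(::Kernel{Opts}, ::Type{O}) where {Opts, O} = O <: Opts

k = Kernel(Threaded, FastMathOn)
hasopt(k, Threaded)  # true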

AriMKatz commented 3 years ago

Also cc @Wimmerer

tkf commented 3 years ago

There was a similar discussion with @Wimmerer and @mcabbott in #114 (although focusing more on sparse arrays side).

I agree with @MasonProtter that a representation like https://github.com/mcabbott/Tullio.jl/issues/129#issuecomment-962525490 would be nice. Kind of like a "more advanced" Base.Broadcast facility. I brought up a somewhat similar idea in #114. @mcabbott pointed out it was a bit trickier:

The reason Tullio doesn't do that is that it wants to allow e.g. A[i+1] - A[i-1], for which it shifts & intersects these axes, so the iteration space doesn't correspond to any of the originals.

--- https://github.com/mcabbott/Tullio.jl/issues/114#issuecomment-899992392

My hunch is that starting from and extending the representation @MasonProtter suggested is possible and much more composable than generated functions. But I'd guess people working in the "extended index notation" space have more insight into this. Also, at the lowest level, I think it'd be desirable to have a proper IR like the MLIR affine dialect. But a layer that is extensible by packages like this one is very useful.

A possible extension to @MasonProtter's suggestion may be to further decompose C = materialize(_2) into

sch = scheduler(_2)
C = materialize(sch, _2)

so that the user can specify the scheduler/style/executor sch to override the default decision made by the scheduler function. The scheduler sch may have parameters for strategies like loop ordering, tiling, parallelization, etc. This is similar to Halide and Tiramisu.
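
A rough sketch of what such a parameterized scheduler could look like (all names hypothetical):

struct Scheduler{Order, Tile}
    nthreads::Int
end

# the default decision, which the user can override by constructing
# and passing their own Scheduler to materialize
scheduler(expr) = Scheduler{(:i, :j), (64, 64)}(Threads.nthreads())

materialize(sch, _2) would then consult Order, Tile and nthreads when emitting the loops.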

rayegun commented 3 years ago

Do we want to schedule something to talk over the ecosystem and what everyone aims to achieve?

MasonProtter commented 3 years ago

Yeah, I don't think it would be such a big problem to support things like shifted indices. You could lower

@tullio B[i] := (A[2i] + A[2i+1])/2

to something like

begin
    local dim1 = multiples(2, Dim{1})
    local dim2 = shift(+1, multiples(2, Dim{1}))
    local _1 = casted(+, along(A, dim1), along(A, dim2))
    local _2 = casted(/, _1, 2)
    local sch = scheduler(_2, options...)
    B = materialize(sch, _2)
end

MasonProtter commented 3 years ago

That said, all of this discussion about moving to a more broadcasting-like interface makes me wonder whether such an overhaul should target TensorCast.jl rather than Tullio.jl.

AriMKatz commented 2 years ago

@shashi How does the Symbolics.jl IR representation for array expressions compare to a hypothetical perfect Julia array IR, or something like the MLIR affine dialect? Maybe we already have an array IR, and Tullio, FLoops etc. should be targeting it?

MasonProtter commented 1 year ago

By the way, just thought I'd mention that opaque closures are allowed to be created and returned in generated functions, so if the closures made in Tullio were @opaque, this'd work. A quick sketch is below.
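
For example (the same function with an ordinary closure in place of @opaque triggers the error from the top of this issue):

using Base.Experimental: @opaque

@generated function foo(A)
    quote
        f = @opaque x -> x + 1   # an ordinary closure here is rejected
        map(f, A)
    end
end

foo([1, 2, 3])  # == [2, 3, 4]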

mcabbott commented 1 year ago

That might be neat. Can you have more than one method though?

julia> using Base.Experimental: @opaque

julia> f = @opaque x::Int -> x^2
(::Int64)::Any->◌

julia> (::typeof(f))(x::Float64) = 1/x
ERROR: cannot add methods to a builtin function

Tullio at present dispatches to methods like Act!(::Type{CuArray}, ...). Not sure how easy it would be to remove those.

MasonProtter commented 1 year ago

No, they can't have multiple methods. :(

MasonProtter commented 1 year ago

Instead of dispatch, could you just switch on the type? i.e.

if T <: CuArray
    ...
elseif T <: Array
    ...
end

I think the @tullio macro needs to be re-run anyway if someone loads CUDA after defining the methods, so I'm not sure there'd be any loss of generality. Maybe just doing callable structs is the way to go, though.

mcabbott commented 1 year ago

Yes, this is true: one could generate different bodies & sew them together (in the right order). I don't promise it soon though! A rough sketch of the idea is below.
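
With stand-in kernels (not Tullio's real ones):

cpu_body!(A) = (A .*= 2; A)                  # stand-in for the Array path
generic_body!(A) = (map!(x -> 2x, A, A); A)  # stand-in fallback

@generated function act!(A)
    # emit both bodies sewn together behind a type check; the branch is
    # resolved statically since A's concrete type is known here
    quote
        if A isa Array
            cpu_body!(A)
        else
            generic_body!(A)
        end
    end
end

act!([1, 2, 3])  # == [2, 4, 6]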