rprechelt / Vectorize.jl

Cross-platform vectorization of Julia code using Accelerate, VML, and Yeppp!
http://vectorizejl.readthedocs.io/en/latest/

Initial Design Discussion #2

Closed rprechelt closed 8 years ago

rprechelt commented 8 years ago

@ArchRobison Welcome to the (currently empty) home of Vectorize.jl. I wanted to start an issue to discuss the initial plan of attack in more detail. The vast majority of my experience with Julia has been in pure Julia, with only a few forays into the codegen and LLVM sections of the code base, so my understanding of some of the inner workings of the Julia language is a little rusty; I would definitely appreciate some guidance in this department.

The current plan (at least as far as I understand it) for the most basic implementation of @vectorize is as follows (this is the straight-up naive implementation to get me started):

1) @vectorize annotates a function call/comprehension/loop (similar to what you did in simdloop.jl).
2) We can then call into our LLVM code from codegen.cpp (similar to what you did for @simd in codegen.cpp).
3) As far as I understand the process so far, there are two ways we can go from here: either we directly construct the vectorized call during codegen, or we add another layer of annotation and then use an LLVM pass to do the actual vectorization.

What are your thoughts?
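
For concreteness, here is a rough conceptual sketch (not a proposed design) of the kind of annotation step 1 might use; the marker function and every name below are placeholders:

```julia
# Sketch: mark a loop so a later compiler stage (a codegen hook or an LLVM
# pass) can find it. The "annotation" here is a call to a no-op marker
# function; a pass could look for calls to this marker in the IR and rewrite
# the surrounding loop. Stock Julia simply ignores the marker.
@inline _vectorize_marker() = nothing

macro vectorize(loop)
    Meta.isexpr(loop, :for) || error("@vectorize expects a for loop")
    # loop.args[2] is the loop body; prepend the marker call. Interpolating
    # the function object itself sidesteps macro-hygiene issues.
    pushfirst!(loop.args[2].args, :($(_vectorize_marker)()))
    return esc(loop)
end

# Usage: the loop runs unchanged today; the marker only matters once a
# pass that recognizes it exists.
# @vectorize for i in eachindex(a)
#     a[i] = b[i] + c[i]
# end
```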

While the original proposal was to structure this as a Julia package (of the Pkg.add() variety), I have been unable to think of a way to build the desired functionality without adding the above code to Julia and rebuilding.

Apologies if any of the above are naive questions, I'm still wrapping my head around the inner workings of Julia and LLVM.

rprechelt commented 8 years ago

Followup:

When I was writing the proposal, I implemented a rough prototype of Vectorize that was able to meet all of the design goals laid out in the proposal and could be installed via Pkg.add(). This worked as follows.

src/Vectorize.jl was completely empty. When the package was installed, deps/build.jl was automatically run. This ran a complete benchmarking suite. build.jl also had the information needed to construct direct C calls (via ccall) for any function from any of the libraries. Once the benchmarking process was finished for a function, build.jl would write a direct ccall implementation into src/Vectorize.jl (including comments and docstrings). In this way, src/Vectorize.jl was incrementally constructed, function by function, by deps/build.jl.
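
For example, a generated definition might have looked roughly like the one below; the particular function, library, and symbol names are illustrative only, not necessarily what the prototype actually emitted:

```julia
# Illustrative generated wrapper: a direct ccall into whichever library
# benchmarked fastest for this function on this machine (here, hypothetically,
# Yeppp!'s vectorized exponential).
"""
    exp!(out, x)

Element-wise exponential of `x`, written into `out`.
"""
function exp!(out::Vector{Float64}, x::Vector{Float64})
    ccall((:yepMath_Exp_V64f_V64f, "libyeppp"), Cint,
          (Ptr{Float64}, Ptr{Float64}, Csize_t),
          x, out, length(x))
    return out
end
```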

Every function resulted in a direct ccall to the corresponding library without any additional overhead besides the function call (the package was also precompiled). I could not reliably measure the difference between calling Vectorize.sum() and writing the corresponding ccall directly.

To implement the macro, however, there would have to be some conversion/lookup/mapping between Base.sum() and Vectorize.sum(), which might cause us to suffer a greater performance hit.
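
One simple (purely illustrative) way to do that mapping would be a lookup table plus an expression walk inside the macro; the table contents and module layout here are assumptions, not the prototype's actual code:

```julia
# Sketch: rewrite calls whose functions have vectorized counterparts.
const VECTORIZABLE = Dict{Symbol,Any}(
    :sum => :(Vectorize.sum),
    :exp => :(Vectorize.exp),
)

# Recursively walk an expression, swapping mapped call targets in place.
function substitute!(ex)
    ex isa Expr || return ex
    if ex.head === :call && ex.args[1] isa Symbol && haskey(VECTORIZABLE, ex.args[1])
        ex.args[1] = VECTORIZABLE[ex.args[1]]
    end
    foreach(substitute!, ex.args)
    return ex
end

macro vectorize(ex)
    return esc(substitute!(ex))
end

# `@vectorize exp(x)` would then expand to `Vectorize.exp(x)`, assuming a
# suitable Vectorize.exp method exists for the argument types.
```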

This build.jl-driven approach was the way I originally had in mind for generating the package. While I understand the performance and flexibility benefits of implementing this code in a similar fashion to @simd (I'm not sure how performant we could make our block reductions and optimizations with the pure-Julia approach, although they are definitely possible), I'm concerned about asking Julia users to rebuild Julia to get this package working. The premise of my proposal was to make vectorization easy, convenient, and portable to access; I know a good number of (non-CS) researchers who are starting to use Julia but would never attempt a complete rebuild just to install a package (they only ever download the binary versions of the language). Using deps/build.jl also provides a natural place to install and build the corresponding libraries (Yeppp! etc.), which would be less straightforward if everything were compiled as part of Julia as a whole.

Please don't take the above as a dismissal of the advice in your email regarding the Julia/LLVM implementation method; I just want to make sure we get this right from the beginning. I've already started work on the Julia/LLVM method to help me wrap my head around it; however, there is still a lot of code from the pure-Julia implementation above that works well.

@simonbyrne any thoughts on the original post or this followup?

ArchRobison commented 8 years ago

It's good to hear you are diving into LLVM. My general advice is to carefully analyze where the right place is to put an analysis or transform, not just put it somewhere because you understand that part better or because of business arrangements. (The latter is how I once ended up in the insane position of writing a modulo-scheduling software pipeliner that ran before instruction selection.)

I appreciate the concern about making the work easily available. Even I download the pre-built Julia for my Windows machine at home. As for the final product, I don't see a problem with having part of it in the Julia codegen, which could become part of the standard Julia distribution, and having part of it as a separate package. My recent work on VecElement for SIMD may end up in this situation.

Another possible avenue to explore is to add LLVM passes as shared libraries. I don't know if that's possible, but it seems plausible.

rprechelt commented 8 years ago

In response to your final comment: it is possible to develop a pass outside of the LLVM source tree and register it against a binary or system-installed LLVM. I just got this working this morning by following some guidelines in the LLVM docs, so that is an option.

rprechelt commented 8 years ago

After some thought and analysis, I think the implementation direction with the most future flexibility would be the original LLVM route you suggested; however, it would be possible to keep most of the development outside of the Julia code base, i.e.

Package:
1) The @vectorize macro that annotates expressions.
3) An LLVM VectorizePass that converts the annotated LLVM IR into calls to vectorized functions.

Julia:
2) codegen.cpp, which takes the annotated Julia expression and converts it into annotated LLVM IR.

(The numbers refer to the steps in the original post.)

Thoughts?

My only gripe with this method is that it requires Julia users to download a copy of the Julia source, let Vectorize.jl modify codegen.cpp (and potentially other files), and then rebuild Julia. I have no concerns about working in LLVM, but this installation strategy is time-consuming (compared to most Julia package installs), and the changes to codegen.cpp would not live in Julia itself. Furthermore, if this functionality is not included in Base in the future, then we are left with a package that few may install and use, and that may require constant updating to track any changes in codegen.cpp that affect our installation process.

simonbyrne commented 8 years ago

I think Arch is saying that your changes to codegen.cpp could be merged into the Julia repo itself once ready. Obviously this is not something we would do for all packages, but I think this is certainly worth the exception.

ArchRobison commented 8 years ago

Right. If you run grep -i simd src/codegen.cpp, you'll find only five lines. If we can keep the changes small, and the package is important, it's not a big deal to have them in the Julia repo. Or, for that matter, the @simd changes could, in principle, live in a completely separate package as long as there was a way to register the hook with codegen.cpp. Maybe we should be thinking about a general hook. Julia is a dynamic language, so why not a dynamic compiler :-).

Another example is #15244, where the changes seem headed for incorporation into the Julia repo, even though they exist solely to support external packages.