tensorflow/swift

Swift for TensorFlow
https://tensorflow.org/swift
Apache License 2.0

Benchmark S4TF against Julia #506

Closed: INF800 closed this issue 4 years ago

INF800 commented 4 years ago

Hi,

I don't know if this is the right place to ask.

While benchmarking training and inference time, could you please compare S4TF against the state-of-the-art Julia frameworks as well? Since Julia still remains a potential go-to option, I hope the benchmarks will help the community understand how powerful S4TF and X10 actually are.

Thanks, Rakesh

oxinabox commented 4 years ago

As someone deeply involved in Julia AutoDiff, I don't think "official" benchmarks prepared by either the Swift team or the Julia AD community would be that productive at this stage.

You want your benchmarks written and run by a third party; a blog post is good. Anyone as involved as the authors will be far too good at using their own tool and avoiding its edge cases, and will have a subconscious bias towards not trying as hard with the other. A third party doesn't have to be all that unbiased to be more realistic than the authors of the tools.

Furthermore: benchmarks that the authors of the tools take too seriously can distort priorities, leading people to chase an extra 10% of speed when the time would be better spent making the tool 20% easier to use. No tool in Julia or (AFAIK) Swift has hit a level of maturity where speed is the only priority left. There are more important things to do than write and chase benchmarks.

I think we can all agree that both the Swift and Julia work is enhanced by the existence of the other: through cross-pollination of ideas, and by raising awareness in the wider world of the desirability of a general-purpose language that is differentiable. It would be a harsh blow to all if any of these projects stopped; this playground is more than big enough for all of us.

(NB: I can't speak for the whole Julia community; these are only my own opinions, etc.)

BradLarson commented 4 years ago

I totally agree with @oxinabox; that's a great assessment of the situation. Language comparisons only make sense in specific, targeted cases, and even then truly fair benchmarks are hard to do right.

The language itself may not even be your bottleneck. For ML workloads, which accelerators you use, and the stack from high-level APIs down to ML compilers, drivers, and low-level libraries, can often play a larger role in your overall performance. For example, even with Swift and our higher-level APIs held constant, using the XLA-based X10 backend can be significantly faster than our eager-mode backend in many cases, but slower in others. Likewise, Julia has a wide range of frameworks and accelerator backends, each with its own distinct capabilities.
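
To make that concrete, here's a rough sketch of switching between the two backends, based on the X10 tutorial APIs of the time (Device.defaultXLA, LazyTensorBarrier(), and friends); exact names may vary across toolchain versions:

```swift
import TensorFlow

// Pick a backend: the eager-mode default, or the XLA-based X10 backend.
let eager = Device.defaultTFEager
let x10 = Device.defaultXLA

// The same high-level code runs on either backend; only the placement differs.
let x = Tensor<Float>(randomNormal: [1024, 1024], on: x10)
let y = matmul(x, x)

// X10 traces work lazily and compiles it with XLA; a barrier forces
// execution, which matters when measuring performance.
LazyTensorBarrier()
print(y.device)
```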

Our goal with the benchmark suite we've been building has been to pick out some common models that have existing benchmark implementations in other frameworks, as well as other cases that stress different parts of our stack, and then use those to track our own performance improvements and regressions. Our primary comparison is against ourselves over time.
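
To illustrate the "compare against ourselves over time" idea (this is not our actual benchmark suite, just a minimal sketch), all a harness really needs is stable numbers that can be tracked across toolchain versions:

```swift
import Dispatch

// Hypothetical helper: run a workload several times and report the median
// wall-clock time, so the same suite can be re-run on each new toolchain
// and compared against earlier results.
func medianSeconds(iterations: Int = 10, _ body: () -> Void) -> Double {
    var samples: [Double] = []
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1e9)
    }
    return samples.sorted()[samples.count / 2]
}
```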

The kinds of applications where I feel that Julia and Swift can really shine will be so heterogeneous (combining complex logic with accelerator-friendly computations, differentiating through physics simulations, and so on) that head-to-head comparisons might be hard to make. We're entering a really exciting time in the field of computing, with the emergence of diverse hardware architectures and powerful yet accessible tools.
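
As a toy illustration of that heterogeneity (my own example, not from any benchmark suite), here's a differentiable simulation written against the S4TF-era gradient(at:in:) API: ten explicit Euler steps of projectile motion, with the derivative taken straight through the loop:

```swift
import TensorFlow

// Toy "physics": ten explicit Euler steps of projectile motion under gravity.
// The loop and mutation are plain Swift, yet the whole function is differentiable.
@differentiable
func finalHeight(initialVelocity: Float) -> Float {
    let g: Float = 9.81
    let dt: Float = 0.1
    var velocity = initialVelocity
    var height: Float = 0
    for _ in 0..<10 {
        height += velocity * dt
        velocity -= g * dt
    }
    return height
}

// Sensitivity of the final height to the launch velocity:
// analytically 10 * dt = 1.0 here, since height is linear in velocity.
let sensitivity = gradient(at: 20, in: finalHeight)
print(sensitivity)  // 1.0
```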