tensorflow / java

Java bindings for TensorFlow
Apache License 2.0

Significant Performance Variability Across Nodes in Spark Cluster with Version 0.5.0 #552

Open ewan0x79 opened 1 month ago

ewan0x79 commented 1 month ago

I've been using version 0.5.0 and have observed performance inconsistencies across the nodes in my Spark cluster. Some nodes execute tasks significantly faster than others, with certain nodes running anywhere from tens to thousands of times slower. Are there any CPU-specific optimizations made when this library is compiled? For instance, are there optimizations that favor Intel CPUs over AMD CPUs, which might explain the disparity? Any insights or suggestions would be greatly appreciated.
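To compare nodes beyond just "fast" and "slow", it can help to log the JVM-visible environment on each one. The sketch below is not from this thread or the TensorFlow Java API. It only uses standard JDK calls (`System.getProperty`, `Runtime.availableProcessors`, `Runtime.maxMemory`), and the class name `NodeInfo` is hypothetical.

```java
// Hypothetical helper: summarizes JVM-visible details that may differ between cluster nodes.
class NodeInfo {

    // Builds a newline-separated summary of OS, JVM, core count and heap limit.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return String.join("\n",
            "os.name=" + System.getProperty("os.name"),
            "os.arch=" + System.getProperty("os.arch"),
            "java.version=" + System.getProperty("java.version"),
            "java.vm.name=" + System.getProperty("java.vm.name"),
            // Cores this JVM may use (container-aware JVMs report the container limit).
            "availableProcessors=" + rt.availableProcessors(),
            // Maximum heap the JVM will attempt to use, in MiB.
            "maxMemoryMiB=" + rt.maxMemory() / (1024 * 1024));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

In a Spark job, `describe()` would need to run inside a task (for example in a `mapPartitions` function) so that it reports the executor's JVM rather than the driver's.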

Craigacp commented 1 month ago

TensorFlow selects code paths based on the instructions the CPU supports. If some nodes are Intel Xeons with AVX-512 and others are older AMD EPYCs without it, matrix multiplies and convolutions will run much faster on the Intel machines. I think we compile against AVX (version 1), but TensorFlow pulls in MKL for matrix operations, and MKL has fast paths for newer vector instructions such as AVX2 and AVX-512. Since MKL is made by Intel, it might also favour Intel CPUs in other ways, but we don't have much control over that.
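One way to check whether the AVX-512 explanation applies is to see which vector extensions each node actually reports. The sketch below assumes Linux x86 nodes (it reads the `flags` line of `/proc/cpuinfo`) and Java 11+ (for `Path.of`). The class name `CpuFlags` is hypothetical, and this is not part of TensorFlow Java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports which x86 vector extensions the kernel lists for this CPU.
class CpuFlags {

    // Returns the flags from the first non-empty "flags" line of /proc/cpuinfo-style text,
    // or an empty set if no such line exists.
    static Set<String> parseFlags(List<String> cpuinfoLines) {
        for (String line : cpuinfoLines) {
            int colon = line.indexOf(':');
            if (line.startsWith("flags") && colon >= 0) {
                String value = line.substring(colon + 1).trim();
                if (!value.isEmpty()) {
                    return new HashSet<>(Arrays.asList(value.split("\\s+")));
                }
            }
        }
        return new HashSet<>();
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Path.of("/proc/cpuinfo");
        // Assumes Linux; other platforms do not expose this file.
        if (!Files.exists(cpuinfo)) {
            System.out.println("/proc/cpuinfo not available on this platform");
            return;
        }
        List<String> lines = Files.readAllLines(cpuinfo);
        // Print the CPU model so Intel and AMD nodes are easy to tell apart.
        lines.stream()
            .filter(l -> l.startsWith("model name"))
            .findFirst()
            .ifPresent(System.out::println);
        Set<String> flags = parseFlags(lines);
        // avx512f is the AVX-512 foundation subset; avx, avx2 and fma are the older extensions.
        for (String isa : new String[] {"avx", "avx2", "fma", "avx512f"}) {
            System.out.println(isa + ": " + flags.contains(isa));
        }
    }
}
```

Running this on a fast node and a slow node and comparing the `avx512f` line would show whether the nodes differ in AVX-512 support at all.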

Tens to thousands of times slower doesn't sound right, though; typically I'd expect AVX-512 to give at most a 2x speedup over AVX2. Are there other differences between these nodes besides the CPU?
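A back-of-the-envelope check on the 2x figure (an estimate, not a measurement from this thread): an AVX2 register is 256 bits wide and an AVX-512 register is 512 bits wide, so with 32-bit floats the number of lanes processed per vector instruction doubles:

$$
\frac{512 / 32}{256 / 32} = \frac{16}{8} = 2
$$

This assumes the same clock speed and the same number of vector units per core. On some Intel CPUs, heavy AVX-512 code runs at a reduced clock, so the real gain is often below 2x. A gap of tens to thousands of times is hard to explain by vector width alone.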