tensorflow / java

Java bindings for TensorFlow
Apache License 2.0

Windows build fails on GitHub Actions #125

Closed (karllessard closed 4 months ago)

karllessard commented 3 years ago

This is not a new topic but I want to start a thread on it so we can get closer to a complete solution.

We have always had trouble building our Windows platforms with the mkl, gpu or mkl-gpu extensions on GitHub Actions because the build takes too long (i.e. beyond the 6-hour limit).

Adding this new option to the Visual Studio compiler drastically reduces the compilation time of MKL functions, as I've tested locally. Still, none of the platforms with GPU support complete in time, as you can see in this workflow.

Strangely, we can observe that preparing the environment for these builds took 20 minutes, i.e. twice as long as for the non-GPU builds. But we install the same software regardless of whether we are building for GPU or not. Can we investigate the cause of that delay?

Also, I have been told by SIG Build that disabling Eigen inlining helps reduce the compilation time even more, but this time at the price of some performance loss. Still, should we give it a try?

saudet commented 3 years ago

> Strangely, we can observe that preparing the environment for these builds took 20 minutes, i.e. twice as long as for the non-GPU builds. But we install the same software regardless of whether we are building for GPU or not. Can we investigate the cause of that delay?

We would need to create and use custom images to avoid the installation time. Unfortunately, GitHub Actions doesn't support custom images, but it does support custom (self-hosted) runners, which is the next best thing. We just need someone to sponsor the machines somewhere in the cloud (and take responsibility for potential security risks too), but it doesn't look like that's going to be Google...

> Also, I have been told by SIG Build that disabling Eigen inlining helps reduce the compilation time even more, but this time at the price of some performance loss. Still, should we give it a try?

That's --define=override_eigen_strong_inline=true, and it's already there in build.sh. Removing it roughly doubles the build time; see https://github.com/bytedeco/javacpp-presets/issues/568.
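
For readers who have not seen how such a flag gets passed, here is a minimal sketch of a script adding it only on Windows shells. It is not the project's actual build.sh: the helper name extra_bazel_flags, the OS patterns, and the placeholder target label are all assumptions made for illustration. Only the --define option itself comes from this thread.

```shell
#!/usr/bin/env bash
# Sketch only, NOT the real build.sh.
# extra_bazel_flags is a hypothetical helper: given an OS name as printed
# by `uname -s`, it prints the extra Bazel options to use for that OS.
extra_bazel_flags() {
  case "$1" in
    MINGW*|MSYS*|CYGWIN*)
      # Option quoted in this thread; per the thread, removing it roughly
      # doubles the build time.
      echo "--define=override_eigen_strong_inline=true"
      ;;
    *)
      # Assumption: no extra option on other platforms.
      echo ""
      ;;
  esac
}

# Print the command instead of running it, so the sketch has no side effects.
# //placeholder:target is a made-up label, not a real target in the project.
echo "bazel build $(extra_bazel_flags "$(uname -s)") //placeholder:target"
```

Keeping the option in one small function would make it easy to toggle when comparing build times with and without it.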