pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Concerning default configuration for distribution packages #37332

Open cdluminate opened 4 years ago

cdluminate commented 4 years ago

Following the work towards a distro-friendly build system, a question naturally comes up: what's the recommended default configuration for a PyTorch package provided by a Linux distribution?

Currently I'm building Debian packages of PyTorch (CPU-only) with the following configuration:

export BLAS = OpenBLAS
export NO_CUDA = 1
export REL_WITH_DEB_INFO = ON
export USE_CUDA = OFF
export USE_CUDNN = OFF
export USE_DISTRIBUTED = ON
export USE_FBGEMM = OFF
export USE_FFMPEG = ON
export USE_GFLAGS = ON
export USE_GLOG = ON
export USE_LEVELDB = ON
export USE_LMDB = ON
export USE_MIOPEN = OFF
export USE_MKLDNN = OFF
export USE_NNPACK = OFF
export USE_OPENCV = ON
export USE_QNNPACK = OFF
export USE_REDIS = OFF
export USE_ROCM = OFF
export USE_SYSTEM_LIBS = ON
export USE_SYSTEM_NCCL = OFF
export USE_XNNPACK = ON
export USE_ZMQ = ON
export USE_ZSTD = OFF

Does that look like a good default?

We can consider preparing a CUDA version (and possibly ROCM version) in the future.
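
As a quick sanity check (a minimal sketch, assuming the resulting .deb installs the usual torch Python module), the CPU-only settings above can be verified at runtime roughly like this:

# Rough runtime check that the cpu-only configuration above took effect.
import torch

assert not torch.cuda.is_available()      # USE_CUDA=OFF / NO_CUDA=1
assert torch.version.cuda is None         # no CUDA runtime compiled in
assert torch.distributed.is_available()   # USE_DISTRIBUTED=ON
print(torch.__version__, "looks like a cpu-only build")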

cc @ezyang @seemethere @malfet @vincentqb @fritzo @neerajprad @alicanb @vishwakftw

fritzo commented 4 years ago

@mrsalehi I'm pretty sure this issue means a different sense of 'distributions' 😄

cdluminate commented 4 years ago

cc @ezyang

cdluminate commented 4 years ago

BTW, MKLDNN is (temporarily) not enabled because ideep contains an ancient copy of mkldnn (0.X version), while I've already uploaded onednn (2.X version, renamed from dnnl <- mkl-dnn) to the Debian archive. I don't know whether ideep compiles against onednn 2.X.

ezyang commented 4 years ago

Some of these look suspect:

export USE_FBGEMM = OFF
export USE_MKLDNN = OFF
export USE_NNPACK = OFF
export USE_QNNPACK = OFF

You're going to lose a lot of CPU performance by having these disabled. Check the build logs for our CPU binary releases; they should let you know more about what we make sure to have enabled for our binary builds. The end goal you're looking for is to have comparable CPU performance between the distro version and our packaged versions.
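
One rough way to make that comparison, sketched here under the assumption that both the distro build and the upstream wheel import as torch, is to ask each binary directly what it was compiled with:

import torch

# Full compile-time configuration string (BLAS, oneDNN/MKL-DNN, OpenMP, ...).
print(torch.__config__.show())

# Individual CPU acceleration backends.
print("mkldnn:", torch.backends.mkldnn.is_available())
print("mkl:   ", torch.backends.mkl.is_available())
print("openmp:", torch.backends.openmp.is_available())

# Quantization engines; FBGEMM and QNNPACK show up here when enabled.
print("quantized engines:", torch.backends.quantized.supported_engines)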

cdluminate commented 4 years ago

Some of these look suspect:

export USE_FBGEMM = OFF
export USE_MKLDNN = OFF
export USE_NNPACK = OFF
export USE_QNNPACK = OFF

You're going to lose a lot of CPU performance by having these disabled. Check the build logs for our CPU binary releases; they should let you know more about what we make sure to have enabled for our binary builds. The end goal you're looking for is to have comparable CPU performance between the distro version and our packaged versions.

Thank you for the feedback. I can gradually add the build dependencies to the Debian archive and then enable the corresponding options. I've got the information I wanted: disabling these dependencies (not yet packaged for Debian) won't harm the functionality.

QNNPACK: according to https://github.com/pytorch/pytorch/issues/14699#issuecomment-618405138 this is not going to be packaged.
NNPACK + FBGEMM: planned.
MKLDNN: I've uploaded onednn-2.0beta5 to the archive ... and the latest 1.4 release would be rejected from the archive because 1.4 is numerically less than 2.0beta5 .... I don't know whether ideep (branch: pytorch_dnnl) works with 2.0beta5. Will try it out in the future.
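
Once USE_MKLDNN is switched back on, a rough smoke test for the ideep/oneDNN combination could look like the sketch below (assuming a torch built with MKL-DNN support):

import torch
import torch.nn as nn
from torch.utils import mkldnn as mkldnn_utils

assert torch.backends.mkldnn.is_available()

x = torch.randn(1, 3, 32, 32)
conv = nn.Conv2d(3, 8, kernel_size=3).eval()
y_dense = conv(x)

# Round-trip through the MKL-DNN tensor layout and run the same conv there.
y_mkldnn = mkldnn_utils.to_mkldnn(conv)(x.to_mkldnn()).to_dense()

print("max abs diff:", (y_dense - y_mkldnn).abs().max().item())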

stemann commented 2 years ago

Check the build logs for our CPU binary releases; they should let you know more about what we make sure to have enabled for our binary builds. The end goal you're looking for is to have comparable CPU performance between the distro version and our packaged versions.

I am working on cross-compiling libtorch for JuliaPackaging, cf. https://github.com/JuliaPackaging/Yggdrasil/pull/4554 - for as many platforms as possible, including CUDA, macOS and Windows. Thank you, @cdluminate, for paving the way for Debian.

@ezyang Could you please confirm that this is the set of build logs to study to find the configuration of the (linux, cxx11) binary release v1.11.0: https://github.com/pytorch/pytorch/actions/runs/1952446963

I found it by manually searching through GitHub Actions for the workflow linux-binary-libtorch-cxx11-abi to locate the run for the commit behind the v1.11.0 tag (which was luckily only 13 days ago...): https://github.com/pytorch/pytorch/commit/bc2c6edaf163b1a1330e37a6e34caf8c553e4755 - a commit which turned out to be not on branch:v1.11.0, not on release/1.11, not on branch:v1.11.0-rc7, but on branch:v1.11.0-rc6(!)
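
For what it's worth, here is a small sketch of how that search could be scripted against the public GitHub REST API instead of clicking through the Actions UI; the branch and workflow names are just the ones mentioned in this thread and may need adjusting for other releases:

import requests

# List recent workflow runs on the release branch and keep the libtorch ones.
resp = requests.get(
    "https://api.github.com/repos/pytorch/pytorch/actions/runs",
    params={"branch": "release/1.11", "per_page": 100},
)
resp.raise_for_status()

for run in resp.json()["workflow_runs"]:
    if "linux-binary-libtorch-cxx11-abi" in run["name"]:
        print(run["head_sha"][:7], run["created_at"], run["html_url"])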

ezyang commented 2 years ago

@malfet or @seemethere would know better but I think you are right.

seemethere commented 2 years ago

Check the build logs for our CPU binary releases; they should let you know more about what we make sure to have enabled for our binary builds. The end goal you're looking for is to have comparable CPU performance between the distro version and our packaged versions.

I am working on cross-compiling libtorch for JuliaPackaging, cf. JuliaPackaging/Yggdrasil#4554 - for as many platforms as possible, including CUDA, macOS and Windows. Thank you, @cdluminate, for paving the way for Debian.

@ezyang Could you please confirm that this is the set of build logs to study to find the configuration of the (linux, cxx11) binary release v1.11.0: pytorch/pytorch/actions/runs/1952446963

I found it by manually searching through GitHub Actions for the workflow linux-binary-libtorch-cxx11-abi to locate the run for the commit behind the v1.11.0 tag (which was luckily only 13 days ago...): bc2c6ed - a commit which turned out to be not on branch:v1.11.0, not on release/1.11, not on branch:v1.11.0-rc7, but on branch:v1.11.0-rc6(!)

This looks right for what got released as 1.11. We currently use this view to look at the build jobs for 1.11: https://hud.pytorch.org/hud/pytorch/pytorch/release%2F1.11/0?name_filter=binary