Closed cdluminate closed 5 years ago
we only enable AVX/AVX2 for functions that have dynamic dispatch based on the CPU SIMD capabilities: https://github.com/torch/torch7/blob/master/lib/TH/CMakeLists.txt#L211-L230 We had to do this to ship portable pytorch binaries. For SSE though, we ship with all SSE versions, and these days we think it's fine.
I took a closer look at the code. And found runtime dispatcher part in generic/simd/convolve.c
, THVectorDispatcher.c
and simd/simd.h
. Since there is runtime CPU capability detection, I think it is ok just to put the AVX/AVX2 code in binaries.
I recalled the reason why I added the two patches in the Debian package. Several months ago I uploaded torch7 and it fails to run unit test because of an SIGILL (illegal instruction). Then I patched the cmake file of libTH to remove AVX/AVX2 and NEON detection, so that SIGILL will never be raised. I don't know if the dispatcher part is becoming better recently.
If you are confident about the dispatcher, please close this issue since no action is needed. And I will remove the patches applied to Debian packages.
yes we're pretty confident about the dispatcher now, as pytorch ran on 50k+ machines and no reports so far except for very old AMD processors that do not have SSE 4.3
Several months ago I uploaded torch7 and it fails to run unit test because of an SIGILL (illegal instruction).
Yes, this seems correct. Several months ago this dispatcher was not present.
Note that fixing dispatcher problems did not solve the AMD bugs. The SIGILL was sent when executing THRandom code, that was automatically vectorized by the compiler. For this reason, these flags might actually be useful.
I didn't look into the depth of Torch internal. Please decide whether to reopen this issue.
I wouldn't mind merging a patch from @CDLuminate as the patch will be guarded behind a flag. However, the patch needs to guard things somewhere here: https://github.com/torch/torch7/blob/master/lib/TH/CMakeLists.txt#L200-L209
I tried this patch locally but require torch
failed due to the missing of a xxx_AVX
symbol in libTH.
diff --git a/lib/TH/CMakeLists.txt b/lib/TH/CMakeLists.txt
index c4e6694..986b864 100644
--- a/lib/TH/CMakeLists.txt
+++ b/lib/TH/CMakeLists.txt
@@ -192,42 +192,47 @@ ENDIF()
##### sources section
######################################################################
-# IF ANY SIMD FOUND
-IF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
- SET(simd generic/simd/convolve.c)
-ENDIF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
-
-# IF SSE4 FOUND
-IF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
- SET(CMAKE_C_FLAGS "${C_SSE4_1_FLAGS} -DUSE_SSE4_1 ${C_SSE4_2_FLAGS} -DUSE_SSE4_2 ${CMAKE_C_FLAGS}")
- IF(MSVC)
- SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast")
- ELSE(MSVC)
- SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math")
- ENDIF(MSVC)
- SET(simd ${simd} generic/simd/convolve5x5_sse.c)
-ENDIF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
-
-# IF AVX FOUND
-IF(C_AVX_FOUND)
- IF(MSVC)
- SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast ${C_AVX_FLAGS}")
- SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX ${C_AVX_FLAGS}")
- ELSE(MSVC)
- SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math ${C_AVX_FLAGS}")
- SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX_FLAGS}")
- ENDIF(MSVC)
- SET(simd ${simd} vector/AVX.c generic/simd/convolve5x5_avx.c)
-ENDIF(C_AVX_FOUND)
-
-IF(C_AVX2_FOUND)
- IF(MSVC)
- SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX2 ${C_AVX2_FLAGS}")
- ELSE(MSVC)
- SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX2_FLAGS}")
- ENDIF(MSVC)
- SET(simd ${simd} vector/AVX2.c)
-ENDIF(C_AVX2_FOUND)
+# IF NOT DISABLE SIMD
+IF(NOT TH_DISABLE_SIMD)
+ # IF ANY SIMD FOUND
+ IF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
+ SET(simd generic/simd/convolve.c)
+ ENDIF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
+
+ # IF SSE4 FOUND
+ IF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
+ SET(CMAKE_C_FLAGS "${C_SSE4_1_FLAGS} -DUSE_SSE4_1 ${C_SSE4_2_FLAGS} -DUSE_SSE4_2 ${CMAKE_C_FLAGS}")
+ IF(MSVC)
+ SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast")
+ ELSE(MSVC)
+ SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math")
+ ENDIF(MSVC)
+ SET(simd ${simd} generic/simd/convolve5x5_sse.c)
+ ENDIF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
+
+ # IF AVX FOUND
+ IF(C_AVX_FOUND)
+ IF(MSVC)
+ SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast ${C_AVX_FLAGS}")
+ SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX ${C_AVX_FLAGS}")
+ ELSE(MSVC)
+ SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math ${C_AVX_FLAGS}")
+ SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX_FLAGS}")
+ ENDIF(MSVC)
+ SET(simd ${simd} vector/AVX.c generic/simd/convolve5x5_avx.c)
+ ENDIF(C_AVX_FOUND)
+
+ IF(C_AVX2_FOUND)
+ IF(MSVC)
+ SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX2 ${C_AVX2_FLAGS}")
+ ELSE(MSVC)
+ SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX2_FLAGS}")
+ ENDIF(MSVC)
+ SET(simd ${simd} vector/AVX2.c)
+ ENDIF(C_AVX2_FOUND)
+ELSE(NOT TH_DISABLE_SIMD)
+ SET(simd "")
+ENDIF(NOT TH_DISABLE_SIMD)
SET(hdr
THGeneral.h THHalf.h THAllocator.h THStorage.h THTensor.h THTensorApply.h THBlas.h THMath.h
The illegal instruction problem still exists.
I still need the original patches to avoid build failure, i.e. don't even detect these SIMD instruction sets.
When compiling a portable instance of libTH, the AVX2 and SSE4.x codes are not favorable.
This is already done in the torch7 Debian package.
For arm there is no SIMD at all: https://anonscm.debian.org/cgit/debian-science/packages/lua-torch-torch7.git/tree/debian/patches/cmake-disable-simd-on-arm.patch
For amd64 there should be nothing higher than SSE2: https://anonscm.debian.org/cgit/debian-science/packages/lua-torch-torch7.git/tree/debian/patches/cmake-disable-simd-for-compatibility.patch
I can slightly change the Debian patch then submit a PR if you like.