torch / torch7

http://torch.ch
Other
8.97k stars 2.38k forks source link

[TH/cmake] provide DISABLE_SIMD flag #1029

Closed cdluminate closed 5 years ago

cdluminate commented 7 years ago

When compiling a portable instance of libTH, the AVX2 and SSE4.x codes are not favorable.

This is already done in the torch7 Debian package.

For arm there is no SIMD at all: https://anonscm.debian.org/cgit/debian-science/packages/lua-torch-torch7.git/tree/debian/patches/cmake-disable-simd-on-arm.patch

For amd64 there should be nothing higher than SSE2: https://anonscm.debian.org/cgit/debian-science/packages/lua-torch-torch7.git/tree/debian/patches/cmake-disable-simd-for-compatibility.patch

I can slightly change the Debian patch then submit a PR if you like.

soumith commented 7 years ago

we only enable AVX/AVX2 for functions that have dynamic dispatch based on the CPU SIMD capabilities: https://github.com/torch/torch7/blob/master/lib/TH/CMakeLists.txt#L211-L230 We had to do this to ship portable pytorch binaries. For SSE though, we ship with all SSE versions, and these days we think it's fine.

cdluminate commented 7 years ago

I took a closer look at the code. And found runtime dispatcher part in generic/simd/convolve.c, THVectorDispatcher.c and simd/simd.h. Since there is runtime CPU capability detection, I think it is ok just to put the AVX/AVX2 code in binaries.

I recalled the reason why I added the two patches in the Debian package. Several months ago I uploaded torch7 and it fails to run unit test because of an SIGILL (illegal instruction). Then I patched the cmake file of libTH to remove AVX/AVX2 and NEON detection, so that SIGILL will never be raised. I don't know if the dispatcher part is becoming better recently.

If you are confident about the dispatcher, please close this issue since no action is needed. And I will remove the patches applied to Debian packages.

soumith commented 7 years ago

yes we're pretty confident about the dispatcher now, as pytorch ran on 50k+ machines and no reports so far except for very old AMD processors that do not have SSE 4.3

soumith commented 7 years ago

Several months ago I uploaded torch7 and it fails to run unit test because of an SIGILL (illegal instruction).

Yes, this seems correct. Several months ago this dispatcher was not present.

apaszke commented 7 years ago

Note that fixing dispatcher problems did not solve the AMD bugs. The SIGILL was sent when executing THRandom code, that was automatically vectorized by the compiler. For this reason, these flags might actually be useful.

cdluminate commented 7 years ago

I didn't look into the depth of Torch internal. Please decide whether to reopen this issue.

soumith commented 7 years ago

I wouldn't mind merging a patch from @CDLuminate as the patch will be guarded behind a flag. However, the patch needs to guard things somewhere here: https://github.com/torch/torch7/blob/master/lib/TH/CMakeLists.txt#L200-L209

cdluminate commented 7 years ago

I tried this patch locally but require torch failed due to the missing of a xxx_AVX symbol in libTH.

diff --git a/lib/TH/CMakeLists.txt b/lib/TH/CMakeLists.txt
index c4e6694..986b864 100644
--- a/lib/TH/CMakeLists.txt
+++ b/lib/TH/CMakeLists.txt
@@ -192,42 +192,47 @@ ENDIF()
 ##### sources section
 ######################################################################

-# IF ANY SIMD FOUND
-IF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
-  SET(simd generic/simd/convolve.c)
-ENDIF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
-
-# IF SSE4 FOUND
-IF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
-  SET(CMAKE_C_FLAGS "${C_SSE4_1_FLAGS} -DUSE_SSE4_1 ${C_SSE4_2_FLAGS} -DUSE_SSE4_2 ${CMAKE_C_FLAGS}")
-  IF(MSVC)
-    SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast")
-  ELSE(MSVC)
-    SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math")
-  ENDIF(MSVC)
-  SET(simd ${simd} generic/simd/convolve5x5_sse.c)
-ENDIF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
-
-# IF AVX FOUND
-IF(C_AVX_FOUND)
-  IF(MSVC)
-    SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast ${C_AVX_FLAGS}")
-    SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX ${C_AVX_FLAGS}")
-  ELSE(MSVC)
-    SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math ${C_AVX_FLAGS}")
-    SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX_FLAGS}")
-  ENDIF(MSVC)
-  SET(simd ${simd} vector/AVX.c generic/simd/convolve5x5_avx.c)
-ENDIF(C_AVX_FOUND)
-
-IF(C_AVX2_FOUND)
-  IF(MSVC)
-    SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX2 ${C_AVX2_FLAGS}")
-  ELSE(MSVC)
-    SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX2_FLAGS}")
-  ENDIF(MSVC)
-  SET(simd ${simd} vector/AVX2.c)
-ENDIF(C_AVX2_FOUND)
+# IF NOT DISABLE SIMD
+IF(NOT TH_DISABLE_SIMD)
+  # IF ANY SIMD FOUND
+  IF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
+    SET(simd generic/simd/convolve.c)
+  ENDIF(C_AVX2_FOUND OR C_AVX_FOUND OR C_SSE4_2_FOUND OR C_SSE4_1_FOUND)
+  
+  # IF SSE4 FOUND
+  IF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
+    SET(CMAKE_C_FLAGS "${C_SSE4_1_FLAGS} -DUSE_SSE4_1 ${C_SSE4_2_FLAGS} -DUSE_SSE4_2 ${CMAKE_C_FLAGS}")
+    IF(MSVC)
+      SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast")
+    ELSE(MSVC)
+      SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math")
+    ENDIF(MSVC)
+    SET(simd ${simd} generic/simd/convolve5x5_sse.c)
+  ENDIF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
+  
+  # IF AVX FOUND
+  IF(C_AVX_FOUND)
+    IF(MSVC)
+      SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "/Ox /fp:fast ${C_AVX_FLAGS}")
+      SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX ${C_AVX_FLAGS}")
+    ELSE(MSVC)
+      SET_SOURCE_FILES_PROPERTIES(generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math ${C_AVX_FLAGS}")
+      SET_SOURCE_FILES_PROPERTIES(vector/AVX.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX_FLAGS}")
+    ENDIF(MSVC)
+    SET(simd ${simd} vector/AVX.c generic/simd/convolve5x5_avx.c)
+  ENDIF(C_AVX_FOUND)
+  
+  IF(C_AVX2_FOUND)
+    IF(MSVC)
+      SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "/Ox /arch:AVX2 ${C_AVX2_FLAGS}")
+    ELSE(MSVC)
+      SET_SOURCE_FILES_PROPERTIES(vector/AVX2.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX2_FLAGS}")
+    ENDIF(MSVC)
+    SET(simd ${simd} vector/AVX2.c)
+  ENDIF(C_AVX2_FOUND)
+ELSE(NOT TH_DISABLE_SIMD)
+  SET(simd "")
+ENDIF(NOT TH_DISABLE_SIMD)

 SET(hdr
   THGeneral.h THHalf.h THAllocator.h THStorage.h THTensor.h THTensorApply.h THBlas.h THMath.h
cdluminate commented 7 years ago

The illegal instruction problem still exists.

https://buildd.debian.org/status/fetch.php?pkg=lua-torch-torch7&arch=amd64&ver=0%7E20170511-gae1a805-1&stamp=1495278078&raw=0

cdluminate commented 7 years ago

I still need the original patches to avoid build failure, i.e. don't even detect these SIMD instruction sets.