microsoft / antares

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

[BUG] Tune a bert-base-fp16 failed #353

Closed LeiWang1999 closed 2 years ago

LeiWang1999 commented 2 years ago

Device: V100 16G

Antares server startup command: BACKEND=c-cuda nohup antares rest-server > antares.log 2>&1 &

bert-base-fp16 tuning progress reported by Microsoft NNFusion:

 NOTE: the tuning progress (N/M) means that the current best kernel is searched at the N-th step of the total M steps. 

 |                   OP |                       NAME |     STATUS |   PROGRESS |     PERFORMANCE |
 | --------------------------------------------------------------------------------------------- |
 |                  Sum |                    Sum_355 |  completed |     81/1000  |   0.00260336 ms |
 |                  Dot |                        274 |  completed |    975/1000  |   0.00946126 ms |
 |                  Dot |                        359 |  completed |    630/1000  |    0.0138723 ms |
 |      Matched_Pattern |       Matched_Pattern_1913 |  completed |    657/1000  |   0.00821746 ms |
 |      Matched_Pattern |       Matched_Pattern_1914 |  completed |    905/1000  |   0.00294883 ms |
 |      Matched_Pattern |       Matched_Pattern_1915 |  completed |    689/1000  |   0.00302956 ms |
 |      Matched_Pattern |       Matched_Pattern_1916 |  completed |    890/1000  |   0.00960382 ms |
 |      Matched_Pattern |       Matched_Pattern_1917 |  completed |    946/1000  |   0.00693972 ms |
 |      Matched_Pattern |       Matched_Pattern_1918 |  completed |    748/1000  |   0.00690129 ms |
 |      Matched_Pattern |       Matched_Pattern_1920 |  completed |     56/1000  |   0.00569697 ms |
 |      Matched_Pattern |       Matched_Pattern_1984 |  completed |    825/1000  |   0.00819351 ms |
 |      Matched_Pattern |       Matched_Pattern_2059 |  completed |    730/1000  |   0.00573633 ms |
 |      Matched_Pattern |       Matched_Pattern_2060 |  completed |    680/1000  |   0.00690383 ms |
 |              Softmax |                        328 |  submitted |      0/1000  |           -1 ms |
 |      Matched_Pattern |       Matched_Pattern_1919 |  submitted |      0/1000  |           -1 ms |
 |      Matched_Pattern |       Matched_Pattern_1921 |  submitted |      0/1000  |           -1 ms |
 |      Matched_Pattern |       Matched_Pattern_2061 |  submitted |      0/1000  |           -1 ms |
 |      Matched_Pattern |       Matched_Pattern_2062 |  submitted |      0/1000  |           -1 ms |
 |      Matched_Pattern |       Matched_Pattern_2063 |  submitted |      0/1000  |           -1 ms |

Antares log output:

/bin/bash: line 1: 21668 Aborted                 (core dumped) sh -c "cd /root/.cache/antares/cache/199 && BACKEND=c-cuda  /root/.cache/antares/evaluator.c-cuda my_kernel.cc --dev 2 --timeout 33.0"
/bin/bash: line 1: 21718 Aborted                 (core dumped) sh -c "cd /root/.cache/antares/cache/201 && BACKEND=c-cuda  /root/.cache/antares/evaluator.c-cuda my_kernel.cc --dev 0 --timeout 33.0"
/bin/bash: line 1: 21693 Aborted                 (core dumped) sh -c "cd /root/.cache/antares/cache/200 && BACKEND=c-cuda  /root/.cache/antares/evaluator.c-cuda my_kernel.cc --dev 3 --timeout 33.0"
/bin/bash: line 1: 21743 Aborted                 (core dumped) sh -c "cd /root/.cache/antares/cache/202 && BACKEND=c-cuda  /root/.cache/antares/evaluator.c-cuda my_kernel.cc --dev 1 --timeout 33.0"
.antares-module-tempfile.1.cu(77): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(78): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(79): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(80): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(81): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(82): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(83): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(84): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(85): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(86): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(87): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

.antares-module-tempfile.1.cu(88): error: more than one instance of overloaded function "tanh" matches the argument list:
            function "std::tanh(long double)"
            function "std::tanh(float)"
            argument types are: (__half)

12 errors detected in the compilation of ".antares-module-tempfile.1.cu".
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to execute command: sh -c '/usr/local/cuda/bin/nvcc .antares-module-tempfile.1.cu --fatbin -O2 -gencode arch=compute_70,code=sm_70 -o .antares-module-tempfile.1.cu.out'

ghostplant commented 2 years ago

This is a problem that should be fixed by the IR. Can you cast your input to either int64 or float32, according to your requirement? E.g.:

x.cast(`float32`).call(`tanh`)
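For reference, here is a minimal CUDA sketch of what the suggested cast amounts to at the kernel level. It is not the code Antares generates. The kernel name `tanh_fp16`, the sample inputs, and the step that narrows the result back to fp16 are assumptions for illustration; the IR expression above only shows the widening cast. The idea is that an explicit fp16 to fp32 conversion leaves a single matching overload, so nvcc no longer has to choose between the `float` and `long double` candidates listed in the log.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: compute tanh on fp16 data by widening to fp32 first.
// Calling tanh() directly on a __half operand is what nvcc rejected above.
__global__ void tanh_fp16(const __half* in, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float x = __half2float(in[i]);    // explicit fp16 -> fp32 cast
    out[i] = __float2half(tanhf(x));  // fp32 tanh, then narrow back (assumption)
  }
}

int main() {
  const int n = 4;
  const float host_in[n] = {-1.0f, 0.0f, 0.5f, 2.0f};  // sample values only
  __half h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = __float2half(host_in[i]);

  __half* d_in = nullptr;
  __half* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(__half));
  cudaMalloc(&d_out, n * sizeof(__half));
  cudaMemcpy(d_in, h_in, n * sizeof(__half), cudaMemcpyHostToDevice);

  tanh_fp16<<<1, n>>>(d_in, d_out, n);

  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(h_out, d_out, n * sizeof(__half), cudaMemcpyDeviceToHost);
  for (int i = 0; i < n; ++i) {
    printf("tanh(%g) = %g\n", host_in[i], __half2float(h_out[i]));
  }

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

This should build with the same toolchain as in the log, e.g. `nvcc -gencode arch=compute_70,code=sm_70 tanh_fp16.cu` (the file name is arbitrary). If the surrounding graph expects an fp16 output, the IR expression would presumably need a matching cast back to `float16` after the `tanh` call.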