Closed s-debadri closed 3 months ago
:memo: TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2609. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process by reviewers starts. If you are a new member of this project, please read the manuals in the documentation folder and on the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
It's good to read your contributions to GPU enablement. One quick question: do you have a plan to further improve the kernels? E.g., `sgemv_cl_kernel`'s parallel level is one thread per component of the output vector, which can be further parallelized. It would be great to know the current speed-up status compared to CPU.
Yes, the kernels will be further improved going forward, depending on the extent of optimization we can achieve. Currently we are focusing on implementing the initial skeleton for running an LLM on GPU.
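To make the parallelization question concrete, here is a minimal host-side C++ sketch of the two granularities being discussed. It is not the PR's OpenCL kernel: the function names `sgemv_work_item` and `sgemv_work_group`, the row-major tightly packed layout, and the strided split of a row are all assumptions chosen for illustration. The first function mirrors "one thread per output component"; the second shows one way a row's dot product could be shared among several work-items and then reduced.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical emulation of one work-item: computes a single output
// element y[row] = dot(A[row, :], x). A is assumed row-major, tightly packed.
float sgemv_work_item(const std::vector<float> &A, const std::vector<float> &x,
                      std::size_t cols, std::size_t row) {
  float acc = 0.0f;
  for (std::size_t j = 0; j < cols; ++j)
    acc += A[row * cols + j] * x[j];
  return acc;
}

// Hypothetical finer-grained variant: `group_size` work-items share one row.
// Each takes a strided slice of the columns; the partial sums are then
// reduced (on a GPU this reduction would typically use local memory).
float sgemv_work_group(const std::vector<float> &A, const std::vector<float> &x,
                       std::size_t cols, std::size_t row,
                       std::size_t group_size) {
  std::vector<float> partial(group_size, 0.0f);
  for (std::size_t lid = 0; lid < group_size; ++lid)
    for (std::size_t j = lid; j < cols; j += group_size)
      partial[lid] += A[row * cols + j] * x[j];

  float acc = 0.0f;
  for (float p : partial)
    acc += p;
  return acc;
}

// Host loop standing in for the global work size: one iteration per row.
std::vector<float> sgemv(const std::vector<float> &A,
                         const std::vector<float> &x, std::size_t rows,
                         std::size_t cols) {
  std::vector<float> y(rows);
  for (std::size_t i = 0; i < rows; ++i)
    y[i] = sgemv_work_item(A, x, cols, i);
  return y;
}
```

Both variants produce the same result for a given row; whether the finer-grained split is actually faster on a given device would need to be measured.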
Please identify the changes in blas_kernels.cpp before merging. It appears unrelated to other changes.
PTAL: @skykongkong8 @lhs8928
It was one of my suggestions from previous reviews to use terms like lda, ldb, or ldc, although it might have been better to separate the feature-implementation commit from the bugfix commit. I can confirm the current implementation is more desirable than before.
Not for this PR, but the blas_kernel code should live under the tensor directory for better maintenance.
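For readers unfamiliar with the `lda` convention, the following is a small C++ sketch of an SGEMV that takes a leading dimension. It assumes a row-major matrix whose rows are `lda` elements apart in memory (`lda >= cols`); the function name `sgemv_lda` and this layout are illustrative assumptions, not the signature used in this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference SGEMV: y = A * x, where A is rows x cols and stored
// row-major with a row stride of `lda` elements (assumed lda >= cols).
// Passing lda separately from cols lets the same routine operate on a
// sub-block of a larger buffer without copying it.
std::vector<float> sgemv_lda(const float *A, const float *x, std::size_t rows,
                             std::size_t cols, std::size_t lda) {
  std::vector<float> y(rows, 0.0f);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      y[i] += A[i * lda + j] * x[j];
  return y;
}
```

With a 2x4 buffer and `lda = 4`, calling the routine with `cols = 3` multiplies only the left 2x3 block, and offsetting the pointer by one element selects the right 2x3 block instead. This is the generalization that a separate leading dimension buys.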
FC Layer GPU kernels added for `fp16` operation:
- `blas_kernels_fp16.cpp` for BLAS `fp16` OpenCL kernels.
- `lda` for SGEMV computation for generalization.
- `fp16` support on GPU.

Self evaluation:
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>