mratsim/laser
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Apache License 2.0
281 stars
15 forks
issues
I can't use OpenMP in Nim 2.0, and it needs DLL files, like libgomp-1, placed in the same folder as the built exe file to execute it.
#43
kiyoken1594
opened
2 months ago
0
performance of avx512 bit ops and popcounts
#41
brentp
opened
5 years ago
4
Mysterious 2x perf regression on GEMM
#40
mratsim
opened
5 years ago
2
Add float32 implementation of min/max/sum
#39
mratsim
closed
5 years ago
0
[Benchmarks] Cleanup fp_reduction_latency benchmarks
#38
mratsim
opened
5 years ago
0
[Showstopper regression] emit does not generate proper symbol
#37
mratsim
closed
5 years ago
1
parallel reduction
#36
brentp
closed
5 years ago
6
Added Win32 executable memory support
#35
awr1
closed
5 years ago
0
Lux refactor v3 - Frontend
#34
mratsim
closed
5 years ago
0
[Gemm] Nim devel compiler gets stuck when compiling older commits
#33
mratsim
closed
5 years ago
1
[GEMM] Significant performance regression (divided by 5)
#32
mratsim
closed
5 years ago
2
[Lux] Multithreading for JIT code
#31
mratsim
opened
5 years ago
0
NUMA-aware memory allocation and computation
#30
mratsim
opened
5 years ago
0
Lux AST refactor - frontend done
#29
mratsim
closed
5 years ago
0
WIP - Fix 27 and 26
#28
mratsim
closed
5 years ago
0
Regression on GEMM allocation
#27
mratsim
closed
5 years ago
3
Prepacked gemm
#26
mratsim
closed
5 years ago
0
System Profile Dual Xeon Gold 6154
#25
Laurae2
closed
5 years ago
0
Optimize serial gemm + Fix parallel result
#24
mratsim
closed
5 years ago
2
performance of gemm_strided vs numpy
#23
timotheecour
opened
5 years ago
1
gemm_strided: error: always_inline function '_mm256_setzero_pd' requires target feature 'xsave'
#22
timotheecour
opened
5 years ago
1
[GEMM] Enhance serial implementation
#21
mratsim
opened
5 years ago
1
Improve gemm threading
#20
mratsim
closed
5 years ago
1
Fast vectorized exponential float32 implementation (SSE2, AVX2, AVX512)
#19
mratsim
closed
5 years ago
0
Fused assignation shortcut
#18
mratsim
opened
5 years ago
0
Fast image loading primitives
#17
mratsim
opened
5 years ago
0
Try to workaround static generic regression with static param
#16
mratsim
closed
5 years ago
1
Devel regression "object constructor needs an object type"
#15
mratsim
closed
5 years ago
1
AVX512 GEMM kernel
#14
mratsim
closed
5 years ago
1
Transpose does not scale well with multithread
#13
Laurae2
opened
5 years ago
0
Create a benchmark script
#12
mratsim
opened
5 years ago
0
Exponential: Dual Xeon Gold 6154 result
#11
mratsim
opened
5 years ago
3
Benchmark example using Intel MKL (for history)
#10
Laurae2
opened
5 years ago
1
Matrix multiplication: Nested parallelism
#9
mratsim
opened
5 years ago
1
Optimised random sampling methods
#8
mratsim
opened
5 years ago
1
Jit assembler
#7
mratsim
closed
5 years ago
0
Generalize gemm
#6
mratsim
closed
6 years ago
0
Parallel strided iteration does not scale linearly
#5
mratsim
opened
6 years ago
0
Introduce forEach multi-stage domain specific language
#4
mratsim
closed
6 years ago
0
Update for devel OpenMP
#3
mratsim
closed
6 years ago
0
[Design] Error model
#2
mratsim
opened
6 years ago
0
Iteration code size comparison
#1
mratsim
closed
6 years ago
0