google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Other
1.89k stars, 376 forks
issues
Fix bytes calculation in x8-packw benchmark
#7459
copybara-service[bot]
closed
1 week ago
0
Update the KleidiAI version to `r0.4.0`.
#7458
copybara-service[bot]
closed
1 week ago
0
Copybara import of the project:
#7457
copybara-service[bot]
closed
5 days ago
0
[CMake] add switch for libm
#7456
xuhancn
opened
1 week ago
1
Fix bytes calculation in x16-packw benchmark
#7455
copybara-service[bot]
closed
1 week ago
0
Batch Matrix Multiply use GEMM config GIO packing function
#7454
copybara-service[bot]
closed
1 week ago
0
Copybara import of the project:
#7453
copybara-service[bot]
closed
6 days ago
0
Copybara import of the project:
#7452
copybara-service[bot]
closed
6 days ago
0
Insert pack lh node for convolution which are compatible with gemm microkernels.
#7451
copybara-service[bot]
closed
1 week ago
0
Internal build change.
#7450
copybara-service[bot]
closed
2 weeks ago
0
Store kernel_zero_point as uint8_t
#7449
copybara-service[bot]
closed
1 week ago
0
Fix optimal `nc` computation in `batch-matrix-multiply`, `convolution-nhwc`, and `dynamic-fully-connected` ops.
#7448
copybara-service[bot]
opened
2 weeks ago
0
Disable `aarch64` `sve2` for `gcc` versions below `10`, for which the compiler flag does not exist.
#7447
copybara-service[bot]
closed
2 weeks ago
0
Replace lut yaml with table header
#7446
pratham-mcw
opened
2 weeks ago
3
Replace conv-hwc yaml with table header
#7445
RahulSundarMCW
opened
2 weeks ago
0
Replace lut32norm yaml with table header
#7444
RahulSundarMCW
opened
2 weeks ago
2
Handle zero dimensions in `static_constant_pad` more elegantly.
#7443
copybara-service[bot]
closed
2 weeks ago
0
Copybara import of the project:
#7442
copybara-service[bot]
closed
2 weeks ago
0
Check cpuinfo returns cache information to avoid dereferencing null
#7441
copybara-service[bot]
closed
1 week ago
1
Cast to float to avoid compile error
#7440
copybara-service[bot]
closed
2 weeks ago
0
Remove u32-f32-cvt kernels
#7439
copybara-service[bot]
closed
2 weeks ago
0
Fix overflow for uint32_t inputs to unary ops
#7438
copybara-service[bot]
closed
2 weeks ago
0
Internal changes to non-public code.
#7437
copybara-service[bot]
closed
2 weeks ago
0
X32-packw AVX GIO use maskload for remainder handling
#7436
copybara-service[bot]
closed
2 weeks ago
0
Copybara import of the project:
#7435
copybara-service[bot]
closed
1 week ago
0
Copybara import of the project:
#7434
copybara-service[bot]
closed
2 weeks ago
0
install microkernels-prod along with XNNPACK
#7433
mcr229
closed
2 weeks ago
3
Use `xnn_create_batch_matrix_multiply_nc_f32_const_weights` instead of `xnn_create_batch_matrix_multiply_nc_f32` in benchmarks to avoid including the cost of packing the weights.
#7432
copybara-service[bot]
closed
2 weeks ago
0
Remove integer support for square difference op
#7431
copybara-service[bot]
closed
2 weeks ago
0
Refactor reduce parameters.
#7430
copybara-service[bot]
closed
2 weeks ago
0
use Python_EXECUTABLE to generate microkernels.cmake
#7429
mcr229
closed
2 weeks ago
2
X32-packw AVX GIO remove maskload for remainder handling
#7428
copybara-service[bot]
closed
2 weeks ago
0
X32-packw AVX512 GIO
#7427
copybara-service[bot]
closed
1 week ago
0
Initialize extra bytes to fix msan
#7426
copybara-service[bot]
closed
2 weeks ago
0
Remove generator for bf16-vabs
#7425
copybara-service[bot]
closed
2 weeks ago
0
Batch Matrix Multiply use GEMM config GIO packing function
#7424
copybara-service[bot]
closed
1 week ago
0
F32-GEMM AVX512 generate up to 16x64
#7423
copybara-service[bot]
closed
1 week ago
0
X32-packw AVX GIO kblock 8
#7422
copybara-service[bot]
closed
2 weeks ago
0
Add tests for `fully-connected` `qp8` inputs and a kernel zero point of `8` (unsigned weights) as this is now supported by the underlying KleidiAI kernels.
#7421
copybara-service[bot]
closed
2 weeks ago
0
Move the `s8-vclamp` and `u8-vclamp` tests to the `SHARDED_TESTS` since they are very slow on `riscv64-rvv`.
#7420
copybara-service[bot]
closed
2 weeks ago
0
Clean up includes in `f32-gemm/avx-broadcast.c.in`.
#7419
copybara-service[bot]
closed
2 weeks ago
0
Changes to the `BatchMatrixMultiply` benchmarking code:
#7418
copybara-service[bot]
closed
2 weeks ago
0
Avoid hardcoding python3
#7417
fiberflow
closed
2 weeks ago
2
Copybara import of the project:
#7416
copybara-service[bot]
closed
3 weeks ago
0
F32-GEMM avx512 fix remainder handling when nc > 16
#7415
copybara-service[bot]
closed
3 weeks ago
0
F32-GEMM avx512 fix remainder handling when nc > 16
#7414
copybara-service[bot]
closed
3 weeks ago
0
Add missing binary operator benchmarks.
#7413
copybara-service[bot]
closed
2 weeks ago
0
Since the reduction axes will be sorted anyway, remove the requirement that they are already sorted.
#7412
copybara-service[bot]
closed
3 weeks ago
0
VMulCAddC-Replaced yaml files with header table
#7411
nitheshsrikanth-mcw
opened
3 weeks ago
3
xRaddextexp - replaced yaml files with header table
#7410
nitheshsrikanth-mcw
opened
3 weeks ago
1