google XNNPACK issues - Githubissues

google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Other

1.89k stars, 376 forks

issues

Fix bytes calculation in x8-packw benchmark

#7459 copybara-service[bot] closed 1 week ago
0
Update the KleidiAI version to `r0.4.0`.

#7458 copybara-service[bot] closed 1 week ago
0
Copybara import of the project:

#7457 copybara-service[bot] closed 5 days ago
0
[CMake] add switch for libm

#7456 xuhancn opened 1 week ago
1
Fix bytes calculation in x16-packw benchmark

#7455 copybara-service[bot] closed 1 week ago
0
Batch Matrix Multiply use GEMM config GIO packing function

#7454 copybara-service[bot] closed 1 week ago
0
Copybara import of the project:

#7453 copybara-service[bot] closed 6 days ago
0
Copybara import of the project:

#7452 copybara-service[bot] closed 6 days ago
0
Insert pack lh node for convolution which are compatible with gemm microkernels.

#7451 copybara-service[bot] closed 1 week ago
0
Internal build change.

#7450 copybara-service[bot] closed 2 weeks ago
0
Store kernel_zero_point as uint8_t

#7449 copybara-service[bot] closed 1 week ago
0
Fix optimal `nc` computation in `batch-matrix-multiply`, `convolution-nhwc`, and `dynamic-fully-connected` ops.

#7448 copybara-service[bot] opened 2 weeks ago
0
Disable `aarch64` `sve2` for `gcc` versions below `10`, for which the compiler flag does not exist.

#7447 copybara-service[bot] closed 2 weeks ago
0
Replace lut yaml with table header

#7446 pratham-mcw opened 2 weeks ago
3
Replace conv-hwc yaml with table header

#7445 RahulSundarMCW opened 2 weeks ago
0
Replace lut32norm yaml with table header

#7444 RahulSundarMCW opened 2 weeks ago
2
Handle zero dimensions in `static_constant_pad` more elegantly.

#7443 copybara-service[bot] closed 2 weeks ago
0
Copybara import of the project:

#7442 copybara-service[bot] closed 2 weeks ago
0
Check cpuinfo returns cache information to avoid dereferencing null

#7441 copybara-service[bot] closed 1 week ago
1
Cast to float to avoid compile error

#7440 copybara-service[bot] closed 2 weeks ago
0
Remove u32-f32-cvt kernels

#7439 copybara-service[bot] closed 2 weeks ago
0
Fix overflow for uint32_t inputs to unary ops

#7438 copybara-service[bot] closed 2 weeks ago
0
Internal changes to non-public code.

#7437 copybara-service[bot] closed 2 weeks ago
0
X32-packw AVX GIO use maskload for remainder handling

#7436 copybara-service[bot] closed 2 weeks ago
0
Copybara import of the project:

#7435 copybara-service[bot] closed 1 week ago
0
Copybara import of the project:

#7434 copybara-service[bot] closed 2 weeks ago
0
install microkernels-prod along with XNNPACK

#7433 mcr229 closed 2 weeks ago
3
Use `xnn_create_batch_matrix_multiply_nc_f32_const_weights` instead of `xnn_create_batch_matrix_multiply_nc_f32` in benchmarks to avoid including the cost of packing the weights.

#7432 copybara-service[bot] closed 2 weeks ago
0
Remove integer support for square difference op

#7431 copybara-service[bot] closed 2 weeks ago
0
Refactor reduce parameters.

#7430 copybara-service[bot] closed 2 weeks ago
0
use Python_EXECUTABLE to generate microkernels.cmake

#7429 mcr229 closed 2 weeks ago
2
X32-packw AVX GIO remove maskload for remainder handling

#7428 copybara-service[bot] closed 2 weeks ago
0
X32-packw AVX512 GIO

#7427 copybara-service[bot] closed 1 week ago
0
Initialize extra bytes to fix msan

#7426 copybara-service[bot] closed 2 weeks ago
0
Remove generator for bf16-vabs

#7425 copybara-service[bot] closed 2 weeks ago
0
Batch Matrix Multiply use GEMM config GIO packing function

#7424 copybara-service[bot] closed 1 week ago
0
F32-GEMM AVX512 generate up to 16x64

#7423 copybara-service[bot] closed 1 week ago
0
X32-packw AVX GIO kblock 8

#7422 copybara-service[bot] closed 2 weeks ago
0
Add tests for `fully-connected` `qp8` inputs and a kernel zero point of `8` (unsigned weights) as this is now supported by the underlying KleidiAI kernels.

#7421 copybara-service[bot] closed 2 weeks ago
0
Move the `s8-vclamp` and `u8-vclamp` tests to the `SHARDED_TESTS` since they are very slow on `riscv64-rvv`.

#7420 copybara-service[bot] closed 2 weeks ago
0
Clean up includes in `f32-gemm/avx-broadcast.c.in`.

#7419 copybara-service[bot] closed 2 weeks ago
0
Changes to the `BatchMatrixMultiply` benchmarking code:

#7418 copybara-service[bot] closed 2 weeks ago
0
Avoid hardcoding python3

#7417 fiberflow closed 2 weeks ago
2
Copybara import of the project:

#7416 copybara-service[bot] closed 3 weeks ago
0
F32-GEMM avx512 fix remainder handling when nc > 16

#7415 copybara-service[bot] closed 3 weeks ago
0
F32-GEMM avx512 fix remainder handling when nc > 16

#7414 copybara-service[bot] closed 3 weeks ago
0
Add missing binary operator benchmarks.

#7413 copybara-service[bot] closed 2 weeks ago
0
Since the reduction axes will be sorted anyway, remove the requirement that they are already sorted.

#7412 copybara-service[bot] closed 3 weeks ago
0
VMulCAddC-Replaced yaml files with header table

#7411 nitheshsrikanth-mcw opened 3 weeks ago
3
xRaddextexp - replaced yaml files with header table

#7410 nitheshsrikanth-mcw opened 3 weeks ago
1
