pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
1.11k stars · 446 forks
Issues (newest first)
#2807 · Break up `fbgemm_cuda_utils.cuh`, pt 8 · q10 · opened 1 day ago · 2 comments
#2806 · Break up `fbgemm_cuda_utils.cuh`, pt 7 · q10 · closed 2 days ago · 3 comments
#2805 · Break up `fbgemm_cuda_utils.cuh`, pt 6 · q10 · closed 3 days ago · 6 comments
#2804 · FBGEMM Triton MX4 Quantize and Dequantize V2 · jwfromm · closed 3 days ago · 3 comments
#2803 · Break up `fbgemm_cuda_utils.cuh`, pt 5 · q10 · closed 4 days ago · 5 comments
#2802 · [Pyre][PTT] Remove unused ignores from deeplearning · MaggieMoss · opened 5 days ago · 4 comments
#2801 · Only scaling at boundary · htyu · closed 5 days ago · 6 comments
#2800 · Add Cutlass Blockwise Kernel to Quantize Benchmark · jwfromm · closed 5 days ago · 3 comments
#2799 · Break up `fbgemm_cuda_utils.cuh`, pt 4 · q10 · closed 5 days ago · 3 comments
#2798 · Fix the subnormal adjustment for FP8 stochastic rounding · jianyuh · closed 3 days ago · 7 comments
#2797 · Break up `fbgemm_cuda_utils.cuh`, pt 3 · q10 · closed 6 days ago · 3 comments
#2796 · Add FP8 Triton rowwise kernel and cudagraph coverage · jiawenliu64 · closed 5 days ago · 5 comments
#2795 · Add an autotune config to FP8 rowwise kernel · htyu · closed 5 days ago · 5 comments
#2794 · Revert D59034299 · sryap · opened 1 week ago · 2 comments
#2793 · Implement some custom fb op out variant kernels · qxy11 · opened 1 week ago · 4 comments
#2792 · [fbgemm_gpu] Fix the missing `libcuda` linking · q10 · closed 1 week ago · 4 comments
#2791 · Add FP8 CUTLASS blockwise kernel and cudagraph coverage · jiawenliu64 · closed 1 week ago · 3 comments
#2790 · Fix FP8 persistent GEMM kernel issue · htyu · closed 6 days ago · 6 comments
#2789 · Replace Triton persistent row-wise kernels with non-persistent · jiawenliu64 · closed 1 week ago · 5 comments
#2788 · Flip default in PACKAGE file in deeplearning/PACKAGE · MaggieMoss · opened 1 week ago · 2 comments
#2787 · Silence existing errors in deeplearning · MaggieMoss · closed 1 week ago · 5 comments
#2786 · Add Cublas FP8 + Rowwise Scaling Kernel · jwfromm · closed 1 week ago · 3 comments
#2785 · Break up `fbgemm_cuda_utils.cuh`, pt 2 · q10 · closed 1 week ago · 5 comments
#2784 · Enable in-place output tensor for FP8 Rowwise Kernels · jwfromm · closed 1 week ago · 5 comments
#2783 · [fbgemm_gpu] Fix missing -lcuda linker flag in OSS · q10 · closed 1 week ago · 6 comments
#2782 · [Pyre Faster Type Checking By Default Mode Headers] [batch:27/245] [shard:4/N] · connernilsen · closed 1 week ago · 3 comments
#2781 · fix naming collision · IvanKobzarev · opened 1 week ago · 2 comments
#2780 · Block-wise FP8 matmul · lw · closed 1 week ago · 6 comments
#2779 · Update scaled_mm signature in quantize benchmark · jwfromm · closed 1 week ago · 3 comments
#2778 · Break up `fbgemm_cuda_utils.cuh`, pt 1 · q10 · closed 1 week ago · 6 comments
#2777 · MX4 ops front-end API · spcyppt · opened 1 week ago · 4 comments
#2776 · support gpu to cpu gather in merge pooled emb · 842974287 · closed 1 week ago · 3 comments
#2775 · [fbgemm_gpu] Update pytorch-triton version · q10 · closed 1 week ago · 3 comments
#2774 · KJT custom op for 1d lengths input · TroyGarden · closed 1 week ago · 3 comments
#2773 · add functional CPU VBE support · jspark1105 · closed 1 week ago · 11 comments
#2772 · implementation of fbgemm op - regroup_keyed_tensor · TroyGarden · opened 2 weeks ago · 3 comments
#2771 · benchmark of fbgemm op - permute_multi_embedding · TroyGarden · opened 2 weeks ago · 3 comments
#2770 · move memory copy into one_shot_all_reduce · xw285cornell · closed 2 weeks ago · 7 comments
#2769 · [fbgemm_gpu] Change rtol · q10 · closed 2 weeks ago · 5 comments
#2768 · add a new function to update sparse delta · 842974287 · opened 2 weeks ago · 2 comments
#2767 · Fix torch_dispatch issue in group_index_select_dim0_gpu_backward · egienvalue · closed 2 weeks ago · 3 comments
#2766 · Regression: Persistent kernels make Triton FP8 matmul much slower · rosario-purple · closed 1 week ago · 3 comments
#2765 · FBGEMM Unified FP8 Benchmarking Script · jwfromm · closed 2 weeks ago · 5 comments
#2764 · Make CUTLASS rowwise fp8 faster · lw · opened 2 weeks ago · 2 comments
#2763 · Organize `fbgemm_gpu.tbe.utils` · q10 · closed 2 weeks ago · 4 comments
#2762 · Additional Tuning for Cutlass FP8 Rowwise Kernel · jwfromm · opened 2 weeks ago · 2 comments
#2761 · [fbgemm_gpu] Expand test timeouts for CUDA · q10 · opened 2 weeks ago · 2 comments
#2760 · Revert D58372476 · q10 · closed 2 weeks ago · 9 comments
#2759 · MX4 benchmark · spcyppt · closed 1 week ago · 5 comments
#2758 · FBGEMM CK Blockwise FP8 Kernel · jwfromm · closed 2 weeks ago · 5 comments