pytorch/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Other
1.15k stars, 466 forks
issues
Tuning for 405B/70B Prefill with small seqlen
#3042
zjing14
opened
9 hours ago
2
Add support for int64_t indices in TBE inference [1/N]
#3041
q10
opened
11 hours ago
7
Optimize MX4 padding to minimize need for tuning
#3040
jwfromm
opened
13 hours ago
13
Time each step on Nova
#3039
spcyppt
opened
14 hours ago
2
Print the exact variable values triggering the alert
#3038
yumin829928
closed
13 hours ago
4
Work around offsets and indices type mismatch int TBE training
#3037
sryap
closed
12 hours ago
4
add set_async in background thread
#3036
duduyi2013
opened
3 days ago
2
add cache mem stats
#3035
duduyi2013
opened
3 days ago
2
attach eviction filling logic to set_cache
#3034
duduyi2013
opened
3 days ago
2
move set_cache and set_async to background thread
#3033
duduyi2013
opened
3 days ago
2
parallelizing L2 cache lookup
#3032
duduyi2013
opened
3 days ago
2
add ods logging for l2 cache perf
#3031
duduyi2013
opened
3 days ago
2
uvm_to_device expose device as interface
#3030
dracifer
closed
2 days ago
3
Test FBGEMM CI
#3029
embg
opened
3 days ago
2
Consolidate repeat code in TBE inference
#3028
q10
closed
3 days ago
10
Triton PR#4179
#3027
embg
closed
19 hours ago
12
Back out "Fix pack_segments backward when grad is non-contig"
#3026
spcyppt
closed
3 days ago
5
Add method to update internal hyperparameters for FBGEMM TBE
#3025
csmiler
opened
4 days ago
2
Fix failures_dict_fast.json in TBE inference test
#3024
sryap
closed
4 days ago
3
[fbgemm_gpu] Fix installation of build wheel
#3023
q10
closed
4 days ago
3
Add an input debug function in TBE training
#3022
sryap
closed
4 days ago
5
Move remaining code out of sparse_ops_utils.h
#3021
q10
closed
4 days ago
4
Remove redundant torch.abs
#3020
spcyppt
closed
5 days ago
1
Move tensor utilities out of sparse_ops_utils.h
#3019
q10
closed
5 days ago
3
Change rocm version and update documentation
#3018
spcyppt
closed
5 days ago
3
Fix CK Profiler Build and Tune Small CK FP8 Shapes
#3017
jwfromm
closed
5 days ago
3
Fix test skipping for UVM tests
#3016
q10
closed
6 days ago
4
Add bounds check in prefetch in TBE training
#3015
sryap
closed
6 days ago
7
Move cuda block count functions out of sparse_ops_utils.h
#3014
q10
closed
6 days ago
6
Add bounds check in SSD-TBE
#3013
sryap
closed
1 week ago
4
Fix the dev non-san fbgemm op loading issue
#3012
jianyuh
closed
1 week ago
3
move mqa code
#3011
jianyuh
closed
1 week ago
7
MX4 Row-Based Padding
#3010
jwfromm
closed
3 days ago
5
Decode and Prefill support
#3009
Aya-ZIbra
closed
1 week ago
5
Marlin Mixed Input Kernel Productionization
#3008
jwfromm
closed
6 days ago
6
Move ops macros out of sparse_ops_utils.h
#3007
q10
closed
1 week ago
12
Fix pack_segments backward when grad is non-contig
#3006
spcyppt
closed
1 week ago
9
Add Meta backend/dispatcher for new_unified_tensor
#3005
sryap
closed
1 week ago
3
Use HBM as an L1 cache in the SSD training benchmark
#3004
sryap
closed
1 week ago
5
Add a host map option for a UVM tensor alloc
#3003
sryap
closed
1 week ago
5
[fbgemm_gpu] Remove Python 3.8
#3002
q10
closed
1 week ago
4
refactor kv cache ops
#3001
jianyuh
closed
1 week ago
3
- Add abstract func for HFP8QuantizedToFloat
#3000
flaviotruzzi
closed
1 week ago
11
Support original indices for FBGEMM block bucketization flag
#2999
dstaay-fb
closed
1 week ago
5
Test moving doc into TBE file
#2998
sryap
opened
1 week ago
2
Add unit test for int4 to int4 sequence CPU TBE
#2997
excelle08
closed
3 days ago
24
Add int4 to int4 CPU Sequence TBE kernel
#2996
excelle08
closed
3 days ago
16
Add a CPU nbit to float dequantization op that supports torch.quintMxN type and QuantizedCPU backend
#2995
excelle08
closed
3 days ago
18
Enable int4 to int4 CPU STBE in fbgemm_gpu TBE API
#2994
excelle08
closed
3 days ago
21
Fix get_unique_indices_v2 registration
#2993
sryap
closed
1 week ago
6