octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

https://mlc.ai/mlc-llm

Apache License 2.0 · 5 stars · 8 forks
Issues (newest first)
| # | Title | Author | State | When | Comments |
|---|-------|--------|-------|------|----------|
| #277 | Merge with `mlc-ai/main` (`68cd794d02bbff9842f08b6b2ff37eb582f411c0`, 2024-08-01) | sunggg | closed | 1 month ago | 0 |
| #276 | Configure Renovate | renovate[bot] | open | 1 month ago | 0 |
| #275 | Update worker.py for compatibility with upstream TVM | Lunderberg | closed | 1 month ago | 1 |
| #274 | Initialize all `local_top_k` values in `gating_softmax_topk` | Lunderberg | closed | 1 month ago | 0 |
| #273 | Remove the TVM submodule | binarybana | closed | 2 months ago | 1 |
| #272 | Merge with `mlc-ai/main` (`adc6ee6ae2de97a507291aaff6279af4e3d16a83`, July 2nd 2024) | sunggg | closed | 2 months ago | 0 |
| #271 | [moe_misc] Vectorized version of scatter_output | shtinsa | closed | 3 months ago | 6 |
| #270 | Extend cutlass API for `cutlass.group_gemm_scale_fp16_sm90` | sunggg | closed | 4 months ago | 0 |
| #269 | Use enum for quant types instead of strings | JosephTheOctonaut | closed | 4 months ago | 0 |
| #268 | [FP8] Modify PTQ dequantize function for cuBLAS offloading | ibsidorenko | closed | 4 months ago | 0 |
| #267 | [Question] Smoothquant for other models | ponytaill | closed | 3 months ago | 0 |
| #266 | [SmoothQuant] Remove unused functions | ibsidorenko | closed | 3 months ago | 4 |
| #265 | Merge with `mlc-ai/main` (`d3d264d4b05d73e9757375013b842254f052c6ed`, April 29th 2024) | sunggg | closed | 4 months ago | 0 |
| #264 | Single batch specialization for FP8 | sunggg | closed | 4 months ago | 1 |
| #263 | Skip moe gate layer in PTQ | csullivan | closed | 4 months ago | 0 |
| #262 | Move delta in `preshard`, `LiftGlobalBufferAlloc` and `HuggingFaceLoader` to ollm `slm/` and clean-up | sunggg | closed | 4 months ago | 3 |
| #261 | Add PTQ Linear e4m3 calibration support | csullivan | closed | 4 months ago | 1 |
| #260 | Merge with `mlc-ai/main` (`835223541d4135e511a50cba1deca06731b03abd`, April 18th 2024) | sunggg | closed | 4 months ago | 0 |
| #259 | Smooth_quant update with Iterator | farshidsp | closed | 4 months ago | 1 |
| #258 | Fix `setup.py` | sunggg | closed | 5 months ago | 0 |
| #257 | [SmoothQuant] Initial implementation for FP8/Int8 | ibsidorenko | closed | 5 months ago | 0 |
| #256 | Clean-up and keep the exact copy of FP8 flow | sunggg | closed | 5 months ago | 0 |
| #255 | Hotfix for missing change | sunggg | closed | 5 months ago | 0 |
| #254 | Provide well-formed TIR in AttachMetadataWithMemoryUsage | Lunderberg | closed | 4 months ago | 2 |
| #253 | Merge with latest `mlc-ai/main` (`5bc3ffa6f682a4cf42fdeba3a4c505d0e7c08c3c`) | sunggg | closed | 5 months ago | 1 |
| #252 | [FP8] Shard activation scale factor. | csullivan | closed | 5 months ago | 0 |
| #251 | [FP8] Add config to enable/disable linear quantization. | csullivan | closed | 5 months ago | 0 |
| #250 | [MLC-LLM] Separate function for generating sharding PrimFunc | Lunderberg | closed | 5 months ago | 1 |
| #249 | Provide well-formed TIR in moe_misc | Lunderberg | closed | 5 months ago | 0 |
| #248 | [FP8][Config] Use uint8 packing for fp8_e4m3_e4m3_max | csullivan | closed | 5 months ago | 0 |
| #247 | [SLM] Add ShardScalar tensor_parallel sharding strategy | csullivan | closed | 5 months ago | 0 |
| #246 | [FP8][PTQ] Support packing fp8 into uint8 | csullivan | closed | 5 months ago | 0 |
| #245 | [FP8][PTQ] Support float16 weight and activation dtype | csullivan | closed | 5 months ago | 0 |
| #244 | [FP8][TP] All reduce scales before quantizing activation | csullivan | closed | 5 months ago | 0 |
| #243 | [FP8] Deelvin/smoothquant integration | shtinsa | closed | 5 months ago | 1 |
| #242 | [FP8] Fix invalid reinterpret during dequantize | vinx13 | closed | 5 months ago | 2 |
| #241 | [FP8] Use explicit TE cast op for e5m2 PTQ | csullivan | closed | 5 months ago | 0 |
| #240 | [FP8] Bring fp8 support to OLLM tracking branch | csullivan | closed | 5 months ago | 0 |
| #239 | Add missing files for #235 | masahi | closed | 5 months ago | 0 |
| #238 | [SLM] Allow non-sharded quantized params when sharding. | csullivan | closed | 5 months ago | 0 |
| #237 | [FP8] Use max allreduce to sync calibration param across GPUs | csullivan | closed | 5 months ago | 0 |
| #236 | [FP8] Speed up quantizing weights for e4m3 by using extern max_abs kernel | vinx13 | closed | 5 months ago | 0 |
| #235 | Protect against malformed JSON schema | masahi | closed | 5 months ago | 0 |
| #234 | [FP8] Add shard strategy to from_linear for PTQ. | csullivan | closed | 5 months ago | 1 |
| #233 | [FP8] Fix sharding for per-tensor-quantization | vinx13 | closed | 5 months ago | 1 |
| #232 | [FP8] Implemented PerTensorQuantization and fp16 calibration for e4m3 | vinx13 | closed | 5 months ago | 2 |
| #231 | Fix versioning in setup.py | mehrdadh | closed | 5 months ago | 0 |
| #230 | Protect against errors raised when adding a request to the engine | masahi | closed | 6 months ago | 1 |
| #229 | Use custom nd to scalar absolute max reduce kernel in max calibration runtime to improve perf | csullivan | closed | 6 months ago | 0 |
| #228 | [FP8] Perform calibration in fp16 | vinx13 | closed | 6 months ago | 0 |