octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

https://mlc.ai/mlc-llm

Apache License 2.0 · 5 stars · 8 forks
Issues (newest first)
| # | Title | Author | State | When | Comments |
|---|-------|--------|-------|------|----------|
| #277 | Merge with `mlc-ai/main` (`68cd794d02bbff9842f08b6b2ff37eb582f411c0`, 2024-08-01) | sunggg | closed | 1 month ago | 0 |
| #276 | Configure Renovate | renovate[bot] | open | 1 month ago | 0 |
| #275 | Update worker.py for compatibility with upstream TVM | Lunderberg | closed | 1 month ago | 1 |
| #274 | Initialize all `local_top_k` values in `gating_softmax_topk` | Lunderberg | closed | 1 month ago | 0 |
| #273 | Remove the TVM submodule | binarybana | closed | 2 months ago | 1 |
| #272 | Merge with `mlc-ai/main` (`adc6ee6ae2de97a507291aaff6279af4e3d16a83`, July 2nd 2024) | sunggg | closed | 2 months ago | 0 |
| #271 | [moe_misc] Vectorized version of scatter_output | shtinsa | closed | 3 months ago | 6 |
| #270 | Extend cutlass API for `cutlass.group_gemm_scale_fp16_sm90` | sunggg | closed | 4 months ago | 0 |
| #269 | Use enum for quant types instead of strings | JosephTheOctonaut | closed | 4 months ago | 0 |
| #268 | [FP8] Modify PTQ dequantize function for cuBLAS offloading | ibsidorenko | closed | 4 months ago | 0 |
| #267 | [Question] Smoothquant for other models | ponytaill | closed | 3 months ago | 0 |
| #266 | [SmoothQuant] Remove unused functions | ibsidorenko | closed | 3 months ago | 4 |
| #265 | Merge with `mlc-ai/main` (`d3d264d4b05d73e9757375013b842254f052c6ed`, April 29th 2024) | sunggg | closed | 4 months ago | 0 |
| #264 | Single batch specialization for FP8 | sunggg | closed | 4 months ago | 1 |
| #263 | Skip moe gate layer in PTQ | csullivan | closed | 4 months ago | 0 |
| #262 | Move delta in `preshard`, `LiftGlobalBufferAlloc` and `HuggingFaceLoader` to ollm `slm/` and clean-up | sunggg | closed | 4 months ago | 3 |
| #261 | Add PTQ Linear e4m3 calibration support | csullivan | closed | 4 months ago | 1 |
| #260 | Merge with `mlc-ai/main` (`835223541d4135e511a50cba1deca06731b03abd`, April 18th 2024) | sunggg | closed | 4 months ago | 0 |
| #259 | Smooth_quant update with Iterator | farshidsp | closed | 4 months ago | 1 |
| #258 | Fix `setup.py` | sunggg | closed | 5 months ago | 0 |
| #257 | [SmoothQuant] Initial implementation for FP8/Int8 | ibsidorenko | closed | 5 months ago | 0 |
| #256 | Clean-up and keep the exact copy of FP8 flow | sunggg | closed | 5 months ago | 0 |
| #255 | Hotfix for missing change | sunggg | closed | 5 months ago | 0 |
| #254 | Provide well-formed TIR in AttachMetadataWithMemoryUsage | Lunderberg | closed | 4 months ago | 2 |
| #253 | Merge with latest `mlc-ai/main` (`5bc3ffa6f682a4cf42fdeba3a4c505d0e7c08c3c`) | sunggg | closed | 5 months ago | 1 |
| #252 | [FP8] Shard activation scale factor. | csullivan | closed | 5 months ago | 0 |
| #251 | [FP8] Add config to enable/disable linear quantization. | csullivan | closed | 5 months ago | 0 |
| #250 | [MLC-LLM] Separate function for generating sharding PrimFunc | Lunderberg | closed | 5 months ago | 1 |
| #249 | Provide well-formed TIR in moe_misc | Lunderberg | closed | 5 months ago | 0 |
| #248 | [FP8][Config] Use uint8 packing for fp8_e4m3_e4m3_max | csullivan | closed | 5 months ago | 0 |
| #247 | [SLM] Add ShardScalar tensor_parallel sharding strategy | csullivan | closed | 5 months ago | 0 |
| #246 | [FP8][PTQ] Support packing fp8 into uint8 | csullivan | closed | 5 months ago | 0 |
| #245 | [FP8][PTQ] Support float16 weight and activation dtype | csullivan | closed | 5 months ago | 0 |
| #244 | [FP8][TP] All reduce scales before quantizing activation | csullivan | closed | 5 months ago | 0 |
| #243 | [FP8] Deelvin/smoothquant integration | shtinsa | closed | 5 months ago | 1 |
| #242 | [FP8] Fix invalid reinterpret during dequantize | vinx13 | closed | 5 months ago | 2 |
| #241 | [FP8] Use explicit TE cast op for e5m2 PTQ | csullivan | closed | 5 months ago | 0 |
| #240 | [FP8] Bring fp8 support to OLLM tracking branch | csullivan | closed | 5 months ago | 0 |
| #239 | Add missing files for #235 | masahi | closed | 5 months ago | 0 |
| #238 | [SLM] Allow non-sharded quantized params when sharding. | csullivan | closed | 5 months ago | 0 |
| #237 | [FP8] Use max allreduce to sync calibration param across GPUs | csullivan | closed | 5 months ago | 0 |
| #236 | [FP8] Speed up quantizing weights for e4m3 by using extern max_abs kernel | vinx13 | closed | 5 months ago | 0 |
| #235 | Protect against malformed JSON schema | masahi | closed | 5 months ago | 0 |
| #234 | [FP8] Add shard strategy to from_linear for PTQ. | csullivan | closed | 5 months ago | 1 |
| #233 | [FP8] Fix sharding for per-tensor-quantization | vinx13 | closed | 5 months ago | 1 |
| #232 | [FP8] Implemented PerTensorQuantization and fp16 calibration for e4m3 | vinx13 | closed | 5 months ago | 2 |
| #231 | Fix versioning in setup.py | mehrdadh | closed | 5 months ago | 0 |
| #230 | Protect against errors raised when adding a request to the engine | masahi | closed | 6 months ago | 1 |
| #229 | Use custom nd to scalar absolute max reduce kernel in max calibration runtime to improve perf | csullivan | closed | 6 months ago | 0 |
| #228 | [FP8] Perform calibration in fp16 | vinx13 | closed | 6 months ago | 0 |