mlc-ai / mlc-llm
Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0 · 19.23k stars · 1.58k forks
Issues (sorted by newest)
#3044 · [Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM · by Neet-Nestor, opened 4 hours ago · 0 comments
#3043 · [Bug] internlm2_5-20b-q0f16-MLC model produces incoherent (nonsense) chat responses · by l241025097, opened 5 hours ago · 1 comment
#3042 · [C++] Invoke storage allocation for CUDA Graph explicitly · by MasterJH5574, closed 8 hours ago · 0 comments
#3041 · Update copyright to 2023-2024 · by MasterJH5574, closed 15 hours ago · 0 comments
#3040 · [Python] Skip termination when engine is not initialized · by MasterJH5574, closed 15 hours ago · 0 comments
#3039 · [Bug] Inference with llava throws an error · by HoiM, opened 1 day ago · 2 comments
#3038 · [Model] Add support for Xverse Model · by tlopex, closed 1 day ago · 0 comments
#3037 · [Bench] Add support for multiple backends · by cyx-6, opened 2 days ago · 0 comments
#3036 · [Bug] MacBook Pro M4 Max (Apple silicon): mlc_llm compile of qwen2.5 q4f32 MLC .so fails · by l241025097, closed 5 hours ago · 3 comments
#3035 · [Bug] When initializing MLCEngine, getting AttributeError: 'MLCEngine' object has no attribute '_ffi' · by lifelongeeek, closed 2 hours ago · 5 comments
#3034 · [Question] Does MLC_LLM MLCEngine have an equivalent API for `llm.generate` in VLLM or SGLang? · by pjyi2147, opened 4 days ago · 0 comments
#3033 · KV cache offloading to CPU RAM · by shahizat, opened 4 days ago · 0 comments
#3032 · [JIT] Support overriding optimization flags in JIT · by MasterJH5574, closed 5 days ago · 0 comments
#3031 · [Feature Request] Add vision model flag to model record · by Neet-Nestor, opened 6 days ago · 0 comments
#3030 · [Python] Add sentencepiece as installation requirement · by MasterJH5574, closed 6 days ago · 0 comments
#3029 · [3rdparty] Bump tokenizers-cpp for HuggingFace tokenizer update · by MasterJH5574, closed 6 days ago · 0 comments
#3028 · [Model] Keep vision encoder weights unquantized to maintain accuracy · by mengshyu, closed 6 days ago · 0 comments
#3027 · [Question] Cannot serve model using multiple GPUs · by ro99, closed 3 days ago · 2 comments
#3026 · [Fix] Disable FlashInfer when sliding window is enabled · by MasterJH5574, closed 1 week ago · 0 comments
#3025 · [3rdparty] Bump tokenizer-cpp to enable SentencePiece by default · by MasterJH5574, closed 1 week ago · 0 comments
#3024 · [Question] Could not use multiple GPUs in chat · by BobH233, closed 1 week ago · 3 comments
#3023 · [Bug] Cannot run finetuned model of Mistral 7B with `mlc_llm convert_weights` with "data did not match any variant of untagged enum ModelWrapper" · by pjyi2147, closed 4 days ago · 12 comments
#3022 · Speculative mode for the LLaMA 3.1 70B model · by shahizat, opened 1 week ago · 2 comments
#3021 · [Docs] Fix typo in export command with compile · by pongib, closed 1 week ago · 0 comments
#3020 · [docs] Updated conversation template doc · by Kaneki-x, closed 1 week ago · 0 comments
#3019 · [Docs] Update conversion template link address · by Kaneki-x, closed 1 week ago · 0 comments
#3018 · Support for heterogeneous devices · by musram, closed 1 week ago · 1 comment
#3017 · [Bug] Flutter-to-native-Android interop: calling engine.chatCompletion causes an ANR · by tdd102, opened 1 week ago · 0 comments
#3016 · [Bug] internlm2_5 model fails when run with mlc_llm serve · by l241025097, closed 4 days ago · 9 comments
#3015 · [Grammar] Migrate to XGrammar · by Ubospica, closed 1 week ago · 0 comments
#3014 · [Question] How to show model download progress on WebLLM Javascript SDK? · by DenisSergeevitch, closed 1 week ago · 1 comment
#3013 · [Question] How to export MLCChat .apk with weights bundled/included? · by lifelongeeek, closed 1 week ago · 1 comment
#3012 · [Model] Add support for GPTJ architecture · by tlopex, opened 2 weeks ago · 4 comments
#3011 · [Bug] Speculative decoding doesn't work on Vulkan (AMD iGPU) · by SkyHeroesS, opened 2 weeks ago · 0 comments
#3010 · [Question] Android app issue · by j0h0k0i0m, opened 2 weeks ago · 2 comments
#3009 · [Model] Update default prefill chunk size of Deepseek · by tlopex, closed 6 days ago · 1 comment
#3008 · Fix relative path comment · by NWuensche, closed 2 weeks ago · 0 comments
#3007 · [Docs] Update model template link address · by Kaneki-x, closed 2 weeks ago · 1 comment
#3006 · Simplify obvious choices in gen_cmake_config.py · by jeethu, closed 2 weeks ago · 0 comments
#3005 · [Bug] large concurrency service broken · by fan-niu, opened 3 weeks ago · 4 comments
#3004 · [Bug] Llama-3.1-70B-Instruct-q3f16_1-MLC model running across two GPUs with tensor_parallel_shards=2 · by shahizat, opened 3 weeks ago · 2 comments
#3003 · [Bug] DLL error · by ArpanDhot, opened 3 weeks ago · 0 comments
#3002 · [Bug] Misalignment of Llama3.2 chat template · by Hzfengsy, opened 3 weeks ago · 0 comments
#3001 · [Question] Error running prep_emcc_deps.sh - 'tvm/runtime/object.h' file not found · by Big-Boy-420, opened 3 weeks ago · 7 comments
#3000 · [Fix] Typo in serve/engine.py · by shyeonn, closed 3 weeks ago · 0 comments
#2999 · [Question] Which models do you recommend for compiling on a Mac Intel chip with Metal GPU? · by RINO-GAELICO, opened 3 weeks ago · 0 comments
#2998 · [Feature Request] OpenAI v1/completion API support · by lpb1, closed 3 weeks ago · 1 comment
#2997 · [Bug] Llama 3.2 3B and 1B on MLC are significantly slower than Llama 3.1 8B (L40s, fp16) · by chrisreese-if, opened 3 weeks ago · 1 comment
#2996 · [Bug] Bug with DeepSeek V2 · by 0xLienid, opened 4 weeks ago · 1 comment
#2995 · [Question] TVM error on Mac Intel chip, Metal accelerator · by RINO-GAELICO, opened 4 weeks ago · 0 comments