mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0
19.09k stars · 1.56k forks

[Bug] When Vulkan is enabled in TVM, the model is compiled incorrectly #1503

Closed lss0510 closed 8 months ago

lss0510 commented 10 months ago

🐛 Bug

After enabling Vulkan in TVM and finishing the TVM build, I hit an error when using mlc-llm to compile RedPajama.

To Reproduce

Steps to reproduce the behavior:

1. Run `python3 -m mlc_llm.build --hf-path RedPajama-INCITE-Chat-3B-v1 --quantization q4f16_1 --target cuda`. I get this error:

       Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_75 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
       Get old param:   0%| | 0/388 [00:00<?, ?tensors/s]
       Segmentation fault (core dumped) | 0/518 [00:00<?, ?tensors/s]

Expected behavior

Environment

Additional context

CharlieFRuan commented 8 months ago

Hi @lss0510! We migrated to a new workflow for model compilation. The flow under python3 -m mlc_llm.build is no longer maintained and will likely be removed soon.

Check out the new three-step process: https://llm.mlc.ai/docs/compilation/compile_models.html
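For reference, the three steps in the linked docs look roughly like the sketch below: convert the weights, generate the chat config, then compile the model library. The directory layout and output paths here are illustrative assumptions, not fixed requirements; see the docs for the exact flags for your model and target.

```shell
# Sketch of the new three-step flow; paths are hypothetical.
# Assumes the mlc_llm CLI (pip package mlc-llm) is installed and the
# HF checkout lives under ./dist/models/.

# 1. Convert and quantize the HF weights into MLC format.
mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Chat-3B-v1/ \
    --quantization q4f16_1 \
    -o ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC

# 2. Generate mlc-chat-config.json (records quantization, conv template, etc.).
mlc_llm gen_config ./dist/models/RedPajama-INCITE-Chat-3B-v1/ \
    --quantization q4f16_1 --conv-template redpajama_chat \
    -o ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/

# 3. Compile the model library for the chosen device (CUDA here).
mlc_llm compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json \
    --device cuda \
    -o ./dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-cuda.so
```

Note that the Vulkan-vs-CUDA mismatch in the original report goes away in this flow: the `--device` flag in step 3 selects the target backend explicitly, independent of how TVM itself was built.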

CharlieFRuan commented 8 months ago

Closing this one for now. Feel free to open a new issue if the problem persists!