Closed lss0510 closed 8 months ago
Hi @lss0510! We have migrated to a new workflow for model compilation. The flow under `python3 -m mlc_llm.build` is no longer maintained and will likely be removed soon.
Check out the new three-step process: https://llm.mlc.ai/docs/compilation/compile_models.html
Closing this one for now. Feel free to open another issue if the problem persists!
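For reference, the three-step flow in the linked docs can be sketched roughly as below. The subcommand names, flags, conversation-template name, and paths here are assumptions based on the `mlc_llm` CLI and may differ by version; consult the linked documentation for the authoritative invocation.

```shell
#!/bin/sh
# Rough sketch of the new three-step MLC LLM compilation flow.
# Paths and the conv-template name are hypothetical examples.
set -e

MODEL=./dist/models/RedPajama-INCITE-Chat-3B-v1   # local HF checkout (hypothetical path)
OUT=./dist/RedPajama-q4f16_1

# Skip gracefully when mlc_llm or the model checkout is unavailable,
# so this sketch can be dry-run on any machine.
if ! command -v mlc_llm >/dev/null 2>&1 || [ ! -d "$MODEL" ]; then
    echo "mlc_llm or model directory not available; skipping"
    exit 0
fi

# Step 1: convert and quantize the weights.
mlc_llm convert_weight "$MODEL" --quantization q4f16_1 -o "$OUT"

# Step 2: generate the chat config and processed tokenizer files.
mlc_llm gen_config "$MODEL" --quantization q4f16_1 \
    --conv-template redpajama_chat -o "$OUT"

# Step 3: compile the model library for the target device.
mlc_llm compile "$OUT/mlc-chat-config.json" --device cuda \
    -o "$OUT/RedPajama-cuda.so"
```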
🐛 Bug
After enabling Vulkan in TVM and finishing the TVM build, I get an error when using mlc-llm to compile RedPajama.
To Reproduce
Steps to reproduce the behavior:
1. Run `python3 -m mlc_llm.build --hf-path RedPajama-INCITE-Chat-3B-v1 --quantization q4f16_1 --target cuda`. I get this error:

```
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_75 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%| | 0/388 [00:00<?, ?tensors/s]
                     | 0/518 [00:00<?, ?tensors/s]
Segmentation fault (core dumped)
```
Expected behavior
Environment

TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):

Additional context