octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
https://mlc.ai/mlc-llm
Apache License 2.0

[FP8] Add shard strategy to from_linear for PTQ. #234

Closed csullivan closed 7 months ago

csullivan commented 7 months ago

We missed this in #232. Fixes the param loader error:

2024-03-19 16:23:37 [info     ] Loading parameters from dist/mixtral-8x7b-instruct-v0.1-fp8_e4m3_e5m2_max-vllm-2gpu. [mlc_serve.model.tvm_model] func_name=get_tvm_model lineno=66 pathname=/home/csullivan/scratch/ollm-2/mlc-serve/mlc_serve/model/tvm_model.py process=146792
[16:23:43] /home/csullivan/scratch/ollm-2/deps/tvm/src/runtime/disco/loader.cc:464: [Worker #0] Loading model to device: cuda:0
[16:23:43] /home/csullivan/scratch/ollm-2/deps/tvm/src/runtime/disco/loader.cc:464: [Worker #1] Loading model to device: cuda:1
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/csullivan/scratch/ollm-2/deps/mlc-llm/python/mlc_chat/cli/worker.py", line 51, in <module>
    main()
  File "/home/csullivan/scratch/ollm-2/deps/mlc-llm/python/mlc_chat/cli/worker.py", line 46, in main
    worker_func(worker_id, num_workers, reader, writer)
  File "/home/csullivan/scratch/ollm-2/deps/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/home/csullivan/scratch/ollm-2/deps/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: _Map_base::at
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
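The `_Map_base::at` / `std::out_of_range` pair in the traceback is the signature of a C++ `std::unordered_map::at()` call on a missing key: the disco loader looks up a shard strategy for each parameter, and the FP8 PTQ parameters built by `from_linear` did not carry one. The following is an illustrative Python sketch of that failure mode, not the actual mlc-llm code; `make_param`, `load_param`, and the strategy names are hypothetical stand-ins.

```python
# Illustrative sketch (hypothetical names, not mlc-llm's API): the loader
# keeps a map from shard-strategy name to shard function, mirroring the
# C++ std::unordered_map whose .at() raised std::out_of_range above.

def make_param(name, shard_strategy=None):
    """Hypothetical stand-in for a parameter emitted by from_linear."""
    meta = {"name": name}
    if shard_strategy is not None:
        # The fix in this PR, conceptually: from_linear now attaches a
        # shard strategy to PTQ (FP8) params, so the lookup below succeeds.
        meta["shard_strategy"] = shard_strategy
    return meta

def load_param(param, shard_funcs):
    # Equivalent of shard_funcs.at(strategy) in the C++ loader: a
    # missing key raises, matching the _Map_base::at error in the log.
    strategy = param["shard_strategy"]  # KeyError if from_linear omitted it
    return shard_funcs[strategy](param["name"])

shard_funcs = {"shard_dim_0": lambda name: f"{name} sharded on dim 0"}

# With a shard strategy attached, loading succeeds.
ok = load_param(make_param("w_q", "shard_dim_0"), shard_funcs)

# Without one (the pre-fix behavior), the lookup fails.
try:
    load_param(make_param("w_q"), shard_funcs)
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```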
csullivan commented 7 months ago

cc @vinx13, let me know if this looks fine to you. It fixes the problem I was seeing earlier and mirrors your changes in #233.