Open the-crypt-keeper opened 5 months ago
Completed the initial instruction eval at FP16; this model is excellent, especially at JavaScript. Inference used about 45GB of VRAM during my test runs, so it should work on 2x24GB setups.
This model also supports FIM, so I will keep this issue open for that, as well as for any quants as they pop up.
Latest interview_cuda supports the torchrun and mistral-inference runtimes in an MVP capacity:

```
torchrun --nproc-per-node 4 ./interview_cuda.py --runtime mistral --model_name ~/models/codestral-22B-v0.1 --params params/greedy-hf.json --input results/prepare_senior_python-javascript_chat-simple.ndjson,results/prepare_junior-v2_python-javascript_chat-simple.ndjson
```
Adjust `4` to match the number of GPUs. Note that `--model_name` in this case is a local directory path, not an HF repo ID.
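One way to avoid hard-coding the process count is to derive it from the visible GPUs; a minimal sketch, assuming `nvidia-smi` is on PATH (the `NGPU` variable is my own, not part of interview_cuda):

```shell
# Derive the torchrun process count from the visible GPU count instead of
# hard-coding 4. Set CUDA_VISIBLE_DEVICES first to restrict to a subset.
NGPU=$(nvidia-smi -L | wc -l)
torchrun --nproc-per-node "$NGPU" ./interview_cuda.py --runtime mistral --model_name ~/models/codestral-22B-v0.1 --params params/greedy-hf.json --input results/prepare_senior_python-javascript_chat-simple.ndjson,results/prepare_junior-v2_python-javascript_chat-simple.ndjson
```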
Despite being hosted on HF, this model ships without a config.json, so it does not support inference with the transformers library (or, it seems, any other library); only Mistral's own custom mistral-inference runtime works.
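Since the missing config.json is what breaks transformers loading, a runtime selector can probe for it and fall back accordingly; a minimal sketch (the `pick_runtime` helper is hypothetical, not part of interview_cuda):

```python
import os
import tempfile

def pick_runtime(model_dir: str) -> str:
    """Choose an inference runtime for a local checkpoint directory.

    transformers requires a config.json at the model root; checkpoints
    that ship without one (like this model) need the mistral runtime.
    """
    if os.path.exists(os.path.join(model_dir, "config.json")):
        return "transformers"
    return "mistral"

# Demo with throwaway directories (illustrative layout, not the real checkpoint):
with tempfile.TemporaryDirectory() as d:
    print(pick_runtime(d))                           # no config.json -> mistral
    open(os.path.join(d, "config.json"), "w").close()
    print(pick_runtime(d))                           # config.json -> transformers
```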