Closed by movchan74 9 hours ago
One extra thing I would suggest for this issue is to add engine_args to the vLLM deployment config:
engine_args: CustomConfig
where CustomConfig is https://github.com/mobiusml/aana_sdk/blob/main/aana/core/models/custom_config.py
The reason is that some models, like Phi-3, require extra args that are not in the config. For example, Phi-3 needs trust_remote_code=True, but we don't have trust_remote_code in the config. There are a lot of extra options; I wouldn't add them all to the vLLM deployment config, but we can pass them through a custom dict parameter. We already do this in the HF Pipeline deployment.
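A minimal sketch of what this could look like, assuming a dataclass-style config (the class and field names here are illustrative, not the actual aana_sdk API):

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical vLLM deployment config with a pass-through engine_args dict.
# Only a few explicit fields are kept; everything else goes through engine_args.
@dataclass
class VLLMConfig:
    model: str
    dtype: str = "auto"
    # Extra engine arguments forwarded verbatim to the vLLM engine,
    # e.g. trust_remote_code=True for Phi-3.
    engine_args: dict[str, Any] = field(default_factory=dict)

    def to_engine_kwargs(self) -> dict[str, Any]:
        # Explicit fields first; engine_args can extend or override them.
        kwargs: dict[str, Any] = {"model": self.model, "dtype": self.dtype}
        kwargs.update(self.engine_args)
        return kwargs


config = VLLMConfig(
    model="microsoft/Phi-3-mini-4k-instruct",
    engine_args={"trust_remote_code": True},
)
print(config.to_engine_kwargs())
```

The resulting kwargs dict can then be unpacked into the vLLM engine constructor, so new engine options never require changes to the deployment config schema.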
Enhancement Description
Update the vLLM version from 0.3.2 to the latest available version. This update is necessary to support the Phi-3 mini model, which is only compatible with vLLM 0.4.3 and later; the current deployment does not support this model due to the outdated vLLM version. A quick upgrade attempt was made but failed, possibly due to issues with numpy 2.0, which is not backward compatible with numpy 1.x.
Advantages
Possible Implementation
Modify the project requirements to request the latest vLLM version, and possibly pin
numpy<2
to avoid compatibility issues with the latest numpy version.
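As a sketch, assuming the project declares dependencies in a Poetry-style pyproject.toml (the exact file layout and version bounds here are assumptions, not confirmed from the repo):

```toml
[tool.poetry.dependencies]
# Phi-3 mini requires vLLM 0.4.3 or later.
vllm = ">=0.4.3"
# Pin below 2.0 until the numpy 2.x incompatibility is resolved.
numpy = "<2"
```

The numpy pin can be dropped once vLLM and the rest of the dependency tree are confirmed to work with numpy 2.x.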