Closed clefourrier closed 2 months ago
For dtype = float32/bfloat16/float16, we need to change the image creation to
```python
image = {
    "health_route": "/health",
    "env": {
        # Documentation: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher
        "MAX_BATCH_PREFILL_TOKENS": "2048",
        "MAX_INPUT_LENGTH": "2047",
        "MAX_TOTAL_TOKENS": "2048",
        "MODEL_ID": "/repository",
    },
    "url": "ghcr.io/huggingface/text-generation-inference:1.1.0",
}
if config.model_dtype is not None:
    image["env"]["DTYPE"] = str(config.model_dtype)
```
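As a runnable sketch of that creation logic, the snippet can be wrapped in a helper; the `build_image` name and the `SimpleNamespace` stub config below are illustrative, not lighteval's actual API:

```python
from types import SimpleNamespace

def build_image(config):
    # Image spec for the endpoint; DTYPE is only set when a dtype is requested.
    image = {
        "health_route": "/health",
        "env": {
            # Documentation: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_INPUT_LENGTH": "2047",
            "MAX_TOTAL_TOKENS": "2048",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:1.1.0",
    }
    if config.model_dtype is not None:
        # TGI's launcher reads the dtype from the DTYPE environment variable.
        image["env"]["DTYPE"] = str(config.model_dtype)
    return image

# A float16 config adds DTYPE; a dtype-less config leaves the env untouched.
print(build_image(SimpleNamespace(model_dtype="float16"))["env"]["DTYPE"])  # float16
print("DTYPE" in build_image(SimpleNamespace(model_dtype=None))["env"])  # False
```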
For quantization, it's one of the `--quantize bitsandbytes` variations:

```
--quantize bitsandbytes
```

Full options are here.
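A minimal sketch of wiring this into the same image dict, assuming the launcher's `--quantize` flag can be set through a `QUANTIZE` environment variable like the other launcher options (the trimmed `env` and the `quantization` variable are illustrative):

```python
image = {
    "health_route": "/health",
    "env": {
        "MODEL_ID": "/repository",
    },
    "url": "ghcr.io/huggingface/text-generation-inference:1.1.0",
}

# "bitsandbytes" stands in for whichever of its variations is requested.
quantization = "bitsandbytes"
if quantization is not None:
    image["env"]["QUANTIZE"] = quantization

print(image["env"]["QUANTIZE"])  # bitsandbytes
```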
Closed by #124