huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

Add dtype management in inference endpoints #117

Closed: clefourrier closed this issue 2 months ago

clefourrier commented 3 months ago

For dtype = float32/bfloat16/float16, we need to change the image creation to

                image = {
                    "health_route": "/health",
                    "env": {
                        # Documentation: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher
                        "MAX_BATCH_PREFILL_TOKENS": "2048",
                        "MAX_INPUT_LENGTH": "2047",
                        "MAX_TOTAL_TOKENS": "2048",
                        "MODEL_ID": "/repository",
                    },
                    "url": "ghcr.io/huggingface/text-generation-inference:1.1.0",
                }
                if config.model_dtype is not None:
                    image["env"]["DTYPE"] = str(config.model_dtype) 
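A minimal self-contained sketch of the snippet above, assuming a stand-in `EndpointConfig` dataclass (not lighteval's actual config class) with a `model_dtype` field:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointConfig:
    # Stand-in for the real config object; only the field used here is modeled.
    model_dtype: Optional[str] = None  # e.g. "float32", "bfloat16", "float16"


def build_image(config: EndpointConfig) -> dict:
    """Build the TGI custom image spec, forwarding dtype via the DTYPE env var."""
    image = {
        "health_route": "/health",
        "env": {
            # Documentation: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_INPUT_LENGTH": "2047",
            "MAX_TOTAL_TOKENS": "2048",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-generation-inference:1.1.0",
    }
    # Only set DTYPE when the user requested one; TGI picks its default otherwise.
    if config.model_dtype is not None:
        image["env"]["DTYPE"] = str(config.model_dtype)
    return image
```

With `model_dtype=None` the `DTYPE` key is simply absent, so TGI falls back to its default dtype selection.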

For quantization, it's the `--quantize` launcher flag with the `bitsandbytes` variants.
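By analogy with the dtype handling, quantization could be forwarded through the `QUANTIZE` environment variable (the TGI launcher exposes an env-var counterpart for each flag). This is a hedged sketch; the quantization values listed are the `bitsandbytes` variants from the launcher docs, and the helper name is ours, not lighteval's:

```python
from typing import Optional

def apply_quantization(image: dict, quantization: Optional[str]) -> dict:
    """Sketch: set TGI's QUANTIZE env var on an image spec, if requested.

    `quantization` would be e.g. "bitsandbytes", "bitsandbytes-nf4",
    or "bitsandbytes-fp4" (the --quantize bitsandbytes variants).
    """
    if quantization is not None:
        image["env"]["QUANTIZE"] = quantization
    return image
```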

Full options are here.

clefourrier commented 2 months ago

Closed by #124