neuralmagic / guidellm

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Apache License 2.0

Integration Testing Enablement / Expansion #54

Open rgreenberg1 opened 1 week ago


Description

Ensure that the targeted files/packages have reasonable test coverage. This includes running the following tests:

Requirements

backend.openai

Smoke Test: Use an OpenAI API token and endpoint with the gpt-4o-mini model. Ensure startup and test_connection pass. Submit a maximum of 3 requests with minimal-length prompts of varied text and output_token_count set. If output_token_count is not set, iterate through a limited number of token responses. Check that the combined prompt in the text generation result is sensible and complete, and that the prompt was passed through.
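A minimal sketch of what this smoke test could look like. The import path, constructor arguments, and the submit method are assumptions about the guidellm API rather than confirmed signatures; only test_connection and output_token_count are named in this issue.

```python
import os

import pytest

from guidellm.backend import OpenAIBackend  # assumed import path


@pytest.mark.smoke
def test_openai_backend_minimal_requests():
    backend = OpenAIBackend(  # assumed constructor arguments
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
    )
    assert backend.test_connection()  # method named in this issue; signature assumed

    prompts = ["Hello", "Count to three", "Name one color"]
    for prompt in prompts:  # at most 3 requests, per the requirement
        result = backend.submit(prompt, output_token_count=16)  # assumed method
        # The prompt should be passed through, and the combined result sensible.
        assert result.prompt == prompt  # assumed result attributes
        assert len(result.output) > 0
```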

Sanity Test: Use a local vLLM server with a tiny model and a local Llama.cpp server with a tiny model. Run the above-listed tests for OpenAI-compatible requests against each server.
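One way to provide the local server is a session-scoped fixture, sketched below for the vLLM case (a Llama.cpp variant would follow the same shape). The tiny model id and port are placeholders; the `vllm serve` CLI exists, but its flags should be checked against the installed version (older releases use `python -m vllm.entrypoints.openai.api_server --model ...`).

```python
import subprocess
import time

import pytest
import requests


@pytest.fixture(scope="session")
def local_vllm_server():
    # Placeholder tiny model and port; swap in whatever the CI worker can run.
    proc = subprocess.Popen(
        ["vllm", "serve", "Qwen/Qwen2.5-0.5B-Instruct", "--port", "8000"]
    )
    try:
        # Poll the OpenAI-compatible endpoint until the server is up.
        for _ in range(60):
            try:
                requests.get("http://localhost:8000/v1/models", timeout=1)
                break
            except requests.ConnectionError:
                time.sleep(2)
        yield "http://localhost:8000/v1"
    finally:
        proc.terminate()
        proc.wait()
```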

Regression Test: Perform more extensive requests by spinning up a vLLM server on a GPU worker with a quantized Llama 3.1 8B. Run the above tests with longer prompts and ensure they complete.

request.emulated

Smoke Test: Validate the config variations covered in the unit tests, combined with a llama-3.1 tokenizer (both as an instantiated tokenizer object and as a model-name string). Ensure each generates the expected token distributions.
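A sketch of the shape this test could take. EmulatedRequestGenerator, its config keys, and the request attributes are assumptions about guidellm internals; the llama-3.1 model id is a placeholder (it is gated on Hugging Face, so CI may need a token or a substitute tokenizer).

```python
import pytest
from transformers import AutoTokenizer

from guidellm.request.emulated import EmulatedRequestGenerator  # assumed import

CONFIGS = [  # illustrative config variations, mirroring the unit tests
    {"prompt_tokens": 32},
    {"prompt_tokens": 32, "prompt_tokens_variance": 8},
    {"prompt_tokens": 128, "generated_tokens": 64},
]


@pytest.mark.smoke
@pytest.mark.parametrize("config", CONFIGS)
@pytest.mark.parametrize("as_object", [False, True])
def test_emulated_token_distributions(config, as_object):
    name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder, gated model
    tokenizer = AutoTokenizer.from_pretrained(name) if as_object else name
    generator = EmulatedRequestGenerator(config=config, tokenizer=tokenizer)  # assumed

    request = next(iter(generator))
    counting_tok = AutoTokenizer.from_pretrained(name)
    token_count = len(counting_tok(request.prompt).input_ids)
    # Allow slack for variance-based configs.
    assert abs(token_count - config["prompt_tokens"]) <= config.get(
        "prompt_tokens_variance", 5
    )
```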

Sanity Test: None.

Regression Test: Test other text sources with the tokenizer, such as text in another language, and ensure the tokenizer runs through correctly and generates the expected output. Additionally, test other tokenizers such as Mistral and Qwen.
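This regression fans out naturally as a parametrized matrix. The sketch below assumes the same EmulatedRequestGenerator class and a source argument for the text; both, along with the specific model ids, are illustrative rather than confirmed.

```python
import pytest

from guidellm.request.emulated import EmulatedRequestGenerator  # assumed import

TOKENIZERS = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "Qwen/Qwen2.5-7B-Instruct",
]
SAMPLES = {
    "english": "The quick brown fox jumps over the lazy dog.",
    "german": "Der schnelle braune Fuchs springt über den faulen Hund.",
    "japanese": "素早い茶色の狐がのろまな犬を飛び越える。",
}


@pytest.mark.regression
@pytest.mark.parametrize("tokenizer_name", TOKENIZERS)
@pytest.mark.parametrize("text", SAMPLES.values(), ids=SAMPLES.keys())
def test_emulated_other_languages(tokenizer_name, text):
    generator = EmulatedRequestGenerator(  # source kwarg is an assumption
        config={"prompt_tokens": 16}, tokenizer=tokenizer_name, source=text
    )
    request = next(iter(generator))
    assert request.prompt  # tokenizer ran through and produced output
```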

request.file

Smoke Test: Validate line-based file formats (txt, jsonl, and csv) with sample data, combined with a llama-3.1 tokenizer (both as an instantiated tokenizer object and as a model-name string). Ensure each generates the expected token distributions.
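A sketch using pytest's tmp_path fixture to build the sample files on the fly. FileRequestGenerator, its constructor, and the request attributes are assumed guidellm names; the column and file layouts are illustrative.

```python
import json

import pytest

from guidellm.request.file import FileRequestGenerator  # assumed import


def _write_sample(path, fmt):
    rows = ["first sample prompt", "second sample prompt"]
    if fmt == "txt":
        path.write_text("\n".join(rows))
    elif fmt == "jsonl":
        path.write_text("\n".join(json.dumps({"text": r}) for r in rows))
    elif fmt == "csv":
        path.write_text("text\n" + "\n".join(rows))
    return path


@pytest.mark.smoke
@pytest.mark.parametrize("fmt", ["txt", "jsonl", "csv"])
def test_file_formats(tmp_path, fmt):
    data_file = _write_sample(tmp_path / f"data.{fmt}", fmt)
    generator = FileRequestGenerator(  # assumed constructor
        file=str(data_file), tokenizer="meta-llama/Llama-3.1-8B-Instruct"
    )
    prompts = [req.prompt for req, _ in zip(generator, range(2))]
    assert prompts == ["first sample prompt", "second sample prompt"]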

Sanity Test: None.

Regression Test: Validate json and yaml file formats with sample data, expanding to other tokenizers such as Mistral and Qwen.
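The structured-format variant can reuse the same pattern; the json/yaml layouts below are illustrative of what the loader might accept, and FileRequestGenerator remains an assumed class name.

```python
import pytest

from guidellm.request.file import FileRequestGenerator  # assumed import

SAMPLES = {
    "json": '{"items": [{"text": "sample prompt"}]}',
    "yaml": "items:\n  - text: sample prompt\n",
}


@pytest.mark.regression
@pytest.mark.parametrize(
    "tokenizer",
    ["mistralai/Mistral-7B-Instruct-v0.3", "Qwen/Qwen2.5-7B-Instruct"],
)
@pytest.mark.parametrize("ext", SAMPLES.keys())
def test_structured_file_formats(tmp_path, ext, tokenizer):
    data_file = tmp_path / f"data.{ext}"
    data_file.write_text(SAMPLES[ext])
    generator = FileRequestGenerator(file=str(data_file), tokenizer=tokenizer)
    assert next(iter(generator)).prompt == "sample prompt"
```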

request.transformers

Smoke Test: Validate a handful of popular, small datasets from Hugging Face Datasets, both pre-loaded and passed in directly as well as specified by name string. Combine these with a llama-3.1 tokenizer (both as an instantiated tokenizer object and as a model-name string) and ensure they generate the expected token distributions.
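A sketch of the dataset-handling part. The generator class name is an assumption about guidellm; gsm8k stands in for "a popular, small dataset", and how the generator picks the text column is assumed to be handled internally.

```python
import pytest
from datasets import load_dataset

from guidellm.request.transformers import (  # assumed import path
    TransformersDatasetRequestGenerator,
)


@pytest.mark.smoke
@pytest.mark.parametrize(
    "dataset",
    [
        "openai/gsm8k",  # specified through a string
        load_dataset("openai/gsm8k", "main", split="test"),  # pre-loaded
    ],
)
def test_hf_datasets(dataset):
    generator = TransformersDatasetRequestGenerator(  # assumed constructor
        dataset=dataset, tokenizer="meta-llama/Llama-3.1-8B-Instruct"
    )
    request = next(iter(generator))
    assert request.prompt  # data flowed from the dataset into a request
```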

Sanity Test: None.

Regression Test: Test other tokenizers such as Mistral and Qwen.

utils.injector

Smoke Test: Validate the injector functionality across the stages of the guidellm UI (dev, staging, and prod). Ensure that the desired data can be injected into the UI files and results in loadable HTML with the data contained and accessible.
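A round-trip sketch for one stage. The inject_data helper and its signature are assumptions about guidellm.utils.injector, and the HTML string is a minimal stand-in for the real UI build artifacts.

```python
import json

import pytest

from guidellm.utils.injector import inject_data  # assumed import

# Minimal stand-in for a guidellm UI page (dev stage shown; staging and
# prod would parametrize over their respective artifacts).
UI_HTML = "<html><head><script>window.run_info = {};</script></head><body></body></html>"


@pytest.mark.smoke
def test_injector_roundtrip():
    payload = {"benchmarks": [{"rps": 1.5}]}

    output = inject_data(payload, UI_HTML)  # assumed signature
    # The result should still be loadable HTML with the data accessible.
    assert output.startswith("<html>")
    assert json.dumps(payload["benchmarks"]) in output or "1.5" in output
```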

Sanity Test: None.

Regression Test: Test other tokenizers such as Mistral and Qwen.

Expectations

Validation should confirm that all tests have sufficient coverage and meet the expectations detailed above. Report any gaps or failures in the coverage and provide suggestions for improvement.