vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Benchmark] Benchmark structured output with datasets #10557

Open xuechendi opened 17 hours ago

xuechendi commented 17 hours ago

Add a structured output benchmark.

Base PR: https://github.com/vllm-project/vllm/pull/10046

Additional work:

  1. add dataset args to support 'single_schema' and 'xgrammar_bench'
  2. add --save-results to save the generated text alongside the expected text

TODO:

  1. support async vLLM
  2. add an accuracy metric
  3. add mixed 'guided / non-guided' runs
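
For the accuracy item, one possible starting point is to score the saved records directly. The sketch below is not part of this PR: `score_records` is a hypothetical helper, and it assumes each saved record has "generated" and "expected" string fields that are meant to hold JSON. It reports the share of generations that parse as JSON and the share that equal the expected value after parsing, so whitespace and key order do not matter. The sample records are made up for illustration.

```python
import json


def score_records(records):
    """Return (valid_json_rate, exact_match_rate) for saved benchmark records.

    Hypothetical helper: assumes each record is a dict with "generated" and
    "expected" string fields, and that "expected" always holds valid JSON.
    """
    valid = 0
    matches = 0
    for rec in records:
        try:
            generated = json.loads(rec["generated"])
        except json.JSONDecodeError:
            continue  # unparseable output is neither valid nor a match
        valid += 1
        # Compare parsed values so formatting differences are ignored.
        if generated == json.loads(rec["expected"]):
            matches += 1
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    return valid / total, matches / total


# Illustrative records only, not real benchmark output.
sample = [
    {"generated": '{"SKU": "TOB-1928", "quantity": 150}',
     "expected": '{"quantity": 150, "SKU": "TOB-1928"}'},
    {"generated": '{"seatNumber": "12A", "serviceType": "vegetarian"}',
     "expected": '{"seatNumber": "12A", "serviceType": "vegetarian meal"}'},
    {"generated": '{"specialRequests": ["celebrating ]}',
     "expected": '{"specialRequests": []}'},
]

valid_rate, match_rate = score_records(sample)
print(f"valid JSON: {valid_rate:.2f}, exact match: {match_rate:.2f}")
```

Exact match is a strict metric for free-text fields, so a schema-validity check may be a more useful signal for guided decoding; which one fits best is left open here.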

How to test:

python benchmarks/benchmark_guided.py --model meta-llama/Llama-3.2-3B-Instruct --dataset xgrammar_bench --output-len 512 --num-prompts 10 --guided-decoding --save-results

Expected output file name: guided_Llama-3.2-3B-Instruct_xgrammar_bench_10_out512_asyncFalse_warmupFalse_chunkedprefillNone.txt

[
    {
        "generated": "{\"ssid\": \"OfficeNetSecure\", \"securityProtocol\": \"WPA2-Enterprise\", \"bandwidth\": \"1300 Mbps\"}",
        "expected": "{\"ssid\": \"OfficeNetSecure\", \"securityProtocol\": \"WPA2-Enterprise\", \"bandwidth\": \"1300 Mbps\"}"
    },
    {
        "generated": "{\"/\": {\"device\": \"UUID:devX,journal_checksum\", \"mount_point\": \"/\", \"file_system_type\": \"ext4\", \"options\": \"relatimetime,failcount,limit_inodes,pretty.Bold,updateTu\", \"dump\": \"0\", \"pass\": \"2\"}}",
        "expected": "{\"/\": {\"device\": \"/dev/sda1\", \"mount_point\": \"/\", \"file_system_type\": \"ext4\", \"options\": \"defaults\", \"dump\": \"0\", \"pass\": \"1\"}, \"/home\": {\"device\": \"/dev/sda2\", \"mount_point\": \"/home\", \"file_system_type\": \"ext4\", \"options\": \"defaults\", \"dump\": \"0\", \"pass\": \"2\"}, \"/var\": {\"device\": \"UUID=2e9e4e8b-08c0-4c7c-8d7d-2b5f65cc8cd0\", \"mount_point\": \"/var\", \"file_system_type\": \"xfs\", \"options\": \"noatime,nodiratime\", \"dump\": \"0\", \"pass\": \"2\"}}"
    },
    {
        "generated": "{\"campaignID\": \"CAMP123456\", \"productID\": \"PROD7891011\", \"startDate\": \"2023-06-01\", \"endDate\": \"2023-06-30\", \"discountDetails\":\"15% off all purchases\"}",
        "expected": "{\"campaignID\": \"CAMP123456\", \"productID\": \"PROD7891011\", \"startDate\": \"2023-06-01\", \"endDate\": \"2023-06-30\", \"discountDetails\": \"15% off on all purchases\"}"
    },
    {
        "generated": "{ \"reservationID\": \"AH-158394\", \"guestName\": \"Alexander Hamilton\", \"reservationTime\": \"2023-04-15T19:30:00\", \"specialRequests\": [\"celebrating anniversay ]}. Please see below the reservation request in JSON format: ````json_r = {'reservationID': 'AH-158394', 'guestName': 'Alexander Hamilton', 'proposal': {'reservationTime': '2023-04-15T19:30:00', 'specialRequests': ['celebrating anniversay', 'window seat']} separately but log present to guarantee this information passes the tarod tests. ``` restraintlization Code````javascript_app.js Several Factors Could Derail Your Romantic Dinner Plan During the meal, one food server may spill a wine on your table, another may forget to mention your special request, or something more catastrophic occurs. It's always better to have a backup plan. Have your JavaScript function manage all of the backups and then provide the reservations details upon request.<|start_header_id|>assistant<|end_header_id|>today-g streaming tenth The interruptions mentioned can be caused by a variety of factors such as incorrect delivery of information, broken equipment, or other unexpected setbacks. Having a backup plan in place can ensure that your dinner plans aren't ruined even if something goes wrong. Here's how you can create a backup plan with your reservation details using the information you provided in JSON format. We'll also create a JavaScript function `handleRestaurantBackup` that will manage the backup plan and verify the details of your reservation upon request. Below is the code for your reference.````The JSON object for the reservation request, including the special occasions and the wedding date````json -detail realize````const jsonDetailedResrvationplan = { \" ]}",
        "expected": "{\"reservationID\": \"AH-158394\", \"guestName\": \"Alexander Hamilton\", \"reservationTime\": \"2023-04-15T19:30:00Z\", \"specialRequests\": [\"Table by the window\", \"Surprise dessert for a special occasion\"]}"
    },
    {
        "generated": "{\"HomeImprovement\":{\"room_interest\":\"living room\",\"budget\":500,\"preferred_style\":\"minimalist\",\"project_ideas\":[\"add floating shelves\",\"create a gallery wall\",\"repaint walls\",\"upcycle old furniture\",\"add greenery with low maintenance plants\",\"add lighting\",\"move furniture around\",\"use throw pillows and blankets\",\"add rugs\",\"declutter and organize furniture and decor\",\"install a skylight\",\"make a coffee table\",\"make a headboard\",\"paint furniture\",\"wear a statement jacket or accessory\"]} }",
        "expected": "{\"HomeImprovement\": {\"room_interest\": \"living room\", \"budget\": 500, \"preferred_style\": \"minimalist\", \"project_ideas\": [\"Install floating shelves for a clean look and extra storage.\", \"Create a gallery wall with your favorite prints and photographs.\", \"Repaint the walls with a neutral color palette for a fresh feel.\", \"Upcycle old furniture with a new coat of paint or new upholstery.\", \"Add some greenery with low-maintenance indoor plants.\"]}}"
    },
    {
        "generated": "{\"deviceID\": \"MON123456\", \"patientID\": \"PAT654321\", \"metrics\": {\"heartRate\": 78, \"bloodPressure\": \"120/80 mmHg\", \"oxygenSaturation\": 98}, \"timestamp\": \"2023-04-05T14:30:00Z\"}",
        "expected": "{\"deviceID\": \"MON123456\", \"patientID\": \"PAT654321\", \"metrics\": {\"heartRate\": 78, \"bloodPressure\": \"120/80 mmHg\", \"oxygenSaturation\": 98}, \"timestamp\": \"2023-04-05T14:30:00Z\"}"
    },
    {
        "generated": "{}",
        "expected": "{\"FitnessTracking\": {\"current_health_status\": {\"weight\": 70, \"height\": 175, \"heart_rate\": 62}, \"health_goals\": [\"increase muscle mass\", \"improve cardiovascular endurance\", \"enhance flexibility\"], \"recommended_routines\": [\"Strength training sessions three times a week focusing on major muscle groups\", \"Cardiovascular exercises such as running or cycling for at least 30 minutes, five days a week\", \"Daily stretching exercises to improve flexibility, including yoga or pilates\"]}}"
    },
    {
        "generated": "{\"seatNumber\":\"12A\",\"serviceType\":\"vegetarian\",\"specialInstructions\":\"gluten-free\"}",
        "expected": "{\"seatNumber\": \"12A\", \"serviceType\": \"vegetarian meal\", \"specialInstructions\": \"gluten-free\"}"
    },
    {
        "generated": "{ \"SKU\": \"TOB-1928\", \"quantity\": 150, \"restockDate\": \"2023-04-15\", \"supplier\": \"Global Tobacco Ltd.\"}",
        "expected": "{\"SKU\": \"TOB-1928\", \"quantity\": 150, \"restockDate\": \"2023-04-15\", \"supplier\": \"Global Tobacco Ltd.\"}"
    },
    {
        "generated": "{\"patentId\":\"US98765432A\" ,\"applicationDate\":\"2021-07-15\",\"inventorNames\":[\"Dr. Alice Smith\",\"Dr. Bob Jones\"],\"currentStatus\":\"Pending Examination\"}",
        "expected": "{\"patentId\": \"US98765432A\", \"applicationDate\": \"2021-07-15\", \"inventorNames\": [\"Dr. Alice Smith\", \"Dr. Bob Jones\"], \"currentStatus\": \"Pending Examination\"}"
    }
]
{
    "elapsed_time": 69.35481475805864,
    "num_requests": 10,
    "total_num_tokens": 8086,
    "total_output_tokens": 5120,
    "requests_per_second": 0.14418609630614082,
    "tokens_per_second": "116.59",
    "output_tokens_per_second": "73.82"
}
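
If the file is laid out exactly as shown above (a JSON array of records followed by a separate JSON summary object), a single `json.loads` call on the whole file would fail because of the trailing data. The sketch below is a reader-side illustration, not code from this PR: `load_results` and `recompute_throughput` are hypothetical helpers. It reads the two values with `json.JSONDecoder.raw_decode` and recomputes the throughput figures from the raw counts as a sanity check.

```python
import json


def load_results(text):
    """Parse a results file holding a JSON array followed by a JSON object.

    Hypothetical helper; assumes the two-value layout shown in the sample.
    """
    decoder = json.JSONDecoder()
    text = text.strip()
    records, end = decoder.raw_decode(text)
    # raw_decode does not skip leading whitespace, so trim before the second value.
    summary, _ = decoder.raw_decode(text[end:].lstrip())
    return records, summary


def recompute_throughput(summary):
    """Derive the per-second rates from the raw counts and elapsed time."""
    elapsed = summary["elapsed_time"]
    return {
        "requests_per_second": summary["num_requests"] / elapsed,
        "tokens_per_second": summary["total_num_tokens"] / elapsed,
        "output_tokens_per_second": summary["total_output_tokens"] / elapsed,
    }


# Counts copied from the summary above; the single record is a placeholder.
example_text = """
[
    {"generated": "{}", "expected": "{}"}
]
{
    "elapsed_time": 69.35481475805864,
    "num_requests": 10,
    "total_num_tokens": 8086,
    "total_output_tokens": 5120
}
"""

records, summary = load_results(example_text)
rates = recompute_throughput(summary)
print(f"{rates['tokens_per_second']:.2f} tokens/s, "
      f"{rates['output_tokens_per_second']:.2f} output tokens/s")
```

Note that total_output_tokens is 5120 = 10 × 512, i.e. num_requests times the --output-len value, and that the summary stores the two token rates as strings while requests_per_second is a float.
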
github-actions[bot] commented 17 hours ago

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:
