Hi @philschmid,
Thanks for the feedback, and I'm glad you're finding the tool useful! Your request makes a lot of sense, and we can definitely prioritize adding this feature.
@parfeniukink, could you take the lead on this? Let's aim to have it ready by the end of the week. A rough outline that should enable minimal code changes:
- Extend the --max-requests argument, allowing users to pass in a string like "dataset" in addition to the current integer values (see the sketch below).
- When --max-requests dataset is set, iterate through the dataset exactly once and then stop.

This should provide a straightforward way for users to run benchmarks on their datasets without looping indefinitely.
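For illustration only, here is a minimal sketch of how that option could accept either an integer or the literal string "dataset". This is not guidellm's actual code; the parse_max_requests callback and the click wiring are hypothetical assumptions:

```python
import click


def parse_max_requests(ctx, param, value):
    """Hypothetical parser: accept an integer count or the literal string
    "dataset" (meaning: one pass over the dataset, then stop)."""
    if value is None or value == "dataset":
        return value
    try:
        return int(value)
    except ValueError:
        raise click.BadParameter('must be an integer or the string "dataset"')


@click.command()
@click.option(
    "--max-requests",
    default=None,
    callback=parse_max_requests,
    help='Maximum number of requests, or "dataset" to run through the dataset once.',
)
def benchmark(max_requests):
    # Downstream, the scheduler would stop either after N requests or once the
    # dataset iterator is exhausted, instead of cycling the dataset forever.
    if max_requests == "dataset":
        click.echo("Running one pass over the dataset, then stopping.")
    elif max_requests is None:
        click.echo("No request limit set; looping over the dataset indefinitely.")
    else:
        click.echo(f"Running up to {max_requests} requests.")


if __name__ == "__main__":
    benchmark()
```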
Hello,
Really great tool! Thank you for releasing it. I am currently testing it with an HF dataset, and I am wondering whether you plan to support iterating through the dataset only once?
I have a dataset of 2,500 samples, and I would like to benchmark it once and stop when all samples are done.
Hey! Could you also provide the dataset that you've been using?
Hey, I created this dataset: https://huggingface.co/datasets/philschmid/text-to-sql-dataset-medusa-test-chatml
Hey @philschmid. Let's move to this PR so we can fit the code to what you want. I will write you a couple of messages there.
Thank you for working on this. While experimenting a bit more, I noticed that we might have another problem to solve.
Currently, guidellm uses the "user" role to send requests to the backend. This means that if you have a dataset with conversations, e.g., system + user + assistant, and you want to benchmark it, you can only use the "user" content. That is problematic if you want to benchmark speculative models or other inputs where the system message or previous turns are important, e.g., for encoding time.
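To illustrate the concern, here is a hypothetical example contrasting a payload built from only the "user" turn with one that keeps the full conversation. This is not guidellm's actual request-building code; the sample data, model name, and field layout are made up:

```python
# One sample from a conversation-style dataset (system + user + assistant).
sample = {
    "messages": [
        {"role": "system", "content": "You translate questions into SQL."},
        {"role": "user", "content": "How many orders were placed in 2023?"},
        {"role": "assistant", "content": "SELECT COUNT(*) FROM orders WHERE ..."},
    ]
}

# User-only payload: drops the system prompt and prior turns, so their
# contribution to prefill/encoding time is not measured.
user_only_payload = {
    "model": "my-model",
    "messages": [m for m in sample["messages"] if m["role"] == "user"],
}

# Full-conversation payload: keeps all turns, so the benchmark reflects the
# real prompt length seen by speculative or multi-turn models.
full_conversation_payload = {
    "model": "my-model",
    "messages": sample["messages"],
}
```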
Yeah, that sounds like a completely separate topic for discussion. @markurtz flagging this one for you as well.