triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.
Apache License 2.0

Sending multiple shapes binary input data #863

Open eladamittai opened 5 months ago

eladamittai commented 5 months ago

Hey, I'm using a model that takes a dynamically shaped float16 input, and I want to test it over gRPC, so I have to use binary input data. Is there a way to send multiple requests with different shapes, as you can with JSON input data, but using binary data? Also, is there a way to send the requests in a certain ratio? For example, sending requests of shape 16000 twice as often as requests of shape 32000.

tgerdesnv commented 5 months ago

Hi @eladamittai,

Under the hood, Model Analyzer uses Perf Analyzer. You can find documentation for passing in input data here: https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/docs/input_data.md

~~When using Model Analyzer, any args that you want to be passed on to Perf Analyzer would go after a --. For example: model_analyzer <model analyzer args> -- --input-data /path/to/file~~ Edit: My recollection was wrong. You will need to pass Perf Analyzer args via the perf_analyzer_flags section of the config YAML file. Let me know if you need help with this.

Now then, you had a number of specific asks. I'm trying to work out whether they are all possible.

You need binary data: Is this actually true? I see you mention gRPC, but are you sure it doesn't work if you supply normal fp16 data? I believe everything should work under the hood. Perf Analyzer should convert the data as needed before sending it to Triton.

If you do need binary data, then there are a few possible options, although I'm not sure they are all compatible with the rest of your asks.

You want different shaped requests: Go to the input data documentation linked above and search for optional "shape". That paragraph and the example that follows show how to provide data with different shapes.

You want a ratio of different shapes: We have stories in our backlog to try to support cases like this, but for now you would need to do it yourself. If you wanted a 2:1 ratio of shape X to shape Y to be sent, then your input data file would need 3 entries: 2 with shape X and 1 with shape Y.
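
A minimal sketch of such a file, built in Python: it emits two entries of one shape and one of another. The input name INPUT, the lengths 16000 and 32000 (taken from the question), and the zero-filled placeholder values are assumptions for illustration. The "content" and "shape" keys follow the input data documentation linked above.

```python
import json


def build_input_data(input_name, shape_counts):
    """Build an input-data dict with `count` entries for each 1-D length.

    shape_counts maps a tensor length to the number of entries to emit,
    e.g. {16000: 2, 32000: 1} for a 2:1 ratio.
    """
    entries = []
    for length, count in shape_counts.items():
        for _ in range(count):
            entries.append(
                {
                    input_name: {
                        "content": [0.0] * length,  # placeholder values (assumption)
                        "shape": [length],
                    }
                }
            )
    return {"data": entries}


# "INPUT" is a hypothetical tensor name; use the name from your model config.
doc = build_input_data("INPUT", {16000: 2, 32000: 1})
text = json.dumps(doc)  # write this string to the file given to --input-data
```

The resulting ratio depends on Perf Analyzer cycling through the entries evenly, so treat the 2:1 split as approximate over a long run.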

eladamittai commented 5 months ago

Hey, thank you for answering. I checked the Perf Analyzer documentation, and I managed to send requests in multiple shapes to a float32 compiled version of the model using a JSON file. But as you can see from this older issue I opened about sending float16 input using JSON, that isn't possible when using gRPC, unless something has changed in later releases of Model Analyzer or Perf Analyzer.

From your response I didn't understand whether I can send multiple shapes using binary data. I tried to combine the binary directory with the JSON file, like this:

{
  Data: [
    Input name: {
      Content: binary_input_dir
      Shape: [16000]
    }
  ]
}

But it didn't work. Is there a different way to send multiple binary input files in multiple shapes?

eladamittai commented 5 months ago

Hey, is there an answer?

tgerdesnv commented 4 months ago

Apologies for the delay. I'm looking into this.

tgerdesnv commented 4 months ago

I believe you can use base64 for binary data. Then you can stick to the normal input_data format and provide shapes. There is an example on this page, although I can't link directly to it. You'll have to scroll down. I've cut and pasted it here:

{
  "data":
    [
      {
        "INPUT":
          {
            "content": {"b64": "/9j/4AAQSkZ(...)"},
            "shape": [7964]
          }
      },
      {
        "INPUT":
          {
            "content": {"b64": "/9j/4AAQSkZ(...)"},
            "shape": [7964]
          }
      }
    ]
}

Using that as a basis, you could provide 3 inputs, 2 of one shape and 1 of another, to accomplish the goal of a 2:1 ratio of input shapes.
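
A rough sketch of how such a file could be generated for float16 data, using only the Python standard library. It assumes Perf Analyzer treats the decoded "b64" content as the raw little-endian bytes of the tensor, which is not confirmed in this thread. The input name INPUT, the lengths 16000 and 32000, and the zero-filled values are placeholders.

```python
import base64
import json
import struct


def fp16_b64(values):
    """Pack floats as little-endian IEEE half precision and base64-encode them."""
    raw = struct.pack(f"<{len(values)}e", *values)  # 'e' = 2-byte float16
    return base64.b64encode(raw).decode("ascii")


def make_entry(input_name, length):
    """One input-data entry: a 1-D float16 tensor of `length` placeholder zeros."""
    values = [0.0] * length  # replace with real samples
    return {
        input_name: {
            "content": {"b64": fp16_b64(values)},
            "shape": [length],
        }
    }


# Two entries of length 16000 and one of length 32000 give the 2:1 ratio.
# "INPUT" is a hypothetical tensor name; use the name from your model config.
entries = [
    make_entry("INPUT", 16000),
    make_entry("INPUT", 16000),
    make_entry("INPUT", 32000),
]
payload = json.dumps({"data": entries})  # write this string to the input data file
```

Each entry decodes to 2 bytes per element, so a quick sanity check is that the decoded length equals twice the value in "shape".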