mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

Is there a way to run prompts in parallel? #69

Open DavideHe opened 8 months ago

DavideHe commented 8 months ago

As [run_streaming_llama.py#L61](https://github.com/mit-han-lab/streaming-llm/blob/main/examples/run_streaming_llama.py#L61) shows, prompts must be sent to the model one by one, which keeps GPU utilization high for a long time. Is there a way to run prompts in parallel?
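
By "parallel prompts" I mean something like plain batched generation in Hugging Face Transformers (sketch below, not from this repo; model name and prompts are just examples, and streaming-llm's KV-cache eviction would presumably need extra per-sequence bookkeeping to support this):

```python
# Minimal sketch of batched prompting with left padding, so one forward
# pass covers several prompts. The repo's greedy_generate() instead
# consumes a single prompt at a time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-13b-v1.3"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Left padding aligns the last real token of every prompt at the end,
# which decoder-only generation expects.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "USER: What is an attention sink? ASSISTANT:",
    "USER: Summarize streaming language models. ASSISTANT:",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
for seq in out:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```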