DavideHe opened this issue 8 months ago
As [run_streaming_llama.py#L61](https://github.com/mit-han-lab/streaming-llm/blob/main/examples/run_streaming_llama.py#L61) shows, prompts must be sent to the model one by one, which keeps GPU usage high for a long time. Is there a way to process prompts in parallel (batched)?
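For context, the kind of batching I mean could be sketched like this — `pad_batch` and `pad_id` are just illustrative names, not streaming-llm or transformers APIs; the idea is to left-pad variable-length prompts into one rectangular batch with an attention mask, so one forward pass can cover several prompts:

```python
# Hypothetical sketch: pad_batch and pad_id are illustrative names,
# not part of streaming-llm.

def pad_batch(prompts, pad_id=0):
    """Left-pad variable-length token-id lists into one rectangular batch,
    plus an attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(p) for p in prompts)
    input_ids, attention_mask = [], []
    for p in prompts:
        pad = [pad_id] * (max_len - len(p))
        input_ids.append(pad + p)  # left-pad so generation continues at the end
        attention_mask.append([0] * len(pad) + [1] * len(p))
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8, 9]])
# ids  == [[5, 6, 7], [0, 8, 9]]
# mask == [[1, 1, 1], [0, 1, 1]]
```

The open question is whether the rolling KV cache in streaming-llm can be maintained per batch row, since each row evicts tokens at different positions.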