david-macleod closed this issue 10 months ago.
Hi @david-macleod, thanks for bringing this up. I don't see that Triton needs both the input and output states to be available at the same time, so I think we may be able to use the same memory allocation for the input and output states. I've filed a ticket (DLIS-5335) for this optimization. Meanwhile, feel free to make updates to the code. We encourage external contributions to this project!
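The single-allocation idea can be sketched in Python as follows. This is a simplified illustration, not Triton's implementation: `SharedBufferState`, `run_step`, `update_state`, and `accumulate` are hypothetical names. The sketch assumes an element-wise state update, where reading and writing the same buffer is safe.

```python
from typing import Callable, List

# Hypothetical model step: reads the state from `inp`, writes it to `out`.
StepFn = Callable[[List[float], List[float]], None]


class SharedBufferState:
    """Sketch: input and output state alias a single allocation."""

    def __init__(self, size: int) -> None:
        self.buffer = [0.0] * size  # one allocation: 1x the state size

    def run_step(self, step: StepFn) -> None:
        # Input and output are the same object, so the model
        # updates the state in place.
        step(self.buffer, self.buffer)
        self.update_state()

    def update_state(self) -> None:
        # Nothing to swap: the next step reads what this step wrote.
        pass


def accumulate(inp: List[float], out: List[float]) -> None:
    # Element-wise update: out[i] depends only on inp[i], so it is
    # safe even when `inp` and `out` are the same buffer.
    for i, value in enumerate(inp):
        out[i] = value + 1.0


state = SharedBufferState(4)
original = state.buffer
for _ in range(3):
    state.run_step(accumulate)
print(state.buffer)  # [3.0, 3.0, 3.0, 3.0]
```

Compared with two allocations, the state memory is halved and the update callback has nothing left to do.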
This has been added in the 23.11 release. Please see the following model configuration option for more details:
`SequenceStates` objects have separate allocations for `input_states_` and `output_states_`. `output_states_` is written to and `input_states_` is read from. After a batch is executed, the two are swapped in `SetStateUpdateCallback`, so we can correctly read the updated state in the next iteration and write the next output to the allocation previously used for `input_states_`.

My question is: why do we need both? When the states amount to multiple GB, maintaining both is quite costly. Naively, I would expect to be able to write the output to the same memory and then pass that back as the input. Is there some reason this should be avoided (perhaps an edge case)?
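The double-buffering scheme described above can be modeled in Python as follows. This is a simplified sketch, not Triton's code: `DoubleBufferedState`, `run_step`, `update_state`, and `accumulate` are hypothetical stand-ins for `SequenceStates`, the backend's execution of a batch, and the swap in `SetStateUpdateCallback`.

```python
from typing import Callable, List

# Hypothetical model step: reads the state from `inp`, writes it to `out`.
StepFn = Callable[[List[float], List[float]], None]


class DoubleBufferedState:
    """Simplified model of separate input/output state allocations."""

    def __init__(self, size: int) -> None:
        # Two distinct allocations: memory cost is 2x the state size.
        self.input_states = [0.0] * size
        self.output_states = [0.0] * size

    def run_step(self, step: StepFn) -> None:
        # The model reads the previous state and writes the new one
        # into a separate buffer.
        step(self.input_states, self.output_states)
        self.update_state()

    def update_state(self) -> None:
        # Analogous to the swap in SetStateUpdateCallback: the freshly
        # written output becomes the next input, and the old input
        # allocation is reused for the next output.
        self.input_states, self.output_states = (
            self.output_states,
            self.input_states,
        )


def accumulate(inp: List[float], out: List[float]) -> None:
    # Example update: each state element is incremented by 1.
    for i, value in enumerate(inp):
        out[i] = value + 1.0


state = DoubleBufferedState(4)
for _ in range(3):
    state.run_step(accumulate)
print(state.input_states)  # [3.0, 3.0, 3.0, 3.0]
```

The swap itself is cheap (only references change), but both allocations stay alive for the lifetime of the sequence.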
Put another way, if I were to make `SetStateUpdateCallback` a no-op and have `input_states_` and `output_states_` always point to the same memory, should I expect issues?
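One edge case worth checking before aliasing the two buffers is whether the model reads any part of the input state after it has started writing the output state. The Python sketch below illustrates this under that assumption; it is not Triton code, and `shift_in` is a hypothetical rolling-window update in which each output element depends on a *different* input element, so an in-place update overwrites values that are still needed.

```python
from typing import List


def shift_in(inp: List[float], out: List[float], new_value: float) -> None:
    # Hypothetical rolling-window update (e.g. a convolution history):
    # out[0] receives the newest value and every older entry moves one
    # slot to the right, so out[i] depends on inp[i - 1].
    for i in range(len(out)):
        out[i] = new_value if i == 0 else inp[i - 1]


history = [1.0, 2.0, 3.0, 4.0]

# Separate input and output buffers: correct result.
out = [0.0] * len(history)
shift_in(history, out, 9.0)
print(out)  # [9.0, 1.0, 2.0, 3.0]

# Same buffer for input and output: each write clobbers a value
# that a later iteration still needs to read.
aliased = list(history)
shift_in(aliased, aliased, 9.0)
print(aliased)  # [9.0, 9.0, 9.0, 9.0]
```

An element-wise update (each output element depends only on the same input element) does not have this problem. On a GPU the order in which elements are written is generally not guaranteed, so an update like `shift_in` would typically need temporaries or a separate output buffer to be safe with aliased memory.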