I'm not sure whether this is more of a documentation issue or a functionality issue, but here's my problem. I think I see what getState is intended to do. However, while getState() does capture all the tensors, it doesn't store lastToken. So, for instance, going by the documentation I would expect output_1 and output_2 to have the same distribution.
If that's the behaviour I really want, I would need to do
So I don't know what is preferable, to do something like
Or to just put in the documentation that there's an additional type of state, or something else entirely.
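To make the problem concrete, here is a minimal toy sketch in Python. It is not the library's actual code: ToyModel, feed, next_logits, and the get_full_state/set_full_state pair are hypothetical names invented for illustration. The only thing carried over from the description above is the assumption that the saved state covers the tensors but not the last token.

```python
class ToyModel:
    """Hypothetical stand-in for a recurrent model; not the real library."""

    def __init__(self):
        self.hidden = 0      # stands in for "all the tensors"
        self.last_token = 0  # the most recently fed token

    def feed(self, token):
        # Fold the previous token into the hidden state, then remember the new one.
        self.hidden = (self.hidden * 31 + self.last_token) % 1009
        self.last_token = token

    def next_logits(self):
        # The output depends on BOTH the hidden state and the last token.
        return [(self.hidden + self.last_token * k) % 7 for k in range(1, 5)]

    def get_state(self):
        # Mirrors the reported behaviour: tensors only, no last token.
        return {"hidden": self.hidden}

    def set_state(self, state):
        self.hidden = state["hidden"]

    def get_full_state(self):
        # Hypothetical alternative: save the last token alongside the tensors.
        return {"hidden": self.hidden, "last_token": self.last_token}

    def set_full_state(self, state):
        self.hidden = state["hidden"]
        self.last_token = state["last_token"]


model = ToyModel()
for token in [3, 5, 2]:
    model.feed(token)

saved = model.get_state()
saved_full = model.get_full_state()
output_1 = model.next_logits()

model.feed(4)            # advance the model, which changes last_token
model.set_state(saved)   # restores the tensors, but last_token is still 4
output_2 = model.next_logits()

model.set_full_state(saved_full)  # restores the tensors and last_token
output_3 = model.next_logits()

print(output_1 == output_2)  # the tensor-only restore does not reproduce output_1
print(output_1 == output_3)  # the full restore does
```

In this toy version the two options are either to have the caller save and restore the last token by hand, or to have the state object carry it. Which of these fits the real library is exactly the open question.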