google / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0

Issues with prefill & generate #173

Open qihqi opened 3 weeks ago

qihqi commented 3 weeks ago

As reported by @tengomucho

Currently there are a few issues with the prefill / generate implementation:

  1. Prefill does not use self._sample to do sampling.
  2. Prefill already returns a token, so the first call to generate should return the second generated token; currently it returns the first token again. This is historical but quite unintuitive. (A sketch of the intended flow follows this list.)
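
A minimal sketch of the intended flow, assuming a simplified engine: only the names prefill, generate, and self._sample come from this issue; the class skeleton, the dummy `_forward` step, and the cache handling are illustrative assumptions, not the actual jetstream-pytorch API.

```python
import torch


class EngineSketch:
    """Illustrative-only engine; not the real jetstream-pytorch Engine."""

    def __init__(self, vocab_size: int = 32000):
        self.vocab_size = vocab_size

    def _sample(self, logits: torch.Tensor) -> torch.Tensor:
        # Stand-in sampling policy (greedy); the real self._sample may apply
        # temperature / top-k / top-p.
        return torch.argmax(logits, dim=-1)

    def _forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Dummy model step returning random logits so the sketch runs.
        return torch.randn(tokens.shape[0], self.vocab_size)

    def prefill(self, prompt: torch.Tensor):
        # Issue 1: the token produced at the end of prefill should go through
        # self._sample, not a separate implicit argmax.
        logits = self._forward(prompt)
        first_token = self._sample(logits)
        cache = {"prev_token": first_token}  # stand-in for the real KV cache
        return first_token, cache

    def generate(self, cache):
        # Issue 2: prefill already returned the first token, so the first
        # generate call should feed that token in and return the *second*
        # token, instead of re-emitting the first one.
        logits = self._forward(cache["prev_token"].unsqueeze(0))
        next_token = self._sample(logits)
        cache["prev_token"] = next_token
        return next_token, cache


engine = EngineSketch()
prompt = torch.tensor([[1, 2, 3, 4]])
tok0, cache = engine.prefill(prompt)   # first generated token
tok1, cache = engine.generate(cache)   # second token, not tok0 again
```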