Closed beatty closed 4 months ago
Hi @beatty. Thanks for letting us know. I'll take a look to see what's going on. Can you share an ID of a prediction that had truncated output?
The `run` method doesn't actually use the streaming interface. Instead, it conditionally returns an iterator over the list of tokens once the prediction finishes. If you're seeing this behavior consistently, you might try calling the `stream` method instead.
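A minimal sketch of that workaround, assuming replicate >= 0.24 with its `replicate.stream` helper and the model name from the report; the function name and prompt are illustrative, and an API token is required to actually run it:

```python
import os


def stream_output(prompt: str) -> str:
    """Collect model output incrementally via the streaming interface."""
    import replicate  # assumes the replicate package is installed

    chunks = []
    # replicate.stream yields server-sent event chunks as the model
    # generates tokens, rather than returning an iterator over the
    # finished token list the way run() does.
    for event in replicate.stream(
        "mistralai/mistral-7b-instruct-v0.2",
        input={"prompt": prompt},
    ):
        chunks.append(str(event))
    return "".join(chunks)


# Only attempt a live call when credentials are configured.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    print(stream_output("Say hello in one word."))
```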
@beatty If you're still seeing this behavior, please let me know, and I'd be happy to take a look. Thanks!
With the replicate 0.24.0 Python client and `mistralai/mistral-7b-instruct-v0.2` (a model that supports streaming), the iterator I get back from `client.run()` frequently truncates output, perhaps 1 in 50 calls. I checked the Replicate dashboard for the request IDs where I'm seeing truncation and observed that the full response was recorded for all of them.
The same behavior occurred with replicate 0.23 as well.
I looked for easy workarounds and couldn't find any (can I disable streaming?).
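For reference, a hedged sketch of the call pattern described above; the function name and prompt are hypothetical, and `replicate.Client` is assumed per the client library:

```python
import os


def run_output(prompt: str) -> str:
    """Join the iterator returned by client.run() into one string."""
    import replicate  # assumes the replicate package is installed

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    # run() blocks until the prediction finishes, then returns an
    # iterator over the list of output tokens; this is the call site
    # where the truncated output was observed.
    output = client.run(
        "mistralai/mistral-7b-instruct-v0.2",
        input={"prompt": prompt},
    )
    return "".join(output)


if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    print(run_output("Say hello in one word."))
```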