Randomly getting truncated output from client.run() for streaming models using replicate 0.24.0

replicate / replicate-python

Python client for Replicate

https://replicate.com

Apache License 2.0


Issue #252

Closed: beatty closed this issue 4 months ago

beatty commented 8 months ago

With the replicate 0.24.0 Python client and "mistralai/mistral-7b-instruct-v0.2" (a model that supports streaming), the iterator I get back from client.run() frequently truncates the output, roughly 1 in 50 times. I looked up the request IDs that showed truncation on the Replicate dashboard and confirmed that the full response was recorded for all of them.

I saw the same behavior with replicate 0.23.

I looked for an easy workaround but couldn't find one. Is there a way to disable streaming?

mattt commented 8 months ago

Hi @beatty. Thanks for letting us know. I'll take a look to see what's going on. Can you share an ID of a prediction that had truncated output?

The run method doesn't actually use the streaming interface. Instead, it conditionally returns an iterator over the list of tokens once the prediction finishes. If you're seeing this behavior consistently, you might try calling the stream method instead.
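For illustration, here is a minimal sketch of consuming both kinds of output. It assumes, per the comment above, that run() returns either a string or an iterator over a finished list of tokens, and that stream() yields events whose str() is the output text, as in the replicate README's usage example. The helper names join_run_output and stream_text are hypothetical, not part of the library, and the model ref format that stream() accepts may depend on the installed client version. This is not a confirmed fix for the truncation.

```python
from typing import Any, Mapping


def join_run_output(output: Any) -> str:
    """Hypothetical helper: flatten the value returned by client.run().

    Strings are returned unchanged; anything else is treated as an
    iterable of tokens and concatenated in order.
    """
    if isinstance(output, str):
        return output
    return "".join(str(token) for token in output)


def stream_text(client: Any, ref: str, inputs: Mapping[str, Any]) -> str:
    """Hypothetical helper: consume client.stream() and return the full text.

    Assumes each yielded event's str() is its output data (empty for
    non-output events), as in the replicate README example; verify this
    against the installed client version.
    """
    pieces = []
    for event in client.stream(ref, input=dict(inputs)):
        pieces.append(str(event))
    return "".join(pieces)
```

Usage, assuming a configured API token: create a client with `client = replicate.Client()`, then call `stream_text(client, "mistralai/mistral-7b-instruct-v0.2", {"prompt": "..."})`. Accepting the client as a parameter keeps the helpers easy to exercise with a stand-in object and avoids tying them to one client construction path.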

mattt commented 4 months ago

@beatty If you're still seeing this behavior, please let me know, and I'd be happy to take a look. Thanks!