[Question] Any chance `replicate.createPrediction` with `wait: true` is slower than a webhook approach?

replicate / replicate-swift

Swift client for Replicate

https://replicate.com

Apache License 2.0

164 stars, 33 forks

[Question] Any chance `replicate.createPrediction` with `wait: true` is slower than a webhook approach? #28

Closed roimulia2 closed 1 year ago

roimulia2 commented 1 year ago

First of all, thank you for this great repo!

I'm using the following code to create a prediction:

    let prediction = try await replicate.createPrediction(
        version: modelVersion ?? "",
        input: ["prompt": text,
                "num_inference_steps": 50],
        wait: true
    )

Benchmarking against the actual inference time shown in Replicate's web GUI, the iOS client seems slower. I wonder if it's related to the retry policy used when `wait: true` is set. Maybe the retries are too slow? Or should it take the same time as using a webhook?

mattt commented 1 year ago

Hi @roimulia2. Yes, polling for completion is almost always going to be slower than responding to pushed web hooks. But polling can be made more or less responsive.

By default, the client uses exponential backoff retry logic:

https://github.com/mattt/replicate-swift/blob/050b260a3b0a0d06edb0b6684a8f24c90b42119e/Sources/Replicate/Client.swift#L423-L426

You can override this to use constant backoff (i.e. "wait n seconds each time") by setting the retryPolicy of the client.
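To illustrate why the choice of backoff matters, here is a rough, self-contained sketch that models when a poll would first notice a finished prediction. The `BackoffStrategy` type and `detectionTime` function are hypothetical stand-ins, not the library's own types, and the exponential parameters (base 2s, multiplier 2) are assumed for illustration rather than taken from the client's source. Jitter and network latency are ignored.

```swift
import Foundation

// Hypothetical stand-in for a polling strategy; not the library's actual type.
enum BackoffStrategy {
    case constant(duration: TimeInterval)
    case exponential(base: TimeInterval, multiplier: Double)

    // Delay before the next poll, where `attempt` is zero-based.
    func delay(forAttempt attempt: Int) -> TimeInterval {
        switch self {
        case .constant(let duration):
            return duration
        case .exponential(let base, let multiplier):
            return base * pow(multiplier, Double(attempt))
        }
    }
}

// Returns the elapsed time of the first poll that happens at or after
// `completion`, i.e. when the client would learn the prediction is done.
func detectionTime(strategy: BackoffStrategy, completion: TimeInterval) -> TimeInterval {
    var elapsed: TimeInterval = 0
    var attempt = 0
    while elapsed < completion {
        elapsed += strategy.delay(forAttempt: attempt)
        attempt += 1
    }
    return elapsed
}

// A prediction that finishes after 3 seconds:
print(detectionTime(strategy: .constant(duration: 0.5), completion: 3))
// prints "3.0"
print(detectionTime(strategy: .exponential(base: 2, multiplier: 2), completion: 3))
// prints "6.0" (polls land at 2s and 6s, so the result is seen 3s late)
```

Under these assumptions, a short prediction can be noticed several seconds late with exponential backoff, while a small constant interval keeps the lag below one interval at the cost of more requests.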

Looking at this now, I think this functionality is under-documented, and defaulting to exponential is questionable. And it'd be nice to make this adaptive to some Retry-After value sent by the server. Or even better, we should create a paved path for updating predictions from web hooks delivered through push notifications.

roimulia2 commented 1 year ago

Hey @mattt, thank you for the fast response. What I've done in the past in my own Replicate client is to re-send the request every time I get a response from your servers, until the prediction either fails or succeeds. I'll try an aggressive retryPolicy to see if it feels better.
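For reference, the re-send-until-done approach described above could look roughly like the following. This is a minimal sketch and not code from either client: `PredictionStatus`, `pollUntilFinished`, and the `fetchStatus` closure are hypothetical names, and the closure stands in for whatever request actually fetches the prediction.

```swift
import Foundation

// Hypothetical status type for illustration; not the library's own model.
enum PredictionStatus {
    case processing, succeeded, failed
}

// Calls `fetchStatus` repeatedly until it reports a terminal state, pausing
// `interval` seconds between requests. An interval of 0 re-sends immediately
// after each response.
func pollUntilFinished(
    interval: TimeInterval,
    fetchStatus: () async throws -> PredictionStatus
) async throws -> PredictionStatus {
    while true {
        let status = try await fetchStatus()
        if status != .processing {
            return status
        }
        if interval > 0 {
            // Throws CancellationError if the surrounding task is cancelled.
            try await Task.sleep(nanoseconds: UInt64(interval * 1_000_000_000))
        }
    }
}
```

With an interval of 0, the only delay between polls is the round trip itself, which gives the lowest latency but also the highest request volume.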

roimulia2 commented 1 year ago

Can you recommend an appropriate RetryPolicy that might fit my needs?

mattt commented 1 year ago

@roimulia2 I can't make any specific recommendations, because it all depends on the performance characteristics of the models you're running and the constraints of your mobile client. If predictions take significantly longer to complete with polling than with webhooks, you could try setting a constant retry interval of 1s.

roimulia2 commented 1 year ago

Hey @mattt! I'm using it mainly for Stable Diffusion, which is fast on the web (2-3s), and user expectations are around that time frame. Is setting the constant to 0.2 too low?

    replicate.retryPolicy = .init(strategy: .constant(duration: 0.2, jitter: 0),
                                  timeout: nil,
                                  maximumInterval: nil,
                                  maximumRetries: nil)

mattt commented 1 year ago

@roimulia2 In that situation, I think a constant 0.5s interval would be a good fit.

roimulia2 commented 1 year ago

Got it, thanks!