replicate / replicate-python

Python client for Replicate
https://replicate.com
Apache License 2.0

Reliably uploading local files as inputs #396

Status: Open. enzokro opened this issue 1 week ago

enzokro commented 1 week ago

Hi everyone,

I'm struggling a bit with uploading local files and running a hosted model on them. I'm following the Python instructions on this page: Input Files.

So far I have the input argument in my model's predict signature set to type cog.Path. Next, I open my local file like so: contents = open("my_file.pdf", "rb").

But when I send that _io.BufferedReader as part of replicate.run("model_name", input={"file": contents}), only an empty file seems to go through. It's definitely not a Path that's arriving at the function.

The main question: what exactly happens to this buffered reader when it's sent to the model from a local file? How should I be processing the object in the Predictor's predict to appropriately handle this opened reader? Do I need to save it back out to a temp file?

Thanks in advance for the help!

aron commented 1 week ago

> So far I have the input argument in my model's predict signature set to type of cog.Path. Next, I'll process my local file as so: contents = open("my_file.pdf", "rb").

When you say the input argument, does it look like this:

def predict(self, file: Path = Input(description="a file")):

> The main question: what exactly happens to this buffered reader when it's sent to the model from a local file?

The Replicate client library uploads that file to a temporary location on the Replicate service and puts the resulting URL in the model's input. So your local file is replaced with something like: https://api.replicate.com/v1/files/123/download?expiry=1731497450&owner=xyz&signature=abc. The cog model then downloads this file, writes it to disk, and passes predict a path to that file on disk.
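A minimal sketch of the substitution described above, under a simplified model of the client: `prepare_input` and the `upload` callback are hypothetical names, not replicate-python APIs, and `upload` stands in for the call to the Replicate files API. Only top-level input values are handled here.

```python
import io
from typing import Any, Callable, Dict


def prepare_input(inputs: Dict[str, Any], upload: Callable[[bytes], str]) -> Dict[str, Any]:
    """Return a copy of `inputs` with each binary file handle replaced by a URL.

    `upload` is a hypothetical stand-in for the Replicate files API: it takes
    the file's bytes and returns the URL the model will download from.
    """
    prepared: Dict[str, Any] = {}
    for key, value in inputs.items():
        if isinstance(value, (io.BufferedIOBase, io.RawIOBase)):
            value.seek(0)  # rewind so a partly read handle is still sent whole
            prepared[key] = upload(value.read())
        else:
            prepared[key] = value  # strings, numbers, etc. pass through unchanged
    return prepared


# Usage with a fake uploader that just records what it was given.
sent = []


def fake_upload(data: bytes) -> str:
    sent.append(data)
    return f"https://example.invalid/v1/files/{len(sent)}/download"


print(prepare_input({"file": io.BytesIO(b"%PDF-1.7 demo"), "pages": 3}, fake_upload))
```

The `seek(0)` is a choice made in this sketch so that a handle which was already partly read is still uploaded in full; the thread does not say whether the real client rewinds handles.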

> How should I be processing the object in the Predictor's predict to appropriately handle this opened reader? Do I need to save it back out to a temp file?

cog's Path is a subclass of pathlib.Path, so inside predict you can use its usual methods, such as file.absolute() or file.open(). There's no need to save the input to a temp file yourself.
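Because cog's `Path` subclasses `pathlib.Path`, the `file` argument can be inspected with ordinary `pathlib` calls. The sketch below uses a hypothetical helper, `describe_upload`, written against plain `pathlib.Path` so it runs without cog installed; inside a predictor you would call it with the `file` argument. Checking the size and the `%PDF-` header is one way to confirm whether the file actually arrived non-empty.

```python
from pathlib import Path


def describe_upload(file: Path) -> str:
    """Inspect a file written to disk. Works on any pathlib.Path, and so on cog.Path.

    Hypothetical helper for illustration; raises if the file arrived empty.
    """
    size = file.stat().st_size
    if size == 0:
        raise ValueError(f"{file.name} arrived empty")
    with file.open("rb") as f:
        header = f.read(5)  # PDF files start with the bytes b"%PDF-"
    kind = "PDF" if header == b"%PDF-" else "unknown"
    return f"{file.absolute()}: {size} bytes, {kind}"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "my_file.pdf"
        sample.write_bytes(b"%PDF-1.7 demo")
        print(describe_upload(sample))
```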

Could you explain a little more about how you're using the replicate client with a local cog server? I suspect the issue you're seeing is that the files API doesn't exist when running under cog.

One thing to try immediately would be to use data URLs instead of the default files API:

with open("my_file.pdf", "rb") as f:
    output = replicate.run("my/model:xyz", input={"file": f}, file_encoding_strategy="base64")

That said, I've just found a bug in that code path, so let me look into this some more.
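For comparison with the files-API flow, a data URL embeds the file's bytes directly in the request instead of pointing at an uploaded copy. Below is a minimal sketch of building one from a local file, assuming the standard `data:<mime>;base64,<payload>` form from RFC 2397. `to_data_url` is a hypothetical helper, and the exact string the client builds when `file_encoding_strategy="base64"` is set is not shown in this thread.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: Path) -> str:
    """Encode a local file as a base64 data URL (RFC 2397). Hypothetical helper."""
    mime, _ = mimetypes.guess_type(path.name)
    mime = mime or "application/octet-stream"  # fall back for unknown extensions
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

Base64 inflates the payload by about a third (every 3 bytes become 4 characters), so this approach suits small files better than large PDFs.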