replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

Let model authors specify filetypes for inputs and outputs (audio, video, image, etc) #1341

Open · zeke opened this issue 1 year ago

zeke commented 1 year ago

The cog.Path object is used to get files in and out of models. It represents a path to a file on disk. Path is used for all files, regardless of whether they're text files, zip files, videos, images, audio files, etc.

What kind of file does the model want? 🤷🏼

When looking at the schema for a model, it's not easy to tell what type of file is expected:

$ curl -s -H "Authorization: Token $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/models/stability-ai/sdxl | jq ".latest_version.openapi_schema.components.schemas.Input.properties.mask"

SDXL's mask input expects an image file, but that's not clear from the schema. The only hint is the free-text description, so unless the model author spells out the expected file type there, users of the model can't reliably know what to upload:

{
  "type": "string",
  "title": "Mask",
  "format": "uri",
  "x-order": 3,
  "description": "Input mask for inpaint mode. Black areas will be preserved, white areas will be inpainted."
}
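To make the gap concrete, here is a minimal sketch of what a client can learn from that property programmatically. The helper name `expected_media_type` is hypothetical. It looks for `contentMediaType`, a keyword defined by JSON Schema (draft 7 and later) for declaring the media type of string content. The SDXL schema above does not include it, so the only machine-readable facts are `"type": "string"` and `"format": "uri"`, which say nothing about the kind of file.

```python
# The "mask" property from SDXL's OpenAPI schema, as returned by the API call above.
mask_schema = {
    "type": "string",
    "title": "Mask",
    "format": "uri",
    "x-order": 3,
    "description": "Input mask for inpaint mode. Black areas will be preserved, white areas will be inpainted.",
}


def expected_media_type(prop):
    """Return the declared media type of a file input, or None if the schema doesn't say.

    Hypothetical helper: it reads the standard JSON Schema keyword
    `contentMediaType`, which the schema above does not contain.
    """
    if prop.get("type") != "string" or prop.get("format") != "uri":
        return None  # not a file (URI) input at all
    return prop.get("contentMediaType")


print(expected_media_type(mask_schema))  # None: the schema gives no file type
print(expected_media_type({**mask_schema, "contentMediaType": "image/png"}))  # image/png
```

The only way to recover the intended file type today is for a human to read the description.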

Being explicit about file types

What if, instead of defining the mask in the predictor as a Path, it could be an ImagePath, which would really just be a Path under the hood with some extra constraints?

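from cog import BasePredictor, Input, Path
from cog import ImagePath  # the type proposed in this issue

class Predictor(BasePredictor):
    def predict(
        self,
        mask: ImagePath = Input(
            description="Input mask for inpaint mode. Black areas will be preserved, white areas will be inpainted.",
            default=None,
        ),
    ) -> Path:
        # `mask` would behave exactly like a cog.Path; only the schema
        # (and input validation) would know that it must be an image.
        ...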
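A minimal sketch of how "a Path under the hood with some extra constraints" could look, assuming the constraint is a MIME-type prefix guessed from the file extension. `TypedPath`, `mime_prefix`, `validate`, `AudioPath` and `VideoPath` are hypothetical names for illustration, not cog's API. Subclassing `pathlib.PosixPath` keeps every normal path operation available, so predictor code that already works with `Path` would not need to change.

```python
import mimetypes
import pathlib


class TypedPath(pathlib.PosixPath):
    """Hypothetical: a path that only accepts files of a given MIME-type family."""

    mime_prefix = ""  # empty prefix accepts any file, like a plain Path

    @classmethod
    def validate(cls, value):
        path = cls(value)
        # Guess from the extension only; a real implementation might sniff content.
        mime, _ = mimetypes.guess_type(str(path))
        if cls.mime_prefix and (mime is None or not mime.startswith(cls.mime_prefix)):
            raise ValueError(
                f"{path.name!r} does not look like a {cls.mime_prefix}* file (guessed {mime})"
            )
        return path


class ImagePath(TypedPath):
    mime_prefix = "image/"


class AudioPath(TypedPath):
    mime_prefix = "audio/"


class VideoPath(TypedPath):
    mime_prefix = "video/"


print(ImagePath.validate("mask.png"))  # mask.png
try:
    ImagePath.validate("clip.mp4")
except ValueError as err:
    print(err)  # rejected: guessed type is video/mp4
```

Because each subclass is a distinct type, a schema generator could map `ImagePath` to an image-specific annotation while the predictor still receives an ordinary path object.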

This may be a naive approach to making input and output file types more apparent to model consumers, and I'm open to other ideas that address the issue.

Related issues:

zeke commented 11 months ago

Maybe it could be a property of the existing Path, like a list of MIME types or something.
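A rough sketch of the list-of-MIME-types variant, assuming patterns may use wildcards such as `image/*` and that the type is guessed from the file extension. `accepts`, `file_input_schema` and the `x-mime-types` key are all hypothetical; the key is made up in the style of the `x-order` key in the schema above, and is not something cog emits.

```python
import fnmatch
import mimetypes


def accepts(mime_types, filename):
    """Hypothetical check: does `filename` match any pattern in `mime_types`?

    Patterns may contain wildcards, e.g. "image/*". The type is guessed from
    the extension only, so files without a known extension are rejected.
    """
    guessed, _ = mimetypes.guess_type(filename)
    if guessed is None:
        return False
    return any(fnmatch.fnmatchcase(guessed, pattern) for pattern in mime_types)


def file_input_schema(title, mime_types):
    """Sketch of an OpenAPI property for a Path input that declares its accepted types."""
    return {
        "type": "string",
        "format": "uri",
        "title": title,
        "x-mime-types": list(mime_types),  # hypothetical extension key
    }


mask_types = ["image/png", "image/jpeg"]
print(accepts(mask_types, "mask.png"))     # True
print(accepts(mask_types, "mask.mp4"))     # False
print(accepts(["audio/*"], "speech.mp3"))  # True
print(file_input_schema("Mask", mask_types))
```

Compared with separate subclasses, a list on the existing `Path` lets authors express narrow constraints (PNG or JPEG only) without cog needing a new type for every family.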

zeke commented 2 weeks ago

Related: https://github.com/replicate/cog/pull/2014