replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

Tell me if the model doesn't exist before wasting time building it #1678

Closed: zeke closed this 3 months ago

zeke commented 4 months ago

It's currently possible to run cog push r8.im/zeke/oops-typo-in-model-name, then wait for a very long time while Docker builds the image, before eventually seeing this error:

unknown: {"errors":[{"code":"NAME_UNKNOWN","message":"The model https://replicate.com/zeke/oops-typo-in-model-name does not exist. Head to https://replicate.com/ to create the model"}]}

It would be nice if Cog could do this check before wasting time building.
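The error line above follows the JSON error shape that container registries return (an "errors" array of objects with "code" and "message"). As a minimal sketch of how a wrapper script might recognize this specific failure, the helpers below pull the JSON payload out of the line and look for NAME_UNKNOWN. The function names are hypothetical and are not part of Cog.

```python
import json


def parse_registry_error(raw):
    """Return a list of (code, message) pairs found in a registry error line.

    Hypothetical helper: assumes the line contains a single JSON object,
    optionally preceded by a prefix such as "unknown: ".
    """
    start = raw.find("{")
    if start == -1:
        return []
    try:
        payload = json.loads(raw[start:])
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    return [
        (err.get("code", ""), err.get("message", ""))
        for err in payload.get("errors", [])
        if isinstance(err, dict)
    ]


def is_missing_model(raw):
    """True if the error line reports NAME_UNKNOWN, i.e. the model does not exist."""
    return any(code == "NAME_UNKNOWN" for code, _ in parse_registry_error(raw))
```

This only detects the failure after the push has been attempted, so it does not save the build time by itself.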

Etelis commented 3 months ago

I'm on this

Etelis commented 3 months ago

After looking into this, it appears that finding a general, easy way to verify push permissions for a repository before building the image is challenging, for a few reasons:

1. Checking Push Privileges: There isn't a straightforward way to verify that a user has push privileges for a repository without attempting a push. A 'dry push' (pushing a small, temporary image to check permissions) seems to be the only practical solution here.

2. Replicate-Specific Check: Even if we implement a Replicate-specific check that uses docker manifest inspect to see whether a model exists, it won't cover cases where users lack permission to push to a repository. It only verifies that the repository exists, not that the user can push to it.

Given these points, it seems that implementing a dry push to validate permissions before committing to the main push operation is the best approach. What are your thoughts on this?

@zeke
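For illustration, here is a rough sketch of what an existence-only preflight check could look like. It is not necessarily what Cog does. It assumes the model lookup endpoint of Replicate's HTTP API (GET https://api.replicate.com/v1/models/{owner}/{name} with a bearer token) and uses hypothetical helper names; the opener parameter exists only so the check can be exercised without network access.

```python
import urllib.error
import urllib.request

REPLICATE_HOST = "r8.im"


def parse_image(image):
    """Split 'r8.im/owner/name[:tag]' into (owner, name).

    Returns None for anything that is not a Replicate image name.
    """
    parts = image.split("/")
    if len(parts) != 3 or parts[0] != REPLICATE_HOST:
        return None
    owner = parts[1]
    name = parts[2].split(":")[0]  # drop an optional tag
    if not owner or not name:
        return None
    return owner, name


def model_exists(owner, name, token, opener=urllib.request.urlopen):
    """True if the model exists, False on HTTP 404; other errors propagate.

    Assumption: the endpoint and auth header below match Replicate's API.
    """
    request = urllib.request.Request(
        f"https://api.replicate.com/v1/models/{owner}/{name}",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

A caller would run parse_image on the push target first and call model_exists only when it returns a pair, before starting the Docker build. As noted in point 2, this confirms that the model exists, not that the user is allowed to push to it.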

mattt commented 3 months ago

@zeke I agree that a preflight check for the existence of a model would be an improvement. But I'd stop short of saying that the time spent building the image was wasted. If you cog push r8.im/zeke/typo, a subsequent cog push r8.im/zeke/real should use the build cache and complete quickly.

mattt commented 3 months ago

Resolved by https://github.com/replicate/cog/pull/1733. Thanks again, @Etelis!