iterative / mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
https://mlem.ai
Apache License 2.0
716 stars 44 forks source link

`link` won't accept --rev flag / unexpected 404 #663

Open igordertigor opened 1 year ago

igordertigor commented 1 year ago

I keep having trouble with the --rev flag on commands like mlem link: What exactly is accepted as a revision? I would expect things like branch names, tags (not necessarily annotated) and commit-shas. Strangely, it seems that none of those is actually working. In any case, I get (illustrating here for a tag, which is what I would prefer):

$ mlem link  --rev feature-engine@v0.0.2 --source-project="https://github.com/myorganization/myproject/" models/feature-engine feature-engine
❌ Revision 'feature-engine@v0.0.2' wasn't found in
path=https://github.com/myorganization/myproject,
fs=None

I feel that omitting the --rev is a bad idea, because the target of the link would not be immutable. However, I tried that for testing:

$ mlem link --source-project="https://github.com/myorganization/myproject/" models/feature-engine feature-engine
❌ Unexpected error: 404 Client Error: Not Found for url:
https://api.github.com/repos/myorganization/myproject
Please report it here: <https://github.com/iterative/mlem/issues>

I usually prefer ssh when interacting with git, but I re-authenticated with the github commandline app gh and set my preferred protocol to https to avoid potential issues with authentication. After all, this is trying to access a private repository. Either this is a bug, or there is something missing from the documentation. I'm happy to help with either of those.

igordertigor commented 1 year ago

Ok, seems that I found out how this should work. Might still be a good idea to have in the documentation:

export GITHUB_USERNAME=igordertigor
export GITHUB_TOKEN=$(gh auth token)
mlem link --rev feature-engine@v0.0.2 --source-project="https://github.com/myorganization/myproject/" models/feature_engine feature_engine

Not sure where a good place would be to add that to the documentation as it kind of refers to all commands as long as they refer to a remote repository in some way.

aguschin commented 1 year ago

Wow, I see you found it out already. We have one place in docs we explain this problem, but it definitely should be somewhere else as well. https://mlem.ai/doc/user-guide/dvc/#working-with-private-repositories

I think we simply need it as a separate section in User Guide. It can be called "Working with Git".

Besides, I think we can mention it in few places like https://mlem.ai/doc/user-guide/linking/

Finally, we can add more meaningful error, e.g.

$ mlem link  --rev feature-engine@v0.0.2 --source-project="https://github.com/myorganization/myproject/" models/feature-engine feature-engine
❌ Revision 'feature-engine@v0.0.2' wasn't found in path=https://github.com/myorganization/myproject,
fs=None. If the repo is private, please check you've set GITHUB_USERNAME and GITHUB_TOKEN

This was my first thought. And second thought: we have this issue that should support authorization like DVC does it. https://github.com/iterative/mlem/issues/616

@igordertigor, I'm happy to help you with contributing this in either docs or MLEM codebase, especially if you'd decide to help MLEM support ssh auth.

igordertigor commented 1 year ago

Thank you @aguschin for your quick reply. Here are a few thoughts:

  1. I looked at the section you linked about private repositories and added the option to set GITHUB_TOKEN from the command line as a pull request
  2. In addition to that, I agree that a separate section in the User Guide might be useful. I wouldn't necessarily call it "Working with Git" but rather "Working with Github". After all, the authentication issues (at least via http) are not git itself, but the github service built on top of git.
  3. Yes, more meaningful errors are always great.
  4. I like your second thought. I believe that most users would use mlem alongside dvc and needing separate authentication methods for the two feels kind of awkward.
  5. I could imagine helping with mlem support for ssh auth, but I can't quite judge how much effort it would be. Looking at mlem.contrib.github, it seems that the whole http github interface is very much baked into the GithubResolver. I feel that ssh auth would need to re-implement the full UriResolver. However, the nice part might be that it would not potentially work with every git repository. Does that sound like I'm correctly rephrasing the problem? I could probably spare a few hours for this but not necessarily a few days. Let me know what you think @aguschin.
aguschin commented 1 year ago

@igordertigor thanks!

  1. I guess the best bet here would be to adapt DVC's approach https://github.com/iterative/dvc/blob/main/dvc/external_repo.py . This should require implementing something on top of CloudGitResolver which will work for any GH provider. We can probably drop what we currently have implemented then.

Re 2.-3. -- if you feel like contributing 5. will be too much, making PRs for these items should be fast enough and will fix it. I'll get to 5. once I have time then (not soon unfortunately).

igordertigor commented 1 year ago
  1. I guess the best bet here would be to adapt DVC's approach https://github.com/iterative/dvc/blob/main/dvc/external_repo.py . This should require implementing something on top of CloudGitResolver which will work for any GH provider. We can probably drop what we currently have implemented then.

As far as I can see, dvc/external_repo.py clones the repo to a local tmp-dir. I'll take a closer look at this in the next days to decide if this seems like something I could do (given that you won't have time soon).

Re 2.-3. -- if you feel like contributing 5. will be too much, making PRs for these items should be fast enough and will fix it. I'll get to 5. once I have time then (not soon unfortunately).

I'm not super eager to contribute, but I feel that mlem being open source, allows me to fix things my self if I really need them fixed rather than waiting for others who may not need them fixed. I hope that makes sense. That doesn't mean I would under no circumstances fix error messages or documentation, just a matter of where I put my work time.