iterative / gto

🏷️ Git Tag Ops. Turn your Git repository into Artifact Registry or Model Registry.
https://dvc.org/doc/gto
Apache License 2.0
140 stars 16 forks source link

Make `show` and other commands work with remote repos #25

Closed aguschin closed 1 year ago

aguschin commented 2 years ago

Right now they work on local repos only, that's a limitation of gitpython.

francesco086 commented 1 year ago

Should we just error out on the 2nd command then? ("You have uncommitted changes in artifacts.yaml, please commit them before running --auto-commit"?)

I cannot think of any better solution... and great that you thought of that, I didn't 😅

francesco086 commented 1 year ago

Here the next PR that adds auto-commit support to annotate and remove: https://github.com/iterative/gto/pull/290 I must say that I struggled quite a lot for this one.

francesco086 commented 1 year ago

This is the last PR necessary to enable all gto commands to work on remote repos: https://github.com/iterative/gto/pull/292

francesco086 commented 1 year ago

Hi @aguschin , since I have a little bit of spare time, I would like to collect here my thoughts about how I would like to refactor the module git_utils. Working TDD, it is quite some time that tests were pretty much screaming at me "please change the design". But, for the sake of delivering functionalities quickly, I ignored them. Also naming difficulties were a warning sign.

Take the decorator clone_on_remote_repo. Why I didn't want to re-name it simply clone? I find this decorator, as all others, a little tricky, because they silently take the arguments of the decorated function. In theory, if one wants to properly abstract this decorator, one should pass the argument "repo", the name of the argument to look for. Then one could change its signature as clone(repo_arg: str). I find this way clearer than clone(). We have a hint that this decorator will look at the argument repo and probably clone it if needed. Then

@clone(repo_arg="repo")
def register(repo: Union[str, Repo], ...)

is more understandable than

@clone
def register(repo: Union[str, Repo], ...)

This is even more true for other decorators, for example commit_produced_changes_on_auto_commit. If one renames it simply commit and reads

@commit
def annotate(...)

will very likely be confused: commit? always? what's happening? But if you read

@commit(controller_arg="commit")
def annotate(..., commit: bool)

then it is probably way easier to guess what's happening.

This is the readability / naming point of view. Now I would like to take the TDD point of view.

Tests currently are a mess. First I test all functionalities of the git_utils module. But then I cannot simply "trust" that the decorators will work on the api. Why? Because currently the taken variable names are hard-coded. De-facto, the decorators have been written to extend only the api, and there is a tight coupling. So there is no other way to check that everything will work fine than inspecting what happens if you pass e.g. commit=True (first stash, then commit, then de-stash) or commit=False. If the decorator were taking the argument name (e.g. commit), then I could simply check that when the api function is called, then the decorator has been called with the expected argument (controller_arg="commit"), which is part of the api function signature. Currently I cannot simplify the tests in this way, because otherwise if I, for example, rename the argument commit in the api function into self_commit, then the tests will still pass, whereas the code would not work.

Finally, one last point of view. Currently the code is rigid. If I want to rename the argument "repo" in the api functions, I need to change the git_utils module (same for commit and push). This is a bad sign for code. Ideally, I should be able to change the module api without touching any other.

Because of all this, I would advise to refactor the git_utils repo to make it more of an independent helper module, as outlined above.

This is one of the suggestions I have, others after we can agree on this one :)

aguschin commented 1 year ago

Interesting! Didn't think of that. Those changes should make things more decoupled and more easily tested, on that I agree. At the same time, renaming wrappers + adding args should keep things understandable even with shorter names. Curious to see how that will changes tests btw :)

francesco086 commented 1 year ago

I will start with a little PR to rename auto_* as simply * (e.g. auto_push -> push).

Then I will refactor the git_utils module following https://github.com/iterative/gto/issues/25#issuecomment-1289375173

francesco086 commented 1 year ago

As promised :)

After this one, in principle one could also start extending the cli help and gto documentation to notify that remote repos are supported, because the user interface is as desired.

francesco086 commented 1 year ago

Hi @aguschin a short update: I have started refactoring, but I encountered a major difficulty. I cannot figure out a reasonable way to assert that a function is decorated. For example, assert that gto.api.show is decorated with @clone(repo_arg="repo"). I find it ridiculous, but the problem seems to stem from the fact that the decorator is applied at import time. However, even leveraging on that, I could not find a way through.

In case you have any idea about it please let me know.

aguschin commented 1 year ago

As @mike0sv suggested, maybe you can get the function code with inspect.getsource and check if there is correct @decorator in the beginning.

And if decorator could set some attribute of the function, you can check that attribute 🤔

francesco086 commented 1 year ago

Mmm... this is what I meant when I said "a reasonable way"... this was one of the option I could find by googling, but I find parsing the code a little extreme. Well, it seems I have no other choice, so I will give it a try.

It's the first time I am so heavily using decorators. Not the best experience ever so far... I start to wonder if it was the right choice.

aguschin commented 1 year ago

Now it's funny we have a label named good first issue here. Removing it 😂

aguschin commented 1 year ago

As per https://github.com/iterative/gto/pull/309, I consider this issue to be closed. show and other commands are fully supported, and optimization (e.g. caching) is an improvement for the future.

Thank you so much @francesco086, this was a really big effort from your side - I hope you are satisfied with the result! 🙌🏻

francesco086 commented 1 year ago

Thank you so much @francesco086, this was a really big effort from your side - I hope you are satisfied with the result! 🙌🏻

Sorry for the late answer 😅 I somehow missed the notification...

Thank you @aguschin , most credit goes to you, for the many not-easy design decisions and in the end for all the re-factoring! I am indeed very happy of the final result, moreover it was a great opportunity to learn! Looking forward for the next contribution 😃