zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG]: "zenml go" broken for me #2720

Closed: raymundlin closed this issue 3 months ago

raymundlin commented 3 months ago

System Information

zenml info -a -s ZENML_LOCAL_VERSION: 0.57.1 ZENML_SERVER_VERSION: 0.57.1 ZENML_SERVER_DATABASE: sqlite ZENML_SERVER_DEPLOYMENT_TYPE: other ... ZENML_ACTIVE_REPOSITORY_ROOT: None PYTHON_VERSION: 3.10.12 ENVIRONMENT: native SYSTEM_INFO: {'os': 'mac', 'mac_version': '14.4.1'} ACTIVE_WORKSPACE: default ACTIVE_STACK: default ACTIVE_USER: default TELEMETRY_STATUS: disabled ANALYTICS_CLIENT_ID: 8ae99a1d-f153-4abc-b868-78e89b46b674 ANALYTICS_USER_ID: a40a4228-9895-4896-88c7-19d5f82f85cb ANALYTICS_SERVER_ID: 8ae99a1d-f153-4abc-b868-78e89b46b674 INTEGRATIONS: ['bitbucket', 'kaniko'] PACKAGES: {'certifi': '2024.2.2', 'tzdata': '2024.1', 'pytz': '2024.1', 'jsonschema-specifications': '2023.12.1', 'setuptools': '65.5.0', 'cryptography': '42.0.7', 'pyzmq': '26.0.3', 'packaging': '24.0', 'pip': '24.0', 'attrs': '23.2.0', 'azure-mgmt-resource': '23.1.1', 'argon2-cffi': '23.1.0', 'argon2-cffi-bindings': '21.2.0', 'isoduration': '20.11.0', 'rich': '13.7.1', 'websockets': '12.0', 'ipython': '8.24.0', 'jupyter-client': '8.6.2', 'click': '8.1.3', 'ipywidgets': '8.1.2', 'nbconvert': '7.16.4', 'overrides': '7.7.0', 'notebook': '7.2.0', 'ipykernel': '6.29.4', 'tornado': '6.4', 'docker': '6.1.3', 'bleach': '6.1.0', 'multidict': '6.0.5', 'pyyaml': '6.0.1', 'traitlets': '5.14.3', 'nbformat': '5.10.4', 'psutil': '5.9.8', 'jupyter-core': '5.7.2', 'cachetools': '5.3.3', 'decorator': '5.1.1', 'smmap': '5.0.1', 'ipinfo': '5.0.1', 'jsonschema': '4.22.0', 'beautifulsoup4': '4.12.3', 'typing-extensions': '4.11.0', 'pexpect': '4.9.0', 'anyio': '4.3.0', 'platformdirs': '4.2.2', 'jupyterlab': '4.2.1', 'gitdb': '4.0.11', 'widgetsnbextension': '4.0.10', 'async-timeout': '4.0.3', 'bcrypt': '4.0.1', 'orjson': '3.10.3', 'aiohttp': '3.9.5', 'idna': '3.7', 'charset-normalizer': '3.3.2', 'gitpython': '3.1.43', 'jinja2': '3.1.4', 'prompt-toolkit': '3.0.43', 'jupyterlab-widgets': '3.0.10', 'mistune': '3.0.2', 'markdown-it-py': '3.0.0', 'requests': '2.32.2', 'jupyterlab-server': '2.27.2', 'pycparser': '2.22', 
'fastjsonschema': '2.19.1', 'pygments': '2.18.0', 'babel': '2.15.0', 'jupyter-server': '2.14.0', 'types-python-dateutil': '2.9.0.20240316', 'python-dateutil': '2.9.0.post0', 'pyjwt': '2.7.0', 'soupsieve': '2.5', 'pyparsing': '2.4.7', 'asttokens': '2.4.1', 'jsonpointer': '2.4', 'jupyter-lsp': '2.2.5', 'pandas': '2.2.2', 'cloudpickle': '2.2.1', 'urllib3': '2.2.1', 'markupsafe': '2.1.5', 'python-json-logger': '2.0.7', 'async-lru': '2.0.4', 'tomli': '2.0.1', 'executing': '2.0.1', 'azure-core': '1.30.1', 'numpy': '1.26.4', 'six': '1.16.0', 'cffi': '1.16.0', 'webcolors': '1.13', 'pydantic': '1.10.15', 'yarl': '1.9.4', 'distro': '1.9.0', 'send2trash': '1.8.3', 'debugpy': '1.8.1', 'alembic': '1.8.1', 'websocket-client': '1.8.0', 'passlib': '1.7.4', 'nest-asyncio': '1.6.0', 'fqdn': '1.5.1', 'pandocfilters': '1.5.1', 'sqlalchemy': '1.4.41', 'frozenlist': '1.4.1', 'azure-mgmt-core': '1.4.0', 'mako': '1.3.5', 'aiosignal': '1.3.1', 'sniffio': '1.3.1', 'arrow': '1.3.0', 'uri-template': '1.3.0', 'tinycss2': '1.3.0', 'exceptiongroup': '1.2.1', 'azure-common': '1.1.28', 'httpcore': '1.0.5', 'pymysql': '1.0.3', 'python-dotenv': '1.0.1', 'fastapi': '0.110.3', 'zenml': '0.57.1', 'sqlalchemy-utils': '0.38.3', 'starlette': '0.37.2', 'referencing': '0.35.1', 'uvicorn': '0.29.0', 'httpx': '0.27.0', 'watchfiles': '0.21.0', 'prometheus-client': '0.20.0', 'jedi': '0.19.1', 'httplib2': '0.19.1', 'uvloop': '0.19.0', 'validators': '0.18.2', 'terminado': '0.18.1', 'rpds-py': '0.18.1', 'h11': '0.14.0', 'nbclient': '0.10.0', 'jupyter-events': '0.10.0', 'json5': '0.9.25', 'parso': '0.8.4', 'defusedxml': '0.7.1', 'ptyprocess': '0.7.0', 'stack-data': '0.6.3', 'httptools': '0.6.1', 'isodate': '0.6.1', 'jupyter-server-terminals': '0.5.3', 'webencodings': '0.5.1', 'jupyterlab-pygments': '0.3.0', 'secure': '0.3.0', 'click-params': '0.3.0', 'wcwidth': '0.2.13', 'notebook-shim': '0.2.4', 'pure-eval': '0.2.2', 'comm': '0.2.2', 'fastapi-utils': '0.2.1', 'matplotlib-inline': '0.1.7', 'rfc3339-validator': 
'0.1.4', 'appnope': '0.1.4', 'mdurl': '0.1.2', 'rfc3986-validator': '0.1.1', 'python-multipart': '0.0.9', 'sqlmodel': '0.0.8', 'sqlalchemy2-stubs': '0.0.2a38'}

CURRENT STACK

Name: default ID: 0100c262-b936-4c7d-ae57-2162bed34afb Workspace: default / 5e7cd50f-21d0-4662-99a3-6c2c08cb1e63

ORCHESTRATOR: default

Name: default ID: 6112c84b-8e54-4165-94f3-d997b65d86fb Type: orchestrator Flavor: local Configuration: {} Workspace: default / 5e7cd50f-21d0-4662-99a3-6c2c08cb1e63

ARTIFACT_STORE: default

Name: default ID: fe3aaafe-0ade-4929-aefe-e54eb8f0f630 Type: artifact_store Flavor: local Configuration: {'path': ''} Workspace: default / 5e7cd50f-21d0-4662-99a3-6c2c08cb1e63

What happened?

"zenml go" fails with the following traceback:

Traceback (most recent call last):

/Users/ray4ever/Mirror/Research/mlops/.venv/bin/zenml:8 in <module>

    5 from zenml.cli.cli import cli
    6 if __name__ == '__main__':
    7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8     sys.exit(cli())
    9

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/click/core.py:1130 in __call__

    1127
    1128     def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
    1129         """Alias for :meth:`main`."""
  ❱ 1130         return self.main(*args, **kwargs)
    1131
    1132
    1133 class Command(BaseCommand):

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/click/core.py:1055 in main

    1052         try:
    1053             try:
    1054                 with self.make_context(prog_name, args, **extra) as ctx:
  ❱ 1055                     rv = self.invoke(ctx)
    1056                     if not standalone_mode:
    1057                         return rv
    1058                     # it's not safe to `ctx.exit(rv)` here!

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/click/core.py:1657 in invoke

    1654                 super().invoke(ctx)
    1655                 sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)
    1656                 with sub_ctx:
  ❱ 1657                     return _process_result(sub_ctx.command.invoke(sub_ctx))
    1658
    1659         # In chain mode we create the contexts step by step, but after the
    1660         # base command has been invoked. Because at that point we do not

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/click/core.py:1404 in invoke

    1401             echo(style(message, fg="red"), err=True)
    1402
    1403         if self.callback is not None:
  ❱ 1404             return ctx.invoke(self.callback, **ctx.params)
    1405
    1406     def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]:
    1407         """Return a list of completions for the incomplete value. Looks

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/click/core.py:760 in invoke

    757
    758         with augment_usage_errors(__self):
    759             with ctx:
  ❱ 760                 return __callback(*args, **kwargs)
    761
    762     def forward(
    763         __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any  # noqa: B902

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/zenml/cli/base.py:458 in go

    455                 with console.status(
    456                     "Cloning tutorial. This sometimes takes a minute..."
    457                 ):
  ❱ 458                     Repo.clone_from(
    459                         TUTORIAL_REPO,
    460                         tmp_cloned_dir,
    461                         branch=f"release/{zenml_version}",

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/git/repo/base.py:1525 in clone_from

    1522         git = cls.GitCommandWrapperType(os.getcwd())
    1523         if env is not None:
    1524             git.update_environment(**env)
  ❱ 1525         return cls._clone(
    1526             git,
    1527             url,
    1528             to_path,

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/git/repo/base.py:1396 in _clone

    1393             cmdline = remove_password_if_present(cmdline)
    1394
    1395             _logger.debug("Cmd(%s)'s unused stdout: %s", cmdline, stdout)
  ❱ 1396             finalize_process(proc, stderr=stderr)
    1397
    1398         # Our git command could have a different working dir than our actual
    1399         # environment, hence we prepend its working dir if required.

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/git/util.py:504 in finalize_process

    501     """Wait for the process (clone, fetch, pull or push) and handle its errors
    502     accordingly."""
    503     # TODO: No close proc-streams??
  ❱ 504     proc.wait(**kwargs)
    505
    506
    507 @overload

/Users/ray4ever/Mirror/Research/mlops/.venv/lib/python3.10/site-packages/git/cmd.py:834 in wait

    831             if status != 0:
    832                 errstr = read_all_from_possibly_closed_stream(p_stderr)
    833                 _logger.debug("AutoInterrupt wait stderr: %r" % (errstr,))
  ❱ 834                 raise GitCommandError(remove_password_if_present(self.args), status, err
    835             return status
    836
    837     # END auto interrupt

GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v --branch=release/0.57.1 -- https://github.com/zenml-io/zenml /var/folders/xc/gx4cfh7n6jj6g6vsfwhwfhp80000gn/T/tmp3vzp8n1t/zenml_repo
  stderr: 'Cloning into '/var/folders/xc/gx4cfh7n6jj6g6vsfwhwfhp80000gn/T/tmp3vzp8n1t/zenml_repo'...
POST git-upload-pack (412 bytes)
POST git-upload-pack (gzip 19117 to 8961 bytes)
error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL (err 8)
error: 4408 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

Reproduction steps

  1. zenml go

Relevant log output

No response

strickvl commented 3 months ago

Thanks for the report. This looks like a network issue, but we'll try to reproduce it on our end. I assume you've tried the command more than once and it fails every time?

raymundlin commented 3 months ago

It also fails in zenml/examples/llm_finetuning:

pip install -r requirements.txt
Collecting lightning@ git+https://github.com/Lightning-AI/lightning@ed367ca675861cdf40dbad2e4d66f7eee2ec50af (from -r requirements.txt (line 3))
  Cloning https://github.com/Lightning-AI/lightning (to revision ed367ca675861cdf40dbad2e4d66f7eee2ec50af) to /private/var/folders/xc/gx4cfh7n6jj6g6vsfwhwfhp80000gn/T/pip-install-gnv792cu/lightning_ae44e99e1c0248d087f677af8de8eede
  Running command git clone --filter=blob:none --quiet https://github.com/Lightning-AI/lightning /private/var/folders/xc/gx4cfh7n6jj6g6vsfwhwfhp80000gn/T/pip-install-gnv792cu/lightning_ae44e99e1c0248d087f677af8de8eede
  error: 3320 bytes of body are still expected
  fetch-pack: unexpected disconnect while reading sideband packet
  fatal: early EOF
  fatal: fetch-pack: invalid index-pack output
  fatal: could not fetch 664f3e8a89e755cc06f584de09595c06a68fd244 from promisor remote
  warning: Clone succeeded, but checkout failed.
  You can inspect what was checked out with 'git status'
  and retry with 'git restore --source=HEAD :/'

  error: subprocess-exited-with-error

  × git clone --filter=blob:none --quiet https://github.com/Lightning-AI/lightning /private/var/folders/xc/gx4cfh7n6jj6g6vsfwhwfhp80000gn/T/pip-install-gnv792cu/lightning_ae44e99e1c0248d087f677af8de8eede did not run successfully.
  │ exit code: 128
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/Lightning-AI/lightning /private/var/folders/xc/gx4cfh7n6jj6g6vsfwhwfhp80000gn/T/pip-install-gnv792cu/lightning_ae44e99e1c0248d087f677af8de8eede did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

raymundlin commented 3 months ago

> Thanks for the report. This looks like a network issue, but we'll try to reproduce it on our end. I assume you've tried the command more than once and it fails every time?

Yes, it failed every time. However, I can clone manually with

git clone -v --branch=release/0.57.1 -- https://github.com/zenml-io/zenml

I just left off the strange /var/folders path at the end.

Since this happens everywhere, it makes my testing very difficult.

raymundlin commented 3 months ago

I suspect that my new network router is causing the problem. I tested it again with WIFI, and the problem magically disappeared. I don't know how a router can impede the cloning, but I am closing the issue now.
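
For anyone who lands here with the same "curl 92 HTTP/2 stream ... was not closed cleanly" error: a workaround often suggested for this kind of failure is to make git use HTTP/1.1 through its http.version setting. Whether that would have helped in this case is an assumption, since the thread never pins down what the router was doing. A minimal Python sketch, where clone_command and clone_with_fallback are hypothetical helper names and not part of ZenML or GitPython:

```python
import subprocess
from typing import List, Optional


def clone_command(
    repo_url: str,
    target_dir: str,
    branch: str,
    http_version: Optional[str] = None,
) -> List[str]:
    """Build the same clone command as in the traceback, optionally pinning the HTTP version."""
    cmd = ["git"]
    if http_version is not None:
        # `git -c key=value` applies a config override for this one invocation only.
        cmd += ["-c", f"http.version={http_version}"]
    cmd += ["clone", "-v", f"--branch={branch}", "--", repo_url, target_dir]
    return cmd


def clone_with_fallback(repo_url: str, target_dir: str, branch: str) -> str:
    """Try git's default transport first, then retry once with HTTP/1.1.

    Returns "default" or "HTTP/1.1" depending on which attempt succeeded.
    """
    last_stderr = ""
    for http_version in (None, "HTTP/1.1"):
        result = subprocess.run(
            clone_command(repo_url, target_dir, branch, http_version),
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return http_version or "default"
        last_stderr = result.stderr
    raise RuntimeError(f"git clone failed with both settings:\n{last_stderr}")
```

If the second attempt is the one that succeeds, that would point at HTTP/2 handling somewhere on the network path rather than at ZenML itself.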

strickvl commented 3 months ago

Actually I was able to reproduce it @raymundlin. Maybe we both have slow internet :) I made a fix which only clones the specific branch instead of all branches on the repo etc, which made it work for me.

If you wanted to test it out, you could do:

git clone -b bugfix/zenml-go-depth https://github.com/zenml-io/zenml.git
cd zenml
pip install .

That installs ZenML from the bugfix branch I just created (note that pip install . is a regular install; use pip install -e . if you want an editable one). I'm pretty sure it'd work on your previous internet settings with my change. In any case, thank you for the report; the fix will be part of the next release!

Also note that the zenml/examples/llm_finetuning template has just been updated, so the very latest version (it will probably land this week or early next week) is quite different from whatever you're currently looking at. Just FYI.
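
To illustrate the idea behind the fix described above (fetching only the one release branch instead of the whole repository), here is a rough Python sketch. It is not the actual patch from the bugfix branch; shallow_clone_command and shallow_clone are hypothetical names, and the depth of 1 is an assumption based only on the branch name. It relies on git's standard --single-branch and --depth options:

```python
import subprocess
from typing import List


def shallow_clone_command(
    repo_url: str, target_dir: str, branch: str, depth: int = 1
) -> List[str]:
    """Build a git command that fetches only `branch`, truncated to `depth` commits."""
    return [
        "git",
        "clone",
        "--branch", branch,
        "--single-branch",      # do not fetch the other branches
        "--depth", str(depth),  # truncate history to the most recent commits
        "--",
        repo_url,
        target_dir,
    ]


def shallow_clone(repo_url: str, target_dir: str, branch: str, depth: int = 1) -> None:
    """Run the shallow clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(
        shallow_clone_command(repo_url, target_dir, branch, depth), check=True
    )
```

A call such as shallow_clone("https://github.com/zenml-io/zenml", "zenml_repo", "release/0.57.1") transfers far less data than a full clone, which is presumably why it survives a slow or flaky connection more often.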

raymundlin commented 3 months ago

Thank you so much for your help and tips! I will give it another try tomorrow. Awesome!