Closed liguodongiot closed 3 years ago
Looks like you're missing installing the local package:
# run this in the ml-online folder
pip install --editable .
Let me know if that works.
Thanks. After running pip install --editable .
the following command runs normally:
soopervisor add training --backend argo-workflows
================================== Adding /home/guodong/projects/ml-online/training/Dockerfile... ===================================
=============================================================== Done ================================================================
But when I execute the soopervisor export training
command, an error occurs:
soopervisor export training
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
warnings.warn('The following placeholders are declared in the '
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
warnings.warn('The following placeholders are declared in the '
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 12671.61it/s]
============================================== Packaging code: python -m build --sdist ==============================================
/home/guodong/miniconda3/envs/ml-online/bin/python: No module named build
Error: An error occurred when executing command: python -m build --sdist
Original error message: Command '('python', '-m', 'build', '--sdist')' returned non-zero exit status 1.
I found that running python setup.py build sdist
on its own succeeds, but running python -m build --sdist
on its own fails.
What should I do?
Ah, good catch! Looks like I missed adding that dependency!
Try this:
pip install build
That should make python -m build --sdist
work.
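A minimal sketch of a pre-flight check for this failure, assuming you want to detect the missing `build` package before packaging starts. The helper names `module_available` and `build_sdist` are hypothetical and are not part of soopervisor; only standard-library calls are used.

```python
import importlib.util
import subprocess
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


def build_sdist(project_dir: str = ".") -> None:
    """Run `python -m build --sdist`, failing early if `build` is missing."""
    if not module_available("build"):
        # Same root cause as "No module named build" in the log above
        raise RuntimeError("Missing the 'build' package; run: pip install build")
    # Use the current interpreter so the check and the command share one environment
    subprocess.run(
        [sys.executable, "-m", "build", "--sdist"],
        cwd=project_dir,
        check=True,
    )
```

Calling `build_sdist()` from the ml-online folder would then either build the source distribution or stop with a message that names the missing dependency.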
Thank you very much.
When I ran the soopervisor export training
command again, an error occurred while building the image. What should I do?
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
warnings.warn('The following placeholders are declared in the '
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
warnings.warn('The following placeholders are declared in the '
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11935.98it/s]
============================================== Packaging code: python -m build --sdist ==============================================
Found existing installation: setuptools 56.0.0
Uninstalling setuptools-56.0.0:
Successfully uninstalled setuptools-56.0.0
Collecting wheel
Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
Collecting setuptools>=40.8.0
Using cached setuptools-57.1.0-py3-none-any.whl (818 kB)
Installing collected packages: wheel, setuptools
Successfully installed setuptools-57.1.0 wheel-0.36.2
WARNING: You are using pip version 21.1.1; however, version 21.1.3 is available.
You should consider upgrading via the '/tmp/build-env-87r_7tpv/bin/python -m pip install --upgrade pip' command.
/tmp/build-env-87r_7tpv/lib/python3.8/site-packages/setuptools/dist.py:484: UserWarning: Normalizing '0.1dev' to '0.1.dev0'
warnings.warn(tmpl.format(**locals()))
running egg_info
writing src/ml_online.egg-info/PKG-INFO
writing dependency_links to src/ml_online.egg-info/dependency_links.txt
writing requirements to src/ml_online.egg-info/requires.txt
writing top-level names to src/ml_online.egg-info/top_level.txt
reading manifest file 'src/ml_online.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/ml_online.egg-info/SOURCES.txt'
/tmp/build-env-87r_7tpv/lib/python3.8/site-packages/setuptools/dist.py:484: UserWarning: Normalizing '0.1dev' to '0.1.dev0'
warnings.warn(tmpl.format(**locals()))
running sdist
running egg_info
writing src/ml_online.egg-info/PKG-INFO
writing dependency_links to src/ml_online.egg-info/dependency_links.txt
writing requirements to src/ml_online.egg-info/requires.txt
writing top-level names to src/ml_online.egg-info/top_level.txt
reading manifest file 'src/ml_online.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/ml_online.egg-info/SOURCES.txt'
running check
warning: check: missing required meta-data: url
warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied
creating ml_online-0.1.dev0
creating ml_online-0.1.dev0/src
creating ml_online-0.1.dev0/src/ml_online
creating ml_online-0.1.dev0/src/ml_online.egg-info
creating ml_online-0.1.dev0/src/ml_online/notebooks
creating ml_online-0.1.dev0/src/ml_online/tasks
copying files to ml_online-0.1.dev0...
copying MANIFEST.in -> ml_online-0.1.dev0
copying README.md -> ml_online-0.1.dev0
copying setup.py -> ml_online-0.1.dev0
copying src/ml_online/__init__.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/env.yaml -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/infer.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/io.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/model.pickle -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/pipeline-features.yaml -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/pipeline.yaml -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/service.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online.egg-info/PKG-INFO -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/SOURCES.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/dependency_links.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/requires.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/top_level.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online/notebooks/__init__.py -> ml_online-0.1.dev0/src/ml_online/notebooks
copying src/ml_online/notebooks/fit.py -> ml_online-0.1.dev0/src/ml_online/notebooks
copying src/ml_online/tasks/__init__.py -> ml_online-0.1.dev0/src/ml_online/tasks
copying src/ml_online/tasks/features.py -> ml_online-0.1.dev0/src/ml_online/tasks
copying src/ml_online/tasks/raw.py -> ml_online-0.1.dev0/src/ml_online/tasks
Writing ml_online-0.1.dev0/setup.cfg
Creating tar archive
removing 'ml_online-0.1.dev0' (and everything under it)
======================================= Building image: docker build . --tag ml_online:0.1dev =======================================
Sending build context to Docker daemon 15.87kB
Step 1/7 : FROM condaforge/mambaforge:4.10.1-0
---> 05e3542d3437
Step 2/7 : COPY environment.lock.yml project/environment.lock.yml
---> Using cache
---> 7b7a0089698b
Step 3/7 : RUN mamba env update --name base --file project/environment.lock.yml && conda clean --all --force-pkgs-dir --yes
---> Running in 4d1e9dc0a309
anaconda/pkgs/main/linux
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1079, in __call__
return func(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/conda_env/cli/main.py", line 80, in do_call
exit_code = getattr(module, func_name)(args, parser)
File "/opt/conda/lib/python3.9/site-packages/conda_env/cli/main_update.py", line 123, in execute
result[installer_type] = installer.install(prefix, specs, args, env)
File "/opt/conda/lib/python3.9/site-packages/mamba/mamba_env.py", line 45, in mamba_install
index = load_channels(
File "/opt/conda/lib/python3.9/site-packages/mamba/utils.py", line 93, in load_channels
index = get_index(
File "/opt/conda/lib/python3.9/site-packages/mamba/utils.py", line 74, in get_index
is_downloaded = dlist.download(True)
RuntimeError: Download error (28) Timeout was reached [https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/noarch/repodata.json]
Failed to connect to mirrors.tuna.tsinghua.edu.cn port 443: Connection timed out
`$ /opt/conda/bin/mamba update --name base --file project/environment.lock.yml`
environment variables:
CIO_TEST=<not set>
CONDA_AUTO_UPDATE_CONDA=false
CONDA_DIR=/opt/conda
CONDA_ROOT=/opt/conda
CURL_CA_BUNDLE=<not set>
PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin
:/bin
REQUESTS_CA_BUNDLE=<not set>
SSL_CERT_FILE=<not set>
active environment : None
user config file : /root/.condarc
populated config files : /opt/conda/.condarc
conda version : 4.10.1
conda-build version : not installed
python version : 3.9.2.final.0
virtual packages : __linux=3.10.0=0
__glibc=2.31=0
__unix=0=0
__archspec=1=x86_64
base environment : /opt/conda (writable)
conda av data dir : /opt/conda/etc/conda
conda av metadata url : https://repo.anaconda.com/pkgs/main
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
package cache : /opt/conda/pkgs
/root/.conda/pkgs
envs directories : /opt/conda/envs
/root/.conda/envs
platform : linux-64
user-agent : conda/4.10.1 requests/2.25.1 CPython/3.9.2 Linux/3.10.0-957.27.2.el7.x86_64 ubuntu/20.04.1 glibc/2.31
UID:GID : 0:0
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
The command '/bin/sh -c mamba env update --name base --file project/environment.lock.yml && conda clean --all --force-pkgs-dir --yes' returned a non-zero code: 1
Error: An error occurred when executing command: docker build . --tag ml_online:0.1dev
Original error message: Command '('docker', 'build', '.', '--tag', 'ml_online:0.1dev')' returned non-zero exit status 1.
There's some issue with your network:
RuntimeError: Download error (28) Timeout was reached [https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/noarch/repodata.json]
Failed to connect to mirrors.tuna.tsinghua.edu.cn port 443: Connection timed out
There's little I can do to help here. First make sure conda works on that machine: try creating a new conda environment and see if you get the same error. If you do, try a different network.
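One way to separate a network problem from a conda problem is to test the TCP connection directly. The sketch below uses only the standard library; `can_connect` is a hypothetical helper name. Since the timeout in the log happened inside `docker build`, the check is most informative when run from inside a container on the same host.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


# Example call (host taken from the error message above):
# can_connect("mirrors.tuna.tsinghua.edu.cn")
```

If this returns False for the mirror but True for other hosts, the mirror or the route to it is the likely culprit rather than conda itself.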
Thanks, I switched to a different server host and that solved it. Now an error occurs during the Testing File client step. How should I troubleshoot it?
==================================== Testing image: docker run ml_online:0.1dev ploomber status =====================================
100%|██████████| 5/5 [00:00<00:00, 10280.16it/s]
name        Last run          Outdated?               Product                                                                                       Doc (short)                            Location
----------  ----------------  ----------------------  --------------------------------------------------------------------------------------------  -------------------------------------  ---------------------------------------------------------------------
get         Has not been run  Source code             File('/mnt/shared-folde...ts/raw/get.parquet')                                                Get training data                      /opt/conda/lib/python3.8/site-packages/ml_online/tasks/raw.py:5
sepal-area  Has not been run  Source code & Upstream  File('/mnt/shared-folde...sepal_area.parquet')                                                Compute sepal area                     /opt/conda/lib/python3.8/site-packages/ml_online/tasks/features.py:4
petal-area  Has not been run  Source code & Upstream  File('/mnt/shared-folde...petal_area.parquet')                                                Compute petal area                     /opt/conda/lib/python3.8/site-packages/ml_online/tasks/features.py:13
features    Has not been run  Source code & Upstream  File('/mnt/shared-folde...s/features.parquet')                                                Join raw data with generated features  /opt/conda/lib/python3.8/site-packages/ml_online/tasks/features.py:22
fit         Has not been run  Source code & Upstream  MetaProduct({'model': File('/mnt/sh...model.pickle'), 'nb': File('/mnt/sh.../report.html')})  Script trains a model                  /opt/conda/lib/python3.8/site-packages/ml_online/notebooks/fit.py
======================================================== Testing File client ========================================================
Error: Missing File client
Hint: Run "docker run -it ml_online:0.1dev /bin/bash" to to debug your image. Ensure a File client is configured.
cat env.yaml
sample: False
product_root: /mnt/shared-folder
cat soopervisor.yaml
training:
  backend: argo-workflows
  repository: null
  mounted_volumes:
    - name: shared-folder
      spec:
        nfs:
          server: 10.xxx.xxx.19
          path: /home/data/nfs
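The two files above have to agree: `product_root` in env.yaml should point inside a volume declared under `mounted_volumes`. Below is a rough sketch of that consistency check. It assumes each volume is mounted at `/mnt/<name>`, which matches the values shown here but is an assumption, not something this thread confirms. The dictionaries mirror the file contents by hand, and `product_root_is_mounted` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Hand-copied from the env.yaml and soopervisor.yaml contents shown above
env = {"sample": False, "product_root": "/mnt/shared-folder"}
soopervisor = {
    "training": {
        "backend": "argo-workflows",
        "repository": None,
        "mounted_volumes": [
            {
                "name": "shared-folder",
                "spec": {"nfs": {"server": "10.xxx.xxx.19", "path": "/home/data/nfs"}},
            }
        ],
    }
}


def product_root_is_mounted(env, target_config, mount_base="/mnt"):
    """Return True if product_root is at or below an assumed /mnt/<name> mount point."""
    root = PurePosixPath(env["product_root"])
    for volume in target_config.get("mounted_volumes", []):
        # Assumption: volumes are mounted at <mount_base>/<volume name>
        mount_point = PurePosixPath(mount_base) / volume["name"]
        if root == mount_point or mount_point in root.parents:
            return True
    return False
```

With the values above, `product_root_is_mounted(env, soopervisor["training"])` evaluates to True. A `product_root` outside every declared volume would write products to the pod's own filesystem, where they are lost when the pod finishes.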
Ah, sorry about that! I introduced that recently and forgot to update the example. Try this:
soopervisor export training --skip-tests
Thank you for your patient guidance. This example now runs normally.
argo submit -n argo training/argo.yaml
Name: ml-online-t6ng9
Namespace: argo
ServiceAccount: default
Status: Pending
Created: Wed Jul 14 08:50:50 +0800 (now)
Progress:
This workflow does not have security context set. You can run your workflow pods more securely by setting it. Learn more at https://argoproj.github.io/argo-workflows/workflow-pod-security-context/
argo logs -n argo ml-online-t6ng9
100%|██████████| 5/5 [00:00<00:00, 11155.06it/s]
100%|██████████| 5/5 [00:00<00:00, 10597.03it/s]
100%|██████████| 5/5 [00:00<00:00, 11293.23it/s]
100%|██████████| 5/5 [00:00<00:00, 11125.47it/s]
100%|██████████| 5/5 [00:00<00:00, 10527.87it/s]
Executing: 100%|██████████| 11/11 [00:05<00:00, 2.18cell/s]
NFS: tree products
products
├── features
│   ├── features.parquet
│   ├── petal_area.parquet
│   └── sepal_area.parquet
├── model.pickle
├── raw
│   └── get.parquet
└── report.html
I have two more questions:
- In the soopervisor.yaml file, does the storage mount (mounted_volumes) support PV/PVC? If so, what should I do?
I have solved the second problem using the Google Cloud tutorial (https://soopervisor.readthedocs.io/en/latest/tutorials/kubernetes.html#optional-mounting-a-shared-disk).
Glad you were able to solve the PV/PVC thing.
Regarding deployment. The example you have been working on (ml-online), contains the code to export an online API via Flask, but we haven't written the example code to deploy that to Kubernetes. If you are willing to do so, I'd be happy to incorporate some code of yours into this repository. I think it'd be great to show how to take the sample Flask app and deploy it in Kubernetes.
There is another example (ml-intermediate) that shows batch deployment. Since the batch deployment pipeline is similar to the training pipeline, the steps for running it on Kubernetes are pretty much the same, although we haven't written a tutorial yet.
If you haven't done so already, please show your support with a star on our main repository. Thanks!
P.S. thanks for working on running the example, I updated the docs with some of your observations!
Thanks for your answer. I am currently evaluating MLOps frameworks and my workload is fairly heavy. I will try to write examples and deploy them to k8s in my spare time.
Awesome! Feel free to reach out if you have more questions. I'll close this issue.
Hi, I installed miniconda and then executed the following:
When I execute the
soopervisor add training --backend argo-workflows
command, an error occurs: