ploomber / projects

Sample projects using Ploomber.
Apache License 2.0

ml-online example error #7

Closed liguodongiot closed 3 years ago

liguodongiot commented 3 years ago

Hi, I installed Miniconda and then ran the following:

cd projects/ml-online/

conda env create --file environment.yml --force

conda env export --no-build --file environment.lock.yml

conda activate ml-online

cp soopervisor.yaml soopervisor.yaml.bak

soopervisor add training --backend argo-workflows

When I execute the soopervisor add training --backend argo-workflows command, an error occurs:

Traceback (most recent call last):
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py", line 475, in _auto_load
    spec = cls(path, env=env, lazy_import=lazy_import, reload=reload)
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py", line 271, in __init__
    Meta.initialize_inplace(self.data)
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py", line 618, in initialize_inplace
    data['meta'] = Meta.default_meta(data['meta'])
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py", line 651, in default_meta
    meta['source_loader'] = SourceLoader(meta['source_loader'])
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/placeholders/SourceLoader.py", line 65, in __init__
    raise ValueError('Could not locate module "{}"'.format(module))
ValueError: Error initializing SourceLoader with {'module': 'ml_online'}.
Error message: Could not locate module "ml_online"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/guodong/miniconda3/envs/ml-online/bin/soopervisor", line 8, in <module>
    sys.exit(cli())
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/soopervisor/cli.py", line 52, in add
    Exporter('soopervisor.yaml', env_name=name).add()
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/soopervisor/abc.py", line 120, in __init__
    self._dag = DAGSpec.find(lazy_import=True).todag().render(
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py", line 523, in find
    spec, = DAGSpec._auto_load(to_dag=False,
  File "/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py", line 485, in _auto_load
    raise exc from e
ploomber.exceptions.DAGSpecInitializationError: Error initializing DAG from /home/guodong/projects/ml-online/src/ml_online/pipeline.yaml

edublancas commented 3 years ago

Looks like you haven't installed the local package:

# run this in the ml-online folder
pip install --editable .

Let me know if that works.
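
The "Could not locate module" error in the traceback above suggests that the ml_online package is not importable from the active environment. A quick way to confirm this before and after the install is sketched below. It uses only the standard library; the helper name is_importable is made up for illustration and is not part of Ploomber or Soopervisor.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current sys.path.

    Hypothetical helper for illustration; not part of Ploomber.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when the parent package of a dotted name is
        # missing, or when a loaded module has no usable spec
        return False


if __name__ == "__main__":
    # "ml_online" is the module named in the error message
    name = "ml_online"
    if is_importable(name):
        print(f"{name}: found")
    else:
        print(f"{name}: not found; try 'pip install --editable .' in the project folder")
```

Run it with the Python interpreter from the ml-online environment, since the result depends on which environment is active.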

liguodongiot commented 3 years ago

Thanks. After running pip install --editable ., the following command runs normally:

soopervisor add training --backend argo-workflows
================================== Adding /home/guodong/projects/ml-online/training/Dockerfile... ===================================
=============================================================== Done ================================================================

But when I run the soopervisor export training command, an error occurs:

soopervisor export training
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
  warnings.warn('The following placeholders are declared in the '
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
  warnings.warn('The following placeholders are declared in the '
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 12671.61it/s]
============================================== Packaging code: python -m build --sdist ==============================================
/home/guodong/miniconda3/envs/ml-online/bin/python: No module named build
Error: An error occurred when executing command: python -m build --sdist
Original error message: Command '('python', '-m', 'build', '--sdist')' returned non-zero exit status 1.

I found that running python setup.py build sdist on its own succeeds, but running python -m build --sdist on its own fails.

What should I do?

edublancas commented 3 years ago

Ah, good catch! Looks like I missed adding that dependency!

Try this:

pip install build

That should make python -m build --sdist work.

liguodongiot commented 3 years ago

Thank you very much.

When I ran the soopervisor export training command again, an error occurred while building the image. What should I do?

/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
  warnings.warn('The following placeholders are declared in the '
/home/guodong/miniconda3/envs/ml-online/lib/python3.8/site-packages/ploomber/spec/dagspec.py:312: UserWarning: The following placeholders are declared in the environment but unused in the spec: {'product_root'}
  warnings.warn('The following placeholders are declared in the '
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11935.98it/s]
============================================== Packaging code: python -m build --sdist ==============================================
Found existing installation: setuptools 56.0.0
Uninstalling setuptools-56.0.0:
  Successfully uninstalled setuptools-56.0.0
Collecting wheel
  Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
Collecting setuptools>=40.8.0
  Using cached setuptools-57.1.0-py3-none-any.whl (818 kB)
Installing collected packages: wheel, setuptools
Successfully installed setuptools-57.1.0 wheel-0.36.2
WARNING: You are using pip version 21.1.1; however, version 21.1.3 is available.
You should consider upgrading via the '/tmp/build-env-87r_7tpv/bin/python -m pip install --upgrade pip' command.
/tmp/build-env-87r_7tpv/lib/python3.8/site-packages/setuptools/dist.py:484: UserWarning: Normalizing '0.1dev' to '0.1.dev0'
  warnings.warn(tmpl.format(**locals()))
running egg_info
writing src/ml_online.egg-info/PKG-INFO
writing dependency_links to src/ml_online.egg-info/dependency_links.txt
writing requirements to src/ml_online.egg-info/requires.txt
writing top-level names to src/ml_online.egg-info/top_level.txt
reading manifest file 'src/ml_online.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/ml_online.egg-info/SOURCES.txt'
/tmp/build-env-87r_7tpv/lib/python3.8/site-packages/setuptools/dist.py:484: UserWarning: Normalizing '0.1dev' to '0.1.dev0'
  warnings.warn(tmpl.format(**locals()))
running sdist
running egg_info
writing src/ml_online.egg-info/PKG-INFO
writing dependency_links to src/ml_online.egg-info/dependency_links.txt
writing requirements to src/ml_online.egg-info/requires.txt
writing top-level names to src/ml_online.egg-info/top_level.txt
reading manifest file 'src/ml_online.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'src/ml_online.egg-info/SOURCES.txt'
running check
warning: check: missing required meta-data: url

warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied

creating ml_online-0.1.dev0
creating ml_online-0.1.dev0/src
creating ml_online-0.1.dev0/src/ml_online
creating ml_online-0.1.dev0/src/ml_online.egg-info
creating ml_online-0.1.dev0/src/ml_online/notebooks
creating ml_online-0.1.dev0/src/ml_online/tasks
copying files to ml_online-0.1.dev0...
copying MANIFEST.in -> ml_online-0.1.dev0
copying README.md -> ml_online-0.1.dev0
copying setup.py -> ml_online-0.1.dev0
copying src/ml_online/__init__.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/env.yaml -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/infer.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/io.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/model.pickle -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/pipeline-features.yaml -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/pipeline.yaml -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online/service.py -> ml_online-0.1.dev0/src/ml_online
copying src/ml_online.egg-info/PKG-INFO -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/SOURCES.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/dependency_links.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/requires.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online.egg-info/top_level.txt -> ml_online-0.1.dev0/src/ml_online.egg-info
copying src/ml_online/notebooks/__init__.py -> ml_online-0.1.dev0/src/ml_online/notebooks
copying src/ml_online/notebooks/fit.py -> ml_online-0.1.dev0/src/ml_online/notebooks
copying src/ml_online/tasks/__init__.py -> ml_online-0.1.dev0/src/ml_online/tasks
copying src/ml_online/tasks/features.py -> ml_online-0.1.dev0/src/ml_online/tasks
copying src/ml_online/tasks/raw.py -> ml_online-0.1.dev0/src/ml_online/tasks
Writing ml_online-0.1.dev0/setup.cfg
Creating tar archive
removing 'ml_online-0.1.dev0' (and everything under it)
======================================= Building image: docker build . --tag ml_online:0.1dev =======================================
Sending build context to Docker daemon  15.87kB
Step 1/7 : FROM condaforge/mambaforge:4.10.1-0
 ---> 05e3542d3437
Step 2/7 : COPY environment.lock.yml project/environment.lock.yml
 ---> Using cache
 ---> 7b7a0089698b
Step 3/7 : RUN mamba env update --name base --file project/environment.lock.yml && conda clean --all --force-pkgs-dir --yes
 ---> Running in 4d1e9dc0a309
anaconda/pkgs/main/linux  

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/opt/conda/lib/python3.9/site-packages/conda_env/cli/main.py", line 80, in do_call
        exit_code = getattr(module, func_name)(args, parser)
      File "/opt/conda/lib/python3.9/site-packages/conda_env/cli/main_update.py", line 123, in execute
        result[installer_type] = installer.install(prefix, specs, args, env)
      File "/opt/conda/lib/python3.9/site-packages/mamba/mamba_env.py", line 45, in mamba_install
        index = load_channels(
      File "/opt/conda/lib/python3.9/site-packages/mamba/utils.py", line 93, in load_channels
        index = get_index(
      File "/opt/conda/lib/python3.9/site-packages/mamba/utils.py", line 74, in get_index
        is_downloaded = dlist.download(True)
    RuntimeError: Download error (28) Timeout was reached [https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/noarch/repodata.json]
    Failed to connect to mirrors.tuna.tsinghua.edu.cn port 443: Connection timed out

`$ /opt/conda/bin/mamba update --name base --file project/environment.lock.yml`

  environment variables:
                 CIO_TEST=<not set>
  CONDA_AUTO_UPDATE_CONDA=false
                CONDA_DIR=/opt/conda
               CONDA_ROOT=/opt/conda
           CURL_CA_BUNDLE=<not set>
                     PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin
                          :/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>

     active environment : None
       user config file : /root/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 4.10.1
    conda-build version : not installed
         python version : 3.9.2.final.0
       virtual packages : __linux=3.10.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /opt/conda  (writable)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : https://repo.anaconda.com/pkgs/main
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /opt/conda/pkgs
                          /root/.conda/pkgs
       envs directories : /opt/conda/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.1 requests/2.25.1 CPython/3.9.2 Linux/3.10.0-957.27.2.el7.x86_64 ubuntu/20.04.1 glibc/2.31
                UID:GID : 0:0
             netrc file : None
           offline mode : False

An unexpected error has occurred. Conda has prepared the above report.

The command '/bin/sh -c mamba env update --name base --file project/environment.lock.yml && conda clean --all --force-pkgs-dir --yes' returned a non-zero code: 1
Error: An error occurred when executing command: docker build . --tag ml_online:0.1dev
Original error message: Command '('docker', 'build', '.', '--tag', 'ml_online:0.1dev')' returned non-zero exit status 1.

edublancas commented 3 years ago

There's some issue with your network:

    RuntimeError: Download error (28) Timeout was reached [https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/noarch/repodata.json]
    Failed to connect to mirrors.tuna.tsinghua.edu.cn port 443: Connection timed out

There's little I can do to help. Just make sure conda works on that machine: try creating a virtual environment and see whether you get the same error. If you do, try a different network.
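
To narrow down a timeout like the one above, it can help to check whether the host in the error message is reachable at all from the machine (or container) where the build runs. The following is a minimal sketch using only the standard library; the function name can_connect is hypothetical, and it only tests plain TCP reachability, so it says nothing about TLS, proxies, or conda's own configuration.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the host/port implied by `url` succeeds.

    Hypothetical helper for illustration; it does not send an HTTP request.
    """
    parts = urlsplit(url)
    # Fall back to the default port for the scheme when none is given
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures
        return False


# Example usage with the URL from the error message (requires network access):
# can_connect("https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/noarch/repodata.json")
```

If this returns False on the host as well as inside the container, the problem is likely the network rather than the image.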

liguodongiot commented 3 years ago

Thanks, I switched to a different server host and that solved it. Now an error occurs during the "Testing File client" step. How should I troubleshoot the problem?

 ==================================== Testing image: docker run ml_online:0.1dev ploomber status =====================================
100%|██████████| 5/5 [00:00<00:00, 10280.16it/s]
name        Last run      Outdated?    Product       Doc (short)    Location
----------  ------------  -----------  ------------  -------------  ------------
get         Has not been  Source code  File('/mnt/s  Get training   /opt/conda/l
            run                        hared-folde.  data           ib/python3.8
                                       ..ts/raw/get                 /site-packag
                                       .parquet')                   es/ml_online
                                                                    /tasks/raw.p
                                                                    y:5
sepal-area  Has not been  Source code  File('/mnt/s  Compute        /opt/conda/l
            run           & Upstream   hared-folde.  sepal area     ib/python3.8
                                       ..sepal_area                 /site-packag
                                       .parquet')                   es/ml_online
                                                                    /tasks/featu
                                                                    res.py:4
petal-area  Has not been  Source code  File('/mnt/s  Compute        /opt/conda/l
            run           & Upstream   hared-folde.  petal area     ib/python3.8
                                       ..petal_area                 /site-packag
                                       .parquet')                   es/ml_online
                                                                    /tasks/featu
                                                                    res.py:13
features    Has not been  Source code  File('/mnt/s  Join raw       /opt/conda/l
            run           & Upstream   hared-folde.  data with      ib/python3.8
                                       ..s/features  generated      /site-packag
                                       .parquet')    features       es/ml_online
                                                                    /tasks/featu
                                                                    res.py:22
fit         Has not been  Source code  MetaProduct(  Script         /opt/conda/l
            run           & Upstream   {'model': Fi  trains a       ib/python3.8
                                       le('/mnt/sh.  model          /site-packag
                                       ..model.pick                 es/ml_online
                                       le'), 'nb':                  /notebooks/f
                                       File('/mnt/s                 it.py
                                       h.../report.
                                       html')})
======================================================== Testing File client ========================================================
Error: Missing File client
Hint: Run "docker run -it ml_online:0.1dev /bin/bash" to to debug your image. Ensure a File client is configured.

cat env.yaml

sample: False
product_root: /mnt/shared-folder

cat soopervisor.yaml

training:
  backend: argo-workflows
  repository: null
  mounted_volumes:
    - name: shared-folder
      spec:
        nfs:
          server: 10.xxx.xxx.19
          path: /home/data/nfs

edublancas commented 3 years ago

Ah, sorry about that! I introduced that check recently and forgot to update the example. Try this:

soopervisor export training --skip-tests

liguodongiot commented 3 years ago

Thank you for your patient guidance. The example now runs normally.


argo submit -n argo training/argo.yaml

Name:                ml-online-t6ng9
Namespace:           argo
ServiceAccount:      default
Status:              Pending
Created:             Wed Jul 14 08:50:50 +0800 (now)
Progress:

This workflow does not have security context set. You can run your workflow pods more securely by setting it. Learn more at https://argoproj.github.io/argo-workflows/workflow-pod-security-context/

argo logs -n argo ml-online-t6ng9

100%|██████████| 5/5 [00:00<00:00, 11155.06it/s]
100%|██████████| 5/5 [00:00<00:00, 10597.03it/s]
100%|██████████| 5/5 [00:00<00:00, 11293.23it/s]
100%|██████████| 5/5 [00:00<00:00, 11125.47it/s]
100%|██████████| 5/5 [00:00<00:00, 10527.87it/s]
Executing: 100%|██████████| 11/11 [00:05<00:00, 2.18cell/s]

NFS: tree products

products
├── features
│   ├── features.parquet
│   ├── petal_area.parquet
│   └── sepal_area.parquet
├── model.pickle
├── raw
│   └── get.parquet
└── report.html
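
For repeated runs, the same check can be automated instead of reading the tree output by eye. The sketch below assumes the file layout shown in the listing above; the names EXPECTED_PRODUCTS and missing_products are hypothetical, and the root path must point at wherever the products folder is mounted.

```python
from pathlib import Path

# Relative paths taken from the `tree products` listing above
EXPECTED_PRODUCTS = [
    "features/features.parquet",
    "features/petal_area.parquet",
    "features/sepal_area.parquet",
    "model.pickle",
    "raw/get.parquet",
    "report.html",
]


def missing_products(root, expected=EXPECTED_PRODUCTS):
    """Return the expected product paths that do not exist as files under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "products" is assumed to be reachable from the current directory
    missing = missing_products("products")
    print("all products present" if not missing else f"missing: {missing}")
```

An empty list means every product from the listing is present.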

I have two more questions:

  1. This example only covers the training pipeline, and I could not find a prediction pipeline in the official documentation that deploys to Kubernetes. Is deploying inference services to Kubernetes currently supported?
  2. In the soopervisor.yaml file, does the storage mount mounted_volumes support PV/PVC? If so, how should I configure it?

liguodongiot commented 3 years ago

  2. In the soopervisor.yaml file, does the storage mount mounted_volumes support PV/PVC? If so, what should I do?

I have solved the second problem by following the Google Cloud tutorial in the Soopervisor documentation (https://soopervisor.readthedocs.io/en/latest/tutorials/kubernetes.html#optional-mounting-a-shared-disk ).

edublancas commented 3 years ago

Glad you were able to solve the PV/PVC thing.

Regarding deployment: the example you have been working on (ml-online) contains the code to export an online API via Flask, but we haven't written the example code to deploy that to Kubernetes. If you are willing to do so, I'd be happy to incorporate your code into this repository. I think it'd be great to show how to take the sample Flask app and deploy it to Kubernetes.

There is another example (ml-intermediate) that shows batch deployment. Since the batch deployment pipeline is similar to the training pipeline, the steps for running it on Kubernetes are pretty much the same, although we haven't written a tutorial yet.

If you haven't done so already, please show your support with a star on our main repository. Thanks!

P.S. thanks for working on running the example, I updated the docs with some of your observations!

liguodongiot commented 3 years ago

Thanks for your answer. I am currently evaluating MLOps frameworks and my workload is fairly heavy, but I will try to write examples and deploy them to k8s in my spare time.

edublancas commented 3 years ago

Awesome! Feel free to reach out if you have more questions. I'll close this issue.