thoth-station / integration-tests

Integration tests for the Thoth project to make sure deployment works as expected
GNU General Public License v3.0

Adviser tests are failing because of allocated CPU time exceeded #266

Open mayaCostantini opened 2 years ago

mayaCostantini commented 2 years ago

Describe the bug Tests for the thamos_advise feature are producing the following error in stage:

ERROR    thoth.adviser.run:155: Resolver was killed as allocated CPU time was exceeded - https://thoth-station.ninja/j/cpu_time_exceeded

To Reproduce Steps to reproduce the behavior: See last integration tests report for stage environment.

Expected behavior Tests complete successfully.

mayaCostantini commented 2 years ago

/priority critical-urgent

mayaCostantini commented 2 years ago

/kind bug

fridex commented 2 years ago

To test the resolver in such cases, I try to create a lock file using Pipenv and submit an advise with the lock file as generated by Pipenv. In that case, the resolver reports why it removes packages that Pipenv resolved:

it might be a good idea to experiment with requirements (and possibly constraints as well) to narrow down to the issue one wants to debug. An example can be a failure when adviser was not able to find a resolution that would satisfy requirements. In such a case, it might be good to generate a lock file with expected pinned set of packages using other tools (e.g. Pipenv, pip-tools) and submit the lock file to the recommender system. The logs produced during the resolution and stack level justifications might give hints why the given resolution was rejected.

See docs.

fridex commented 2 years ago

/sig stack-guidance
/priority critical-urgent

fridex commented 2 years ago

Failing tests:

Failure:

2022-03-07 15:24:20,728  23 INFO     thoth.adviser.resolver:1175: Scoring user's stack - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:24:20,733  23 INFO     thoth.adviser.resolver:612: Scoring user's stack based on the lock file submitted - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:24:21,467  23 WARNING  thoth.adviser.sieves.solved:127: Removing package ('jupyter-tensorboard', '0.2.0', 'https://pypi.org/simple') due to installation time error in the software environment - see https://thoth-station.ninja/j/install_error
2022-03-07 15:24:21,468  23 INFO     thoth.adviser.resolver:624: User's stack was removed based on sieves - see https://thoth-station.ninja/j/rm_user_stack

Failure:

2022-03-07 15:20:19,411  22 INFO     thoth.adviser.resolver:612: Scoring user's stack based on the lock file submitted - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:20:20,288  22 WARNING  thoth.adviser.sieves.solved:127: Removing package ('jupyter-tensorboard', '0.2.0', 'https://pypi.org/simple') due to installation time error in the software environment - see https://thoth-station.ninja/j/install_error
2022-03-07 15:20:20,288  22 INFO     thoth.adviser.resolver:624: User's stack was removed based on sieves - see https://thoth-station.ninja/j/rm_user_stack

Failure:

2022-03-07 15:22:28,279  22 INFO     thoth.adviser.resolver:1175: Scoring user's stack - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:22:28,284  22 INFO     thoth.adviser.resolver:612: Scoring user's stack based on the lock file submitted - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:22:28,979  22 WARNING  thoth.adviser.sieves.solved:127: Removing package ('jupyter-tensorboard', '0.2.0', 'https://pypi.org/simple') due to installation time error in the software environment - see https://thoth-station.ninja/j/install_error
2022-03-07 15:22:28,979  22 INFO     thoth.adviser.resolver:624: User's stack was removed based on sieves - see https://thoth-station.ninja/j/rm_user_stack

Failure:

2022-03-07 15:24:20,728  23 INFO     thoth.adviser.resolver:1175: Scoring user's stack - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:24:20,733  23 INFO     thoth.adviser.resolver:612: Scoring user's stack based on the lock file submitted - see https://thoth-station.ninja/j/user_stack
2022-03-07 15:24:21,467  23 WARNING  thoth.adviser.sieves.solved:127: Removing package ('jupyter-tensorboard', '0.2.0', 'https://pypi.org/simple') due to installation time error in the software environment - see https://thoth-station.ninja/j/install_error
2022-03-07 15:24:21,468  23 INFO     thoth.adviser.resolver:624: User's stack was removed based on sieves - see https://thoth-station.ninja/j/rm_user_stack

Based on the lock file we use in repos, it looks like thoth-solver was not able to solve jupyter-tensorboard==0.2.0 in the given runtime environment.
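When triaging several reports like the ones above, it can help to pull the rejected packages out of the adviser log automatically. The following is a minimal sketch, assuming the log line format shown in the failures above; the function name `removed_packages` and the regular expression are illustrative and not part of thoth-adviser.

```python
import re
from typing import Dict, List

# Pattern inferred from the "Removing package (...)" warnings quoted in this
# issue; it is an assumption about the log format, not an adviser API.
REMOVED_RE = re.compile(
    r"Removing package \('(?P<name>[^']+)', '(?P<version>[^']+)', "
    r"'(?P<index>[^']+)'\) due to (?P<reason>.+?) - see (?P<link>\S+)"
)


def removed_packages(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per distinct package removal found in the log text."""
    seen = set()
    result = []
    for match in REMOVED_RE.finditer(log_text):
        entry = match.groupdict()
        key = (entry["name"], entry["version"], entry["index"])
        if key not in seen:  # skip removals that were already recorded
            seen.add(key)
            result.append(entry)
    return result
```

Feeding the captured log of each failing test through `removed_packages` gives a quick view of whether all failures trace back to the same package, as they appear to here.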

However, for some scenarios the adviser was able to resolve application dependencies when triggered manually. I've created a new integration-tests job to confirm whether these tests are still failing. Nevertheless, it would be great to check why thoth-solver did not solve jupyter-tensorboard in the given runtime environment.

fridex commented 2 years ago

thoth-solver fails to install jupyter-tensorboard with the following error:

Command exited with non-zero status code (1):     ERROR: Command errored out with exit status 1:
     command: /opt/app-root/src/solver-venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vgv1i21f/install-record.txt --single-version-externally-managed --compile --install-headers /opt/app-root/src/solver-venv/include/site/python3.8/jupyter-tensorboard
         cwd: /tmp/pip-install-0i94_y48/jupyter-tensorboard/
    Complete output (71 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib
    creating build/lib/jupyter_tensorboard
    copying jupyter_tensorboard/application.py -> build/lib/jupyter_tensorboard
    copying jupyter_tensorboard/tensorboard_manager.py -> build/lib/jupyter_tensorboard
    copying jupyter_tensorboard/api_handlers.py -> build/lib/jupyter_tensorboard
    copying jupyter_tensorboard/__init__.py -> build/lib/jupyter_tensorboard
    copying jupyter_tensorboard/handlers.py -> build/lib/jupyter_tensorboard
    creating build/lib/jupyter_tensorboard/static
    copying jupyter_tensorboard/static/tensorboardlist.js -> build/lib/jupyter_tensorboard/static
    copying jupyter_tensorboard/static/style.css -> build/lib/jupyter_tensorboard/static
    copying jupyter_tensorboard/static/tree.js -> build/lib/jupyter_tensorboard/static
    running build_scripts
    creating build/scripts-3.8
    copying scripts/jupyter-tensorboard -> build/scripts-3.8
    changing mode of build/scripts-3.8/jupyter-tensorboard from 644 to 755
    running install_lib
    creating /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard
    copying build/lib/jupyter_tensorboard/application.py -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard
    copying build/lib/jupyter_tensorboard/tensorboard_manager.py -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard
    copying build/lib/jupyter_tensorboard/api_handlers.py -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard
    copying build/lib/jupyter_tensorboard/__init__.py -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard
    copying build/lib/jupyter_tensorboard/handlers.py -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard
    creating /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/static
    copying build/lib/jupyter_tensorboard/static/tensorboardlist.js -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/static
    copying build/lib/jupyter_tensorboard/static/style.css -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/static
    copying build/lib/jupyter_tensorboard/static/tree.js -> /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/static
    byte-compiling /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/application.py to application.cpython-38.pyc
    byte-compiling /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/tensorboard_manager.py to tensorboard_manager.cpython-38.pyc
    byte-compiling /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/api_handlers.py to api_handlers.cpython-38.pyc
    byte-compiling /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/__init__.py to __init__.cpython-38.pyc
    byte-compiling /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard/handlers.py to handlers.cpython-38.pyc
    running install_egg_info
    running egg_info
    writing jupyter_tensorboard.egg-info/PKG-INFO
    writing dependency_links to jupyter_tensorboard.egg-info/dependency_links.txt
    writing entry points to jupyter_tensorboard.egg-info/entry_points.txt
    writing requirements to jupyter_tensorboard.egg-info/requires.txt
    writing top-level names to jupyter_tensorboard.egg-info/top_level.txt
    reading manifest file 'jupyter_tensorboard.egg-info/SOURCES.txt'
    writing manifest file 'jupyter_tensorboard.egg-info/SOURCES.txt'
    Copying jupyter_tensorboard.egg-info to /opt/app-root/src/solver-venv/lib/python3.8/site-packages/jupyter_tensorboard-0.2.0-py3.8.egg-info
    running install_scripts
    copying build/scripts-3.8/jupyter-tensorboard -> /opt/app-root/src/solver-venv/bin
    changing mode of /opt/app-root/src/solver-venv/bin/jupyter-tensorboard to 755
    Installing jupyter-tensorboard script to /opt/app-root/src/solver-venv/bin
    writing list of installed files to '/tmp/pip-record-vgv1i21f/install-record.txt'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py", line 52, in <module>
        setup(
      File "/opt/app-root/src/solver-venv/lib64/python3.8/site-packages/setuptools/__init__.py", line 145, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib64/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib64/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/lib64/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py", line 47, in run
        enable_extension_after_install()
      File "/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py", line 30, in enable_extension_after_install
        from jupyter_tensorboard.application import (
      File "/tmp/pip-install-0i94_y48/jupyter-tensorboard/jupyter_tensorboard/__init__.py", line 3, in <module>
        from .handlers import load_jupyter_server_extension   # noqa
      File "/tmp/pip-install-0i94_y48/jupyter-tensorboard/jupyter_tensorboard/handlers.py", line 3, in <module>
        from tornado import web
    ModuleNotFoundError: No module named 'tornado'
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/app-root/src/solver-venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0i94_y48/jupyter-tensorboard/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vgv1i21f/install-record.txt --single-version-externally-managed --compile --install-headers /opt/app-root/src/solver-venv/include/site/python3.8/jupyter-tensorboard Check the logs for full command output.

The issue here is that jupyter-tensorboard executes code after installation that expects tornado to be present in the environment. Because we install jupyter-tensorboard without its dependencies, the post-install procedure that registers the extension fails.
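One way an upstream package could avoid this class of failure is to guard the post-install step so that a missing import does not abort the installation. This is only a sketch under that assumption; `run_post_install` and its parameters are hypothetical names, and it does not reflect how jupyter-tensorboard's `setup.py` is actually written.

```python
import importlib
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def run_post_install(required_module: str, hook: Callable[[], None]) -> bool:
    """Run `hook` only if `required_module` is importable.

    Returns True if the hook ran, False if it was skipped because the
    module (for example a dependency that is not installed yet) is missing.
    """
    try:
        importlib.import_module(required_module)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        logger.warning(
            "Skipping post-install step: %r is not installed", required_module
        )
        return False
    hook()
    return True
```

With such a guard, an installation performed without dependencies would skip the extension registration instead of failing, and the registration could be run later once the dependencies are in place.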

fridex commented 2 years ago

In the recent report, the adviser was able to find a resolution to this issue - that is using an older version of jupyter-tensorboard that does not perform any post-install procedure.

Closing this as integration tests are green. Nevertheless, we should report this upstream and see what their opinion is on this one.

/close

sesheta commented 2 years ago

@fridex: Closing this issue.

In response to [this](https://github.com/thoth-station/integration-tests/issues/266#issuecomment-1062063803):

> In the recent report, the adviser was able to find a resolution to this issue - that is using an older version of jupyter-tensorboard that does not perform any post-install procedure.
>
> Closing this as integration tests are green. Nevertheless, we should report this upstream and see what their opinion is on this one.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

fridex commented 2 years ago

/reopen

sesheta commented 2 years ago

@fridex: Reopened this issue.

In response to [this](https://github.com/thoth-station/integration-tests/issues/266#issuecomment-1116210197):

> /reopen

codificat commented 2 years ago

Today's aws-prod tests show green. Worth ensuring stage tests are also green

codificat commented 2 years ago

/assign @fridex
/lifecycle active

fridex commented 2 years ago

Scheduled integration-tests for stage, we should receive an email report after the integration tests finish.

codificat commented 2 years ago

Right now, integration tests in stage are not running (https://github.com/thoth-station/thoth-application/issues/2599) /remove-lifecycle active until this is addressed

codificat commented 2 years ago

In yesterday's run of the integration tests in aws-prod, one of the adviser tests failed (ps-cv-pytorch):

... Then I ask for an advise for the cloned application for runtime environment ps-cv-pytorch , without user stack supplied and without static analysis (965.794s) 
...
2022-06-28 03:17:46,572 thoth.adviser.run           ERROR: Resolver was killed as allocated CPU time was exceeded - https://thoth-station.ninja/j/cpu_time_exceeded

Captured logging:
INFO:thamos.lib:Using 'latest' recommendation type - see https://thoth-station.ninja/recommendation-types/
WARNING:thamos.lib:The user stack found in the lock file will not be supplied as requested
INFO:thamos.lib:Successfully submitted advise analysis 'adviser-220628030145-f174942db191749e' to 'https://api.prod.thoth-station.ninja/api/v1'

codificat commented 2 years ago

Another anecdotal update: yesterday's aws-prod integration test runs have 2 tests failing due to allocated CPU time exceeded: ps-cv-pytorch and ps-cv-tensorflow

sesheta commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

harshad16 commented 1 year ago

/remove-lifecycle stale
/lifecycle frozen