thoth-station / graph-refresh-job

A job for scheduling solver to resolve dependency graphs of new packages
GNU General Public License v3.0

Packages from Pulp python package indexes are not analyzed #669

Closed fridex closed 2 years ago

fridex commented 2 years ago

Describe the bug

With https://github.com/thoth-station/pulp-pypi-sync-job/issues/33 fixed, we have correct Pulp Python package indexes in the stage environment. These indexes are registered, but graph-refresh does not take them into account - packages from these indexes are not analyzed.

To Reproduce

Steps to reproduce the behavior:

  1. Go to the stage deployment

  2. Check the registered Pulp Python package indexes (e.g. using the stage User API)

  3. Trigger the graph-refresh job, which should trigger solving of packages available on Pulp indexes (e.g. the donkeycar package)

    • oc create job --from=cronjob/graph-refresh graph-refresh-manual

  4. See that no solvers are scheduled for Pulp Python package indexes

Expected behavior

Solvers for Pulp Python package indexes should be started.

Additional context

I've verified thoth-python correctly parses content on the Pulp package index:

from thoth.python import Source

s = Source("https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple")

for i in s.get_packages():
    print(i)
    print(s.get_package_versions(i))

The issue might be in the graph-refresh logic.

fridex commented 2 years ago

/kind bug
/priority critical-urgent

pacospace commented 2 years ago

Hey @fridex, I looked into this, and it seems the reason we are not analyzing packages from that index is that graph-refresh-job only considers packages that were already solved for a certain runtime environment. gym-donkeycar was solved only from PyPI, with two solvers, but was never solved from the Pulp index. The logic is here: https://github.com/thoth-station/graph-refresh-job/blob/f30ea7e683ea8f1a98aa4ecd9fae57463e9c093f/producer.py#L128

We basically cannot find that (package, index) pair because the query filters by solvers in PPV.

Mmm, actually, rethinking this and looking at the query logic: it should filter out the packages in PPVE that are already in PPV, so the remaining ones should be there, including that package from that index. So this is not the reason why it is not scheduled. I need to look into it more.

I checked PPVE and the package is there, and so is the index. So once we get one package from that index solved, graph-refresh will start considering that package from that index. I scheduled it from management-api in stage https://console-openshift-console.apps.ocp4.prod.psi.redhat.com/k8s/ns/thoth-middletier-stage/pods/solver-rhel-8-py38-220207111322-f191479cc5f4f33f-3139184444/logs?container=main and discovered that there is actually an error from the solver:

2022-02-07 11:13:41,929  19 INFO     thoth.common:319: Setting up logging to a Sentry instance 'sentry.io/1298083', environment 'ocp4-stage' and integrations ['AioHttpIntegration']
2022-02-07 11:13:42,224  19 INFO     thoth.common:366: Logging to rsyslog endpoint is turned off
2022-02-07 11:13:42,227  19 INFO     thoth.solver:66: Thoth Dependency Solver v1.10.3
2022-02-07 11:13:46,729  19 INFO     thoth.solver.python.python:251: Resolving package 'gym-donkeycar' with version specifier '==1.1.1' from 'https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple'
2022-02-07 11:13:47,072  19 WARNING  thoth.python.source:223: It looks like package name does not match the one parsed from artifact when parsing version from wheel - package name is gym-donkeycar, pared version is 1.1.1, artifact is gym_donkeycar-1.1.1-py2.py3-none-any.whl
2022-02-07 11:13:47,073  19 INFO     thoth.solver.python.python:272: Adding package 'gym-donkeycar' in version '1.1.1' for solving
2022-02-07 11:13:47,073  19 INFO     thoth.solver.python.python:279: Using index 'https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple' to discover package 'gym-donkeycar' in version '1.1.1'
2022-02-07 11:14:12,835  19 INFO     thoth.solver.python.python:338: Resolving dependency versions for 'gym' with range None from 'https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple'
2022-02-07 11:14:13,077  19 INFO     thoth.solver.python.base:81: No releases found for package 'gym'
2022-02-07 11:14:14,427  19 CRITICAL root:105: Traceback (most recent call last):
  File "thoth/solver/cli.py", line 164, in <module>
    cli()
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/app-root/lib64/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "thoth/solver/cli.py", line 142, in python
    result = resolve_python(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/solver/python/python.py", line 410, in resolve
    solver_result = _do_resolve_index(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/solver/python/python.py", line 344, in _do_resolve_index
    resolved_versions = _resolve_versions(
  File "/opt/app-root/lib64/python3.8/site-packages/thoth/solver/python/python.py", line 202, in _resolve_versions
    assert len(resolved_versions.keys()) == 1, "Resolution of one package version ended with multiple packages."
AssertionError: Resolution of one package version ended with multiple packages.

The solver assumes that if a package came from a certain index, its transitive dependencies also come from that index, right? That is why gym is not found.
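One plausible reading of the traceback can be sketched as follows. This is not the real thoth-solver code: `resolve_versions_sketch` and the dictionary-based index are hypothetical stand-ins, and the sketch assumes the resolution result for `gym` was empty (the log only says "No releases found"). Under that assumption the `== 1` check fails with zero packages, even though the assertion message talks about multiple packages.

```python
def resolve_versions_sketch(available_releases: dict, package_name: str) -> list:
    """Mimic the shape of the failing check (hypothetical, not thoth-solver code)."""
    # Hypothetical lookup: keep only the requested package, if the index serves it.
    resolved_versions = {
        name: versions
        for name, versions in available_releases.items()
        if name == package_name
    }
    # Same assertion as the one shown in the traceback.
    assert len(resolved_versions.keys()) == 1, (
        "Resolution of one package version ended with multiple packages."
    )
    return resolved_versions[package_name]


# Illustrative data: this Pulp index serves gym-donkeycar only.
pulp_index = {"gym-donkeycar": ["1.1.1"]}

print(resolve_versions_sketch(pulp_index, "gym-donkeycar"))

try:
    # 'gym' is a dependency of gym-donkeycar but is not on this index.
    resolve_versions_sketch(pulp_index, "gym")
except AssertionError as exc:
    print(f"AssertionError: {exc}")
```

In this sketch the direct dependency resolves fine, and only the lookup of the transitive dependency on the same index trips the assertion.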

I will test the query locally with a pg dump from stage.

OK, confirmed: the graph-refresh-job query is correct and returns the correct package from the correct index, as present in the database.
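The query logic described in this comment (take what is in PPVE and drop what is already in PPV) behaves roughly like a set difference. The sketch below is an assumption-laden illustration, not the actual thoth-storages query: `Entry` and `unsolved_entries` are hypothetical names, and the data is made up to match the situation in this thread.

```python
from typing import NamedTuple, Set


class Entry(NamedTuple):
    """Hypothetical (package, version, index) triple; not the real table schema."""

    package_name: str
    package_version: str
    index_url: str


def unsolved_entries(ppve: Set[Entry], ppv: Set[Entry]) -> Set[Entry]:
    """Return the entries known in PPVE that have no solved counterpart in PPV."""
    return ppve - ppv


PYPI = "https://pypi.org/simple"
PULP = "https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple"

# Illustrative data: the package is known from both indexes...
ppve = {
    Entry("gym-donkeycar", "1.1.1", PYPI),
    Entry("gym-donkeycar", "1.1.1", PULP),
}
# ...but it has only been solved from PyPI so far.
ppv = {Entry("gym-donkeycar", "1.1.1", PYPI)}

for entry in unsolved_entries(ppve, ppv):
    print(entry.package_name, entry.package_version, entry.index_url)
```

With this data the only remaining entry is gym-donkeycar from the Pulp index, which is consistent with the observation that the query itself returns the right package and the failure must happen later.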

pacospace commented 2 years ago

So I assume the package from graph-refresh-job is scheduled, but the solver fails due to the error above and never produces a solver result, therefore the data is not synced.

@fridex The problem is that the transitive dependencies of that package are not present in the Pulp index being considered.

I also tested the solver locally:

2022-02-07 12:49:05,718 46560 INFO     thoth.solver.python.python:340: Resolving dependency versions for 'gym' with range None from 'https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple'

Is it normal to assume that all packages should be available on the same index as the direct dependency? Is there a way for the solver to check PyPI if some transitive dependencies are not present in that index? (I assume this is not safe.)

Or should we upload the transitive dependencies also to Pulp?

fridex commented 2 years ago

Wow, thanks for digging into this. 👍🏻 Now I see what is wrong - we have a logical bug here. Indeed, solver will need to be adjusted to make sure packages are correctly resolved for all the versions, considering transitive dependencies but keeping only one index as "the source" of the direct dependency that is solved. I'll try to look into this. Thanks a lot for debugging this 💯
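The adjustment described in this comment could look roughly like the sketch below. It is not the code from the referenced PRs: `find_releases`, the `direct` flag, and the in-memory `indexes` mapping are hypothetical, and the version listed for `gym` is illustrative data only. The idea shown is that the direct dependency is pinned to one source index, while transitive dependencies are looked up on all registered indexes.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for registered indexes:
# index URL -> {package name -> versions served}.
Indexes = Dict[str, Dict[str, List[str]]]


def find_releases(
    package_name: str,
    source_index_url: str,
    indexes: Indexes,
    *,
    direct: bool,
) -> Dict[str, List[str]]:
    """Return {index_url: versions} for the indexes that serve the package."""
    # A direct dependency is resolved only from its source index;
    # a transitive dependency may come from any registered index.
    candidates = [source_index_url] if direct else list(indexes)
    found = {}
    for url in candidates:
        versions = indexes[url].get(package_name, [])
        if versions:
            found[url] = versions
    return found


PYPI = "https://pypi.org/simple"
PULP = "https://pulp.operate-first.cloud/pypi/gym-donkeycar/simple"

indexes = {
    PULP: {"gym-donkeycar": ["1.1.1"]},
    PYPI: {"gym-donkeycar": ["1.1.1"], "gym": ["0.21.0"]},
}

print(find_releases("gym-donkeycar", PULP, indexes, direct=True))
print(find_releases("gym", PULP, indexes, direct=False))
```

Whether looking up transitive dependencies on other indexes is safe was raised as an open question earlier in the thread, so a real implementation would presumably restrict the lookup to indexes that are explicitly registered.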

pacospace commented 2 years ago

> Wow, thanks for digging into this. 👍🏻 Now I see what is wrong - we have a logical bug here. Indeed, solver will need to be adjusted to make sure packages are correctly resolved for all the versions, considering transitive dependencies but keeping only one index as "the source" of the direct dependency that is solved. I'll try to look into this. Thanks a lot for debugging this 💯

Sure! I can open an issue in solver for this :)

fridex commented 2 years ago

This has been fixed by referenced PRs. We can consider this as done.

/close

sesheta commented 2 years ago

@fridex: Closing this issue.

In response to [this](https://github.com/thoth-station/graph-refresh-job/issues/669#issuecomment-1049097473):

>This has been fixed by referenced PRs. We can consider this as done.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.