rpm-software-management / dnf5

Next-generation RPM package management system
Other
256 stars 86 forks source link

[Python] calling `rpm.Package` methods outside of the function scope which depsolved a transaction triggers invalid `WeakPtr` dereference #1748

Open thozza opened 1 month ago

thozza commented 1 month ago

In https://github.com/osbuild/osbuild, we have executables which wrap DNF (4 and 5) Python APIs, that we use for depsolving package sets that are then installed as part of OS image building process.

One of the recently added functionality is to generate SPDX SBOM documents for the depsolved transaction. This is done by transforming the depsolving result, specifically package metadata and their relationships, into the respective SPDX SBOM model. The transformation is implemented by a library function, which accepts List[dnf.package.Package] or List[libdnf5.rpm.Package] and extracts the necessary data from the package objects.

However, calling any methods on the libdnf5.rpm.Package objects outside of the scope of the function, which depsolved the original transaction, triggers a core dump due to invalid pointer dereference:

terminate called after throwing an instance of 'libdnf5::AssertionError'
  what():  include/libdnf5/common/weak_ptr.hpp:208: TPtr* libdnf5::WeakPtr<TPtr, ptr_owner>::operator->() const [with TPtr = libdnf5::Base; bool ptr_owner = false]: Assertion 'is_valid()' failed: Dereferencing an invalidated WeakPtr
Aborted (core dumped)

The reproducer is attached below. Maybe I'm not using the API correctly, but the documentation is not really helpful in figuring this out.

The behavior is consistent with:

Reproducer

#!/usr/bin/python3

import tempfile
from typing import List

import libdnf5
from libdnf5.base import GoalProblem_NO_PROBLEM as NO_PROBLEM

def depsolve_pkgset(
    repo_paths: List[str],
    pkg_include: List[str]
) -> List[libdnf5.rpm.Package]:
    """
    Perform a dependency resolution on a set of local RPM repositories.
    """

    with tempfile.TemporaryDirectory() as tempdir:
        base = libdnf5.base.Base()
        conf = base.get_config()
        conf.config_file_path = "/dev/null"
        conf.persistdir = f"{tempdir}{conf.persistdir}"
        conf.cachedir = f"{tempdir}{conf.cachedir}"
        conf.reposdir = ["/dev/null"]
        conf.pluginconfpath = "/dev/null"
        conf.varsdir = ["/dev/null"]

        sack = base.get_repo_sack()
        for repo_path in repo_paths:
            sack.create_repos_from_dir(repo_path)

        base.setup()
        sack.update_and_load_enabled_repos(load_system=False)

        goal = libdnf5.base.Goal(base)
        for pkg in pkg_include:
            goal.add_install(pkg)
        transaction = goal.resolve()

        transaction_problems = transaction.get_problems()
        if transaction_problems != NO_PROBLEM:
            raise RuntimeError(f"transaction problems: {transaction.get_resolve_logs_as_strings()}")

        pkgs = []
        for tsi in transaction.get_transaction_packages():
            pkgs.append(tsi.get_package())

        for pkg in pkgs:
            print(pkg.get_name())

        return pkgs

if __name__ == "__main__":
    pkgs = depsolve_pkgset(["/etc/yum.repos.d"], ["bash"])

    # This triggers a core dump
    #
    # terminate called after throwing an instance of 'libdnf5::AssertionError'
    # what():  include/libdnf5/common/weak_ptr.hpp:208: TPtr* libdnf5::WeakPtr<TPtr, ptr_owner>::operator->() const [with TPtr = libdnf5::Base; bool ptr_owner = false]: Assertion 'is_valid()' failed: Dereferencing an invalidated WeakPtr
    print(pkgs)
jan-kolarik commented 2 weeks ago

Note: this still needs to be triaged, I am just adding notes from our previous 1-1 conversation to provide context.

Me: My assumption is that the owner of the packages objects is the transaction which is destroyed when going out of scope, so you might probably need to keep the whole transaction object. Or maybe you would need to keep a reference to the whole base.

Reporter: I'd expect the Package object to keep any references it needs to work.

Me: I agree that would be the better and user friendly, I am just not sure if this could be resolved easily as the Python bindings are generated by SWIG. We will get back to you in the upstream issue.