mamba-org / mamba

The Fast Cross-Platform Package Manager
https://mamba.readthedocs.io
BSD 3-Clause "New" or "Revised" License
6.89k stars 353 forks source link

Logging callbacks have too much overhead on libmambapy #3415

Open jaimergp opened 2 months ago

jaimergp commented 2 months ago

Troubleshooting docs

Anaconda default channels

How did you install Mamba?

Mambaforge or latest Miniforge

Search tried in issue tracker

logging debug

Latest version of Mamba

Tried in Conda?

Not applicable

Describe your issue

2.x API includes logging callbacks so the information reported by libsolv or libmamba can be routed through a Python callable via libmambapy.solver.libsolv.Database.set_logger(). However, this becomes very slow when the libsolv throughput is too large (e.g. verbosity=DEBUG). It causes a slowdown so large that the solver takes minutes to finish.

Are there any ways to have a less resource consuming way? I don't need fancy formatting or full control, just the ability to decide when to send different levels to stdout based on the verbosity chosen by the user.

mamba info / micromamba info

This is with libmambapy 2.0.0rc3

Logs

N/A

environment.yml

N/A

~/.condarc

N/A
jjerphan commented 2 months ago

Thank you for the report.

We have observed a tangential issue for some environments whose resolution via libsolv was hanging due to the logging system. Both issue have the same root cause: all of the possible logs of libsolv are emitted by default as we set log_level to std::numeric_limits<int>::max().

See: https://github.com/openSUSE/libsolv/blob/27aa6a72c7db73d78aa711ae412231768e77c9e0/src/pool.c#L1623-L1637

https://github.com/mamba-org/mamba/blob/08d7b4e9cb9b168d05f3bdc57c0708ee80d5a1c9/libmamba/src/solver/libsolv/database.cpp#L90-L108

https://github.com/mamba-org/mamba/blob/08d7b4e9cb9b168d05f3bdc57c0708ee80d5a1c9/libmamba/src/solver/libsolv/database.cpp#L110-L127

https://github.com/mamba-org/mamba/blob/154012f256d32bee1ae64f5b11d694cb45d75cef/libmamba/src/core/package_database_loader.cpp#L31-L53

An option can be for (release) builds not to emit debug logs unless if explicitly asked by users (with verbosity=DEBUG as you propose).

jjerphan commented 2 months ago

3416 addresses the performance regression. We think that the handling of log levels and of verbosity better be implemented in another PR.

jjerphan commented 1 month ago

@jaimergp: can you report if you have observed improvements with 2.0.0rc4?

jaimergp commented 1 month ago

@jaimergp: can you report if you have observed improvements with 2.0.0rc4?

In the sense that there are no performance overheads, yes. But I can't seem to be able to invoke obtain the libsolv logs anymore.

Isn't this sufficient?

    libmamba_context = libmambapy.Context(
        libmambapy.ContextOptions(
            enable_signal_handling=False,
            enable_logging=True,
        )
    )
    # Output params
    libmamba_context.output_params.json = context.json
    if libmamba_context.output_params.json:
        libmambapy.cancel_json_output(libmamba_context)
    libmamba_context.output_params.quiet = context.quiet
    libmamba_context.set_log_level(
        {
            4: libmambapy.LogLevel.TRACE,
            3: libmambapy.LogLevel.DEBUG,
            2: libmambapy.LogLevel.INFO,
            1: libmambapy.LogLevel.WARNING,
            0: libmambapy.LogLevel.ERROR,
        }[context.verbosity]
    )

This ^ correctly gives the libmamba logs, but not for libsolv. I thought I needed to call set_logger() on the Database instances:

        params = ChannelResolveParams(
            platforms=set(self.subdirs),
            channel_alias=CondaURL.parse(str(context.channel_alias)),
            custom_channels=ChannelResolveParams.ChannelMap(custom_channels),
            custom_multichannels=ChannelResolveParams.MultiChannelMap(custom_multichannels),
            home_dir=str(Path.home()),
            current_working_dir=os.getcwd(),
        )
        db = Database(params)
        db.set_logger(logger_callback)

with logger_callback():

def logger_callback(level: libmambapy.solver.libsolv.LogLevel, msg: str, logger=_db_log):
    # from libmambapy.solver.libsolv import LogLevel
    # levels = {
    #     LogLevel.Debug: logging.DEBUG, # 0 -> 10
    #     LogLevel.Warning: logging.WARNING, # 1 -> 30
    #     LogLevel.Error: logging.ERROR, # 2 -> 40
    #     LogLevel.Fatal: logging.FATAL, # 3 -> 50
    # }
    if level.value == 0:
        # This incurs a large performance hit!
        logger.debug(msg)
    else:
        logger.log((level.value + 2) * 10, msg)

But no luck?

jjerphan commented 1 month ago

Yes, we entirely disabled libsolv's logs for now. We need to think about how to properly handle them as well as alternative solvers'.

Feel free to open a PR to propose a solution.

jaimergp commented 1 month ago

Would it be reasonable to let libsolv print to stderr directly in debug level?

jjerphan commented 1 month ago

Probably.

Edit: I do not have bandwidth to properly think about it now.