man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Logging doesn't show in JupyterHub #541

Open mehertz opened 1 year ago

mehertz commented 1 year ago

Describe the bug

Turning on logging doesn't produce any visible logging output in Jupyter notebooks.

This works fine in IPython.

Steps/Code to Reproduce

import os

os.environ['ARCTICDB_version_loglevel'] = 'debug'

from arcticdb import Arctic

ac = Arctic(<MY CONNECTION URI>)
lib = ac[<MY LIB>]
lib.list_symbols()

Expected Results

Logging such as this to be visible:

20230703 13:51:22.096289 1801004 D arcticdb.version | Symbol List: get_symbol_set: called
20230703 13:51:22.096380 1801004 D arcticdb.version | Symbol List: attempt_load: Symbol list load attempt
20230703 13:51:22.150124 1801004 D arcticdb.version | Symbol List: load_from_version_keys: Loading symbols from version keys
20230703 13:51:22.247876 1801004 D arcticdb.version | Symbol List: load: Compaction necessary. Obtaining lock...
20230703 13:51:22.474641 1801004 D arcticdb.version | Symbol List: load: Checking whether we still need to compact under lock
20230703 13:51:22.493012 1801004 D arcticdb.version | Symbol List: can_update_symbol_list: found_last=false
20230703 13:51:22.493065 1801004 D arcticdb.version | Symbol List: can_update_symbol_list: has_newer=false
20230703 13:51:22.493088 1801004 D arcticdb.version | Symbol List: write_symbols: Writing following number of symbols to symbol list cache: 100

OS, Python Version and ArcticDB Version

Python, Linux

Backend storage used

All

Additional Context

Note this can be worked around by setting a file logger:

def log_to_file(levels: dict, log_file: str):
    from arcticdb.config import configure, make_loggers_config, Defaults

    # Build a logger config at the default level, overriding the given loggers
    config = make_loggers_config(Defaults.DEFAULT_LOG_LEVEL, levels)
    # Point the built-in "file" sink at the requested path
    sink = config.sink_by_id["file"]
    sink.file.path = log_file
    # Attach the file sink to each logger being overridden
    for key in levels.keys():
        config.logger_by_id[key].sink_ids.append("file")

    configure(config, force=True)

log_to_file({"version": "DEBUG"}, <FILE PATH>)
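Since this workaround sends the output to a file rather than to the notebook, a small helper along these lines (hypothetical, not part of arcticdb) can surface the tail of that file in a cell:

def show_log_tail(log_file: str, lines: int = 20) -> None:
    # Print the last `lines` lines of the log file via Python's print,
    # which Jupyter does capture and display.
    with open(log_file) as f:
        for line in f.readlines()[-lines:]:
            print(line, end="")
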
mehertz commented 1 year ago

#573 is not a complete resolution to this.

To properly resolve this, we can:

  1. Redirect the native C++ logging to a dedicated file descriptor, then dump all of it to the terminal using the Python print (a pure-Python sketch of this approach is shown after this list): https://github.com/rapidsai/cuml/pull/2774/files#diff-b18b992f86cf227be780f9d38aee57e284f550aa3c05ee8e73d57ef894161410
  2. Add a callback in the spdlog logger that calls the Python print after logging: https://github.com/rapidsai/cuml/pull/2781
  3. Use the pybind11 functionality to redirect the stderr fd to the Python stdout/stderr stream: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/utilities.html

Option 3 would be my preference, though it has the downside of not being thread safe.
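
For illustration, here is a minimal pure-Python sketch of the fd-redirection idea in option 1. The function name and structure are assumptions, not ArcticDB code: duplicate fd 2 onto a pipe and forward whatever the native layer writes there through the Python print, which Jupyter does capture.

import os
import threading

def forward_stderr_fd_to_print():
    # Create a pipe and point fd 2 (process-level stderr, where the native
    # logger writes) at its write end.
    read_fd, write_fd = os.pipe()
    saved_stderr_fd = os.dup(2)   # keep the original stderr so it can be restored
    os.dup2(write_fd, 2)
    os.close(write_fd)            # fd 2 is now the only write end we hold

    def pump():
        # Forward each line through print(), i.e. through sys.stdout,
        # which the Jupyter kernel redirects into the notebook.
        with os.fdopen(read_fd) as reader:
            for line in reader:
                print(line, end="")

    threading.Thread(target=pump, daemon=True).start()
    return saved_stderr_fd        # os.dup2(saved_stderr_fd, 2) restores fd 2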

qc00 commented 1 year ago

I presume option 3 affects the entire process and may end up in a feedback loop if running outside Jupyter (when sys.stderr is pointing at fd 2).

I think we (also) need to figure out if Jupyter's IO redirection implementation buffers or not. If it does, we might not see any output until we release the GIL, making the warnings about long operations pointless.

alexowens90 commented 1 year ago

Solution 1:

Solution 2:

Solution 3:

Option 3 is probably the right long-term solution. @qc00 we should be able to detect at runtime whether we're running in JupyterHub, and only set the callback if we are.
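
A rough sketch of that runtime check, assuming the usual IPython/ipykernel class names (this is a heuristic, not something ArcticDB exposes):

def running_in_jupyter_kernel() -> bool:
    # ipykernel (used by JupyterHub/JupyterLab/notebook) runs a ZMQ-based shell;
    # plain terminal IPython uses TerminalInteractiveShell instead.
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"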

jjerphan commented 1 year ago

This is not an issue with ArcticDB, but rather with JupyterHub, which does not correctly redirect output (JupyterLab also has this problem).

This issue is not present when using a standalone notebook via jupyter notebook, as it correctly redirects output.