ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

feat(debugging): add a top-level `info` or similar API to help devs debug issues #6828

Closed: cpcloud closed this 1 month ago

cpcloud commented 1 year ago

Is your feature request related to a problem?

It's related to the problem of understanding problems :)

Describe the solution you'd like

We'd discussed this as a team somewhere, perhaps just in passing, but I wanted to get some of the discussion down in a public place.

Users report bugs involving one or more backends, and knowing details about both their ibis installation and the backend(s) they're using is critical for addressing the issue.

One of the primary requirements is that a user can provide the dependencies installed for a specific extra (the majority of such extras are named after backends).
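If the per-extra dependency lists should not be hardcoded, one option is to read them from the installed distribution's metadata: `importlib.metadata.requires()` returns requirement strings of the form `pkg>=1.0; extra == "name"`. The sketch below assumes the distribution is named `ibis-framework` and that every extra appears as an `extra == "..."` marker. `extras_from_requirements` and `installed_extras` are hypothetical helpers, and the regex parsing is a simplification of what the `packaging` library does properly.

```python
import re
from collections import defaultdict
from importlib.metadata import PackageNotFoundError, requires

# Leading project name of a requirement string, e.g. "duckdb" in "duckdb>=0.8.1; ..."
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
# The extra named in an environment marker, e.g. extra == "duckdb"
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def extras_from_requirements(reqs):
    """Group requirement strings by the extra that pulls them in.

    Requirements without an ``extra`` marker are core dependencies and are skipped.
    """
    extras = defaultdict(list)
    for req in reqs:
        name_match = _NAME_RE.match(req)
        _, _, marker = req.partition(";")  # the marker follows the semicolon
        extra_match = _EXTRA_RE.search(marker)
        if name_match and extra_match:
            extras[extra_match.group(1)].append(name_match.group(1))
    return dict(extras)


def installed_extras(dist="ibis-framework"):
    """Read extras from an installed distribution (the dist name is an assumption)."""
    try:
        reqs = requires(dist) or []  # requires() returns None if nothing is declared
    except PackageNotFoundError:
        return {}
    return extras_from_requirements(reqs)
```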

The API should not just print output; it should return an object amenable to machine processing. If we want a "print-only" API, that's fine, but it should consume the output of the former.
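A minimal sketch of that layering, where `info`, `format_info`, and `print_info` are placeholder names rather than an agreed API: the data function returns plain, JSON-serializable values, and the print-only function only renders them. It assumes ibis is installed under the distribution name `ibis-framework` and reports `None` otherwise.

```python
import json
from importlib.metadata import PackageNotFoundError, version


def info():
    """Return version details as plain, JSON-serializable data; never prints."""
    try:
        ibis_version = version("ibis-framework")
    except PackageNotFoundError:
        ibis_version = None  # ibis is not installed under this distribution name
    return {"ibis": ibis_version}


def format_info(data):
    """Render the output of info() for humans without adding new information."""
    return json.dumps(data, indent=4, sort_keys=True)


def print_info():
    # The print-only API is a thin layer over the machine-readable one.
    print(format_info(info()))
```

Keeping `format_info` separate means a bug report template could ask for the printed output while tooling calls `info()` directly.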

To that end, here are a couple of thoughts on what this might look like (naming is up for grabs):

Case 1: ibis.info()

This returns the following information:

  1. The ibis version
  2. Any extras for which at least one dependency is installed, with some indicator of any missing dependencies.

Possible example output for the case where duckdb is installed but not duckdb_engine:

>>> ibis.info()
{
    "ibis": "6.1.0",
    "duckdb": {
        "duckdb": "0.8.1",
        "duckdb_engine": None  # missing dependency indicator
    },
    # perhaps information about the current default backend is also useful?
}
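Case 1 could be sketched as follows, assuming a hand-maintained, hypothetical `EXTRAS` mapping from extra name to distribution names. The `lookup` parameter exists only so the logic can be exercised without the packages actually being installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical extra -> dependency mapping; it could instead be derived
# from package metadata or maintained by hand.
EXTRAS = {
    "duckdb": ["duckdb", "duckdb_engine"],
    "snowflake": ["snowflake-connector-python", "snowflake-sqlalchemy"],
}


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def info_all(extras=EXTRAS, lookup=installed_version):
    """Case 1: report every extra with at least one installed dependency."""
    result = {"ibis": lookup("ibis-framework")}
    for extra, deps in extras.items():
        versions = {dep: lookup(dep) for dep in deps}
        # Skip extras where nothing is installed; keep None for missing deps.
        if any(v is not None for v in versions.values()):
            result[extra] = versions
    return result
```

With a stub lookup in which only `ibis-framework` (6.1.0) and `duckdb` (0.8.1) are installed, `info_all` returns `{"ibis": "6.1.0", "duckdb": {"duckdb": "0.8.1", "duckdb_engine": None}}`, and the snowflake extra is omitted because none of its dependencies are installed.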

Case 2: ibis.info(con)

In this case, the user is accessing version information about a live query engine.

The primary difference from case 1 is that we'll show the output of con.version and show dependency information related only to that backend.

>>> con = ibis.connect(...)
>>> ibis.info(con)
{
    "ibis": "6.1.0",
    "trino": {
        "trino-python-client": "0.230",
        "server": {
            "version": "0.422",
        }
    }
}
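For case 2, a sketch assuming the connection object exposes the backend name as `con.name` and the server version as `con.version` (the latter is what the proposal mentions). `BACKEND_DEPS` is a hypothetical mapping, and its `trino` entry assumes the trino-python-client is installed under the distribution name `trino`. A `SimpleNamespace` stands in for a real connection.

```python
from importlib.metadata import PackageNotFoundError, version
from types import SimpleNamespace

# Hypothetical backend -> client-library mapping.
BACKEND_DEPS = {"trino": ["trino"], "duckdb": ["duckdb", "duckdb_engine"]}


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def info_for_connection(con, backend_deps=BACKEND_DEPS, lookup=installed_version):
    """Case 2: client versions for one backend plus the live server's version.

    Assumes ``con.name`` is the backend name and ``con.version`` the server version.
    """
    deps = {dep: lookup(dep) for dep in backend_deps.get(con.name, [])}
    deps["server"] = {"version": con.version}
    return {"ibis": lookup("ibis-framework"), con.name: deps}


if __name__ == "__main__":
    fake_con = SimpleNamespace(name="trino", version="422")  # stand-in connection
    print(info_for_connection(fake_con))
```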

Case 3: ibis.info("snowflake")

This would show the status of all dependencies for a given backend regardless of whether any of its dependencies are installed:

>>> ibis.info("snowflake")
{
    "ibis": "6.1.0",
    "snowflake": {
        "snowflake-connector-python": None,
        "snowflake-sqlalchemy": None,
    }
}

Here, the output indicates that none of the dependencies for the snowflake backend are installed.
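Case 3 differs in that every dependency of the named backend is reported, installed or not. A sketch, again with a hypothetical `BACKEND_DEPS` mapping and an injectable `lookup` for testing:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical backend -> dependency mapping.
BACKEND_DEPS = {
    "duckdb": ["duckdb", "duckdb_engine"],
    "snowflake": ["snowflake-connector-python", "snowflake-sqlalchemy"],
}


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def info_for_backend(name, backend_deps=BACKEND_DEPS, lookup=installed_version):
    """Case 3: every dependency of one backend, installed or not."""
    try:
        deps = backend_deps[name]
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
    # Unlike case 1, the backend is reported even if every value is None.
    return {"ibis": lookup("ibis-framework"), name: {dep: lookup(dep) for dep in deps}}
```

Raising on an unknown backend name, rather than returning an empty entry, is a design choice in this sketch; it makes typos in the backend name visible instead of looking like "nothing installed".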

What version of ibis are you running?

master

What backend(s) are you using, if any?

N/A


NickCrews commented 1 year ago

Why not include

cpcloud commented 1 year ago

@NickCrews All of those are useful and all but the last one are feasible.

"all installed python packages"

Is there a standard way to get that information regardless of how things are installed? AFAIK there isn't.

The best we can do is look at top-level dependencies, since transitive dependencies may not record their own dependency information anywhere except in setup.py (for example, when installed from source tarballs).
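A sketch of the top-level-only approach: read a distribution's direct requirements from its installed metadata, drop the ones gated behind an extra, and look up each installed version without following transitive dependencies. The distribution name `ibis-framework` and the helper names are assumptions, and the name/marker parsing is a simplified stand-in for the `packaging` library.

```python
import re
from importlib.metadata import PackageNotFoundError, requires, version

_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def core_requirement_names(reqs):
    """Names of requirements that are not gated behind an extra."""
    names = []
    for req in reqs:
        _, _, marker = req.partition(";")
        if _EXTRA_RE.search(marker):
            continue  # belongs to an optional extra, not a core dependency
        match = _NAME_RE.match(req)
        if match:
            names.append(match.group(1))
    return names


def top_level_versions(dist="ibis-framework", lookup=installed_version):
    """Installed versions of a distribution's direct core requirements only."""
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        return {}
    return {name: lookup(name) for name in core_requirement_names(reqs)}
```

Requirements whose environment markers don't apply to the current interpreter (such as `python_version < "3.9"`) are still listed and simply report `None` if not installed.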

NickCrews commented 1 year ago

pandas has something similar, but it hardcodes the list of packages it reports versions for: https://github.com/pandas-dev/pandas/blob/2bca01853c84d6b18b3e841f70e574de819857df/pandas/util/_print_versions.py#L103