
Make it possible to use `sys.monitoring` for pdb/bdb #120144

Open gaogaotiantian opened 4 months ago

gaogaotiantian commented 4 months ago

Feature or enhancement

Proposal:

I have made a few proposals to utilize sys.monitoring for pdb, and all of them raised concerns. We had a discussion during the language summit and I got feedback from several members that breaking backwards compatibility for bdb is not acceptable. I can totally understand the concern, so I have a much more conservative proposal to do this. I would like to get opinions from people before I start working on it.

TL;DR:

  1. All existing code will work exactly as before.
  2. sys.monitoring will be an opt-in feature for bdb.
  3. Minimal changes are needed for an existing debugger (including pdb) to onboard.

Details:

What I propose is to keep all the code for bdb as it is, and add extra code for sys.monitoring without disturbing the existing methods. Everything will be hidden behind a feature gate, so all the existing code will work exactly as before.

In order to use sys.monitoring instead of sys.settrace, the user needs to initialize bdb.Bdb with an argument (use_monitoring=True?). Then the underlying bdb.Bdb will work in sys.monitoring mode. Ideally, that's the only change the debugger needs to make.
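
To make that concrete, here is a minimal sketch of what opting in could look like for a debugger built on bdb. The keyword name use_monitoring and the example class are assumptions from this proposal, not a decided API:

import bdb

class MyDebugger(bdb.Bdb):
    def __init__(self):
        # Hypothetical opt-in flag; omitting it (or passing False) would
        # keep the classic sys.settrace behavior exactly as today.
        super().__init__(use_monitoring=True)

    def user_line(self, frame):
        # The callback API (user_line, user_call, ...) stays the same in
        # either mode, which is what keeps the switch trivial.
        print("stopped at", frame.f_code.co_filename, frame.f_lineno)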

Of course, in reality, if a debugger wants to onboard this feature, it may need a few tweaks. For example, in pdb, the debug command toggles sys.settrace to make it possible to debug something while tracing; that needs to be modified. However, the goal is to make it trivial for debugger developers to switch from sys.settrace to sys.monitoring, provided they know how sys.monitoring works.

Let me re-emphasize it in case there's still any confusion - there's nothing the developer needs to do if they just want the old sys.settrace behavior; everything will simply work because all the new stuff will be behind a feature gate.

If they choose to use sys.monitoring, there will be a few APIs in bdb.Bdb that do not even make sense anymore - trace_dispatch and the dispatch_* functions. The documentation already says:

The following methods of Bdb normally don’t need to be overridden.

and a normal debugger should not override those methods anyway (pdb does not). Again, those APIs will still work in sys.settrace mode.
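
For context, here is a rough illustration (my reading of the sys.monitoring API, not the actual bdb implementation) of how the monitoring backend would register callbacks instead of going through the trace_dispatch chain:

import sys

TOOL_ID = sys.monitoring.DEBUGGER_ID   # tool id reserved for debuggers
events = sys.monitoring.events

def line_callback(code, line_number):
    # Roughly where Bdb would decide whether to call user_line();
    # returning DISABLE turns LINE events off for this location until
    # sys.monitoring.restart_events() is called.
    return sys.monitoring.DISABLE

sys.monitoring.use_tool_id(TOOL_ID, "bdb")
sys.monitoring.register_callback(TOOL_ID, events.LINE, line_callback)
sys.monitoring.set_events(TOOL_ID, events.LINE)

There is simply no per-frame dispatch function in this model, which is why trace_dispatch and dispatch_* only make sense in sys.settrace mode.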

As for pdb, we can also add a feature gate for it. The user can choose between the sys.settrace mode and the sys.monitoring mode. The behavior in the two modes should be identical, except for the performance. We can even make 3.14 a transition version, where the default mechanism is still sys.settrace, and the user can opt in to the sys.monitoring mode by explicitly asking for it (through an initialization argument or a command line argument). This way, we can get feedback from the brave pioneers without disturbing most pdb users. We will have a full year to fix bugs introduced by the new mechanism and stabilize it.

In 3.15, we can make sys.monitoring the default and still keep sys.settrace as a fallback plan.
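
To make the opt-in concrete, here are two hypothetical forms it could take during the transition release; both the keyword and the command line flag are illustrative only, nothing is decided:

import pdb

# Programmatic opt-in; the keyword mirrors the bdb.Bdb argument suggested
# above and is an assumption, not an agreed API.
p = pdb.Pdb(use_monitoring=True)

# Command line opt-in (flag name is equally hypothetical):
#   python -m pdb --backend=monitoring myscript.py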

So, why bother? Because we can really gain 100x performance with breakpoints. And it's not only with breakpoints - even without any breakpoints set, there's a significant overhead just from running with a debugger attached:

import sys
import time

def trace(*args):
    return None

def fib(n):  # not shown in the original snippet; a recursive Fibonacci is assumed
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def f():
    start = time.perf_counter()
    fib(22)
    print(time.perf_counter() - start)

f()                   # baseline, no trace function installed
sys.settrace(trace)   # attach the no-op trace function
f()                   # same workload, now with tracing overhead

The overhead for f() with the trace function attached is 4x-5x, because the call event is still triggered for every call - even when f_trace is None, the instrumentation is still there and gets executed! With sys.monitoring we can have an almost zero-overhead debugger, and people are very excited about the possibility.
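
And a sketch of why breakpoints become roughly 100x cheaper: with sys.monitoring, only code objects that actually contain a breakpoint need LINE instrumentation. This assumes the debugger tool id was claimed with use_tool_id() as in the earlier sketch, and set_breakpoint_events is a made-up helper name:

import sys

def set_breakpoint_events(func):
    # Instrument just this one code object; every function without a
    # breakpoint keeps running at full speed, uninstrumented.
    sys.monitoring.set_local_events(
        sys.monitoring.DEBUGGER_ID, func.__code__, sys.monitoring.events.LINE
    )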

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://github.com/python/cpython/issues/103103
https://discuss.python.org/t/make-pdb-faster-with-pep-669/37674

Linked PRs

ncoghlan commented 4 months ago

The general idea seems sound to me.

Given the way the code is structured, a per-instance setting seems like it will be the easiest approach at the Bdb level, but for Pdb, I'm wondering if it may be useful to represent the two modes as distinct subclasses, and offer a way to globally specify which one the module level Pdb name refers to, along the lines of:

# in pdb.py
class _PdbBase(bdb.Bdb, cmd.Cmd):
    _bdb_use_monitoring = False   # Passed to `bdb.Bdb` in `__init__`
    ...

class PdbTrace(_PdbBase):
    ...

class PdbMonitoring(_PdbBase):
    _bdb_use_monitoring = True   # Changes how `Bdb` base class is initialised
    ...

Pdb = PdbTrace # Use classic sys.settrace-based impl by default

def set_default_pdb(cls):
    """Updated the type referenced by `pdb.Pdb`

    Affects the behaviour of the `breakpoint()` builtin, and all
    `pdb` module level functions that create `Pdb` instances.
    """
    global Pdb
    Pdb = cls

gaogaotiantian commented 4 months ago

That's surely one way to do it. We can always add a switch to globally flip the backend. I think the only run-time difference is how the code below works:

from pdb import Pdb, PdbMonitoring, set_default_pdb
set_default_pdb(PdbMonitoring)  # pass the class, per the signature sketched above
p = Pdb()  # still the originally imported class, unaffected by the switch

Is there any other advantage to having two classes? The code structure would be clearer if we had different implementations for PdbMonitoring and PdbTrace, but I'm expecting minimal differences (though it may not turn out that way). The entry point for breakpoint() is pdb.set_trace(), so I don't see a big difference there.

I'm just throwing out ideas, and maybe it will turn out that having two classes is much cleaner. I just wanted to know what your concern is here and why you proposed this.

gaogaotiantian commented 4 months ago

Also it would be really helpful if @Yhg1s could provide some feedback. I know you have some concerns about changing bdb.