python / cpython

C-API for signalling monitoring events #111997

Closed scoder closed 5 days ago

scoder commented 10 months ago

Feature or enhancement

Proposal:

Language implementations for the CPython runtime (Cython, JIT compilers, regular expression engines, template engines, etc.) need a way to signal PEP-669 monitoring events to the registered listeners.

1) We need a way to create events and inject them into the monitoring system. Since events have more than one signature, we might end up needing more than one C-API function for this.

2) We need a way to map 3D source code positions (file name, line, character) to 1D integer offsets. Code objects help, but branches might cross source file boundaries, so that's more tricky. For many use cases, a mapping between (line, character) positions and an integer offset would probably suffice, although templating languages usually provide some kind of include commands, as does Cython. There should be some help in the C-API for building up such a mapping.

The reason why I think we need CPython's help for mapping indices is that both sides, listeners and event producers, need to agree on the same mapping. Sadly, the events don't include line/character positions directly but only a single integer. So event producers need a way to produce a number that event listeners like coverage analysers can map back to a source code position.
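
To make the shared-mapping idea concrete, here is a minimal sketch in which the integer carried by an event is simply an index into a table of positions that both producer and listener can read. All names (`SourcePosition`, `LocationTable`, `location_for_offset`, the demo file names) are hypothetical and not part of any CPython API; the per-entry file name is one assumed way to cope with include statements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: nothing here is an existing CPython API. */
typedef struct {
    const char *filename;  /* source file; may differ per entry (includes) */
    int line;              /* 1-based line number */
    int column;            /* 0-based character position */
} SourcePosition;

typedef struct {
    const SourcePosition *entries;
    size_t length;
} LocationTable;

/* The event carries only `offset`; the listener resolves it through the
   same table the producer used. Returns NULL for an unknown offset. */
static const SourcePosition *
location_for_offset(const LocationTable *table, size_t offset)
{
    assert(table != NULL);
    return offset < table->length ? &table->entries[offset] : NULL;
}

/* Demo data: three event locations, one of them from an included file. */
static const SourcePosition DEMO_ENTRIES[] = {
    {"module.pyx", 10, 4},
    {"helpers.pxi", 3, 0},
    {"module.pyx", 12, 8},
};
static const LocationTable DEMO_TABLE = {
    DEMO_ENTRIES, sizeof(DEMO_ENTRIES) / sizeof(DEMO_ENTRIES[0])
};
```

A listener that receives offset 1 would call `location_for_offset(&DEMO_TABLE, 1)` and get the position in the included file. An index lookup like this is constant-time, at the cost of one table entry per event location.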

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/61?u=scoder

scoder commented 10 months ago

@encukou @markshannon @vstinner

vstinner commented 10 months ago

Do you want to propose an API for that?

drdavella commented 10 months ago

PEP-669 seems to discuss a mechanism to modify the bytestream for the purpose of implementing monitoring. Would access to this mechanism also be provided as part of this API? I haven't seen too much mention of this feature elsewhere.

Source: https://peps.python.org/pep-0669/#rationale

The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.

IMO it would be very useful for an API of this nature to expose low-level interpreter state to monitoring tools. For example, if we are able to see events like INSTRUCTION, it would also be useful to have access to the operands involved.

scoder commented 9 months ago

Do you want to propose an API for that?

In order to map source/line/column information to an offset, we need a way to build a co_linetable for a code object: https://github.com/python/cpython/blob/main/Objects/locations.md That's probably the most complex thing to do here. IIUC, CPython implements this in C here: https://github.com/python/cpython/blob/d67f947c72af8a215db2fd285e5de9b1e671fde1/Python/assemble.c#L190-L444

That's purely internal code. I don't think we need C-API support for this, but a Python implementation would be nice. Or a static method on code objects to build the byte string from a Python list of positions. Given the complexity of the format and the risk of future changes, I think it's CPython's responsibility to generate this format from something user friendly.

compile.h defines the location like this:

/* source location information */
typedef struct {
    int lineno;
    int end_lineno;
    int col_offset;
    int end_col_offset;
} _PyCompilerSrcLocation;

It looks like a function could help here: one that accepts a sequence of tuples of these four values and returns a bytes object containing the corresponding line table.

We'd probably still have to reimplement it in Cython in order to generate the line table string in "older" Python versions. But we could at least rely on it from CPython 3.13 on.
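
As a rough illustration of what such a helper would have to do, the sketch below encodes a single location in the "long form" entry of the line table, following my reading of locations.md: a header byte with code 14, then the start line delta as a signed varint, the end line delta, and the start and end columns plus one as unsigned varints. The names `SrcLocation`, `write_varint`, `write_signed_varint` and `write_long_entry` are hypothetical, the format may differ between CPython versions, and the "length in code units" field has no obvious meaning for non-bytecode producers, so passing 1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the four fields of _PyCompilerSrcLocation. */
typedef struct {
    int lineno;
    int end_lineno;
    int col_offset;
    int end_col_offset;
} SrcLocation;

/* Unsigned varint: 6-bit chunks, least significant first; bit 6 is set
   on every chunk except the last. Returns the number of bytes written. */
static size_t
write_varint(uint8_t *out, unsigned int value)
{
    size_t n = 0;
    while (value >= 64) {
        out[n++] = (uint8_t)(0x40 | (value & 0x3f));
        value >>= 6;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Signed values are folded into unsigned ones, sign in the lowest bit. */
static size_t
write_signed_varint(uint8_t *out, int value)
{
    unsigned int folded = value < 0
        ? ((0u - (unsigned int)value) << 1) | 1u
        : (unsigned int)value << 1;
    return write_varint(out, folded);
}

/* Encode one long-form entry (code 14). `out` needs room for 25 bytes.
   `prev_lineno` is the start line of the previous entry (or the first
   line of the code object for the first entry). */
static size_t
write_long_entry(uint8_t *out, int prev_lineno, const SrcLocation *loc,
                 int code_units)
{
    assert(code_units >= 1 && code_units <= 8);
    assert(loc->end_lineno >= loc->lineno);
    assert(loc->col_offset >= 0 && loc->end_col_offset >= 0);
    size_t n = 0;
    out[n++] = (uint8_t)(0x80 | (14 << 3) | (code_units - 1));
    n += write_signed_varint(out + n, loc->lineno - prev_lineno);
    n += write_varint(out + n, (unsigned int)(loc->end_lineno - loc->lineno));
    n += write_varint(out + n, (unsigned int)loc->col_offset + 1);
    n += write_varint(out + n, (unsigned int)loc->end_col_offset + 1);
    return n;
}
```

The long form is the most general entry type, so using it for every entry trades table size for simplicity. A real generator would also have to handle the shorter entry forms if compactness matters.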

scoder commented 9 months ago

@markshannon, could you please comment whether you consider this the right approach? To me, a line table seems required to generate source level monitoring events.

scoder commented 9 months ago

@markshannon any comments? Given that Python 3.12 broke an entire C-API feature, I consider it a blocker for 3.13 to fix it.

markshannon commented 8 months ago

What C-API was broken? AFAICT there was no API to call the tstate->c_tracefunc and tstate->c_profilefunc functions. How were you calling those functions, and why does it no longer work?

markshannon commented 7 months ago

C extensions can already register callbacks and set events using the Python API, so I assume we are talking about adding an API for C extensions to fire events.

There will be two parts to such an API:

  1. Specifying the mapping from "code objects" and offsets to full locations.
  2. Firing events from C extension code.

Handling locations

Apart from LINE events, all event callback functions include code: CodeType, offset: int. Let's not force C extensions to build full code objects, so we should relax code to be a "code like" object. We can define exactly what "code like" means later.

We also need a specification for the table mapping offsets to locations. The existing table format is quite fiddly, and can only be searched linearly, so designing a new table format might be a good idea.

Firing events from C extension code

Any potential API has to be more complex than for sys.settrace as we need to handle disabling of event locations and multiple tools.

We will need 16 bits of data for each event location, to track active tools and which tool/location pairs are disabled.

We will need to initialize this data, and to specify which event location maps to which event.

With that in mind, and using the PY_START event as an example, an API could look something like this:

typedef struct _PyMonitoringState {
    uint8_t active;
    uint8_t opaque;
} PyMonitoringState;

typedef struct _PyCodeLikeObject PyCodeLikeObject;

/* Updates the version in *previous_version, which should be stored and passed to the next call of _PyMonitoringEnterScope.
   Arguments:
       previous_version: Points to a per code-like object value. The value must be set to 0 before first call per code-like object.
       state_array: The array of all PyMonitoringStates for this code-like object.
                    Should be initialized to all zeroes before first call.
       event_types: Array of the event types describing the event type for each state in state array.
       length: The length of the state and event arrays, which must be the same.
*/
void _PyMonitoringEnterScope(uint64_t *previous_version, PyMonitoringState *state_array, const uint8_t *event_types, uint32_t length);

int _PyMonitoring_FirePyStartEvent(PyCodeLikeObject *codelike, uint32_t offset, PyMonitoringState *state);

If performance is going to be an issue, then the above could be implemented as inline functions. We would lose ABI compatibility, but would still retain API compatibility across CPython versions.

Example usage.

Suppose we have a code-like object that should fire two events when called, PY_START then PY_RETURN.

We would need to describe the events:

static const uint8_t EVENTS[] = { PY_START, PY_RETURN };

And the code-like object would need an array for the state and a version number.

typedef struct MyCodeLike {
    PyObject_HEAD
    PyMonitoringState monitoring_state[2];
    uint64_t version;
    /* Other fields */
} MyCodeLike;

The C code for the function:


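#include <string.h>  /* memset */

void init_code(MyCodeLike *code)
{
    /* Arrays cannot be assigned in C, so zero the state explicitly. */
    memset(code->monitoring_state, 0, sizeof(code->monitoring_state));
    code->version = 0;
}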

PyObject *call_code(MyCodeLike *code)
{
    _PyMonitoringEnterScope(&code->version, code->monitoring_state, EVENTS, 2);
    ...
    _PyMonitoring_FirePyStartEvent(code, 0, &code->monitoring_state[0]);
    ...
    _PyMonitoring_FirePyReturnEvent(code, 1, &code->monitoring_state[1]);
    ...
}

encukou commented 7 months ago

@scoder, would such API work for you?

scoder commented 6 months ago

Hmm, looks like I didn't receive any email notifications for this ticket although I created it, am subscribed to it, and got explicitly mentioned. No idea why. Sorry. I've re-subscribed, just in case.

@markshannon, I've done some guesswork to fill in the gaps of your proposal. Please correct me if I misunderstood something.

Comments from me:

markshannon commented 6 months ago

The version is essentially a global counter of listener registrations which lead to different event types being handled (or not) over time.

More or less, yes. It is changed when the set of events being monitored changes. Listener registrations do not change it.

Each Fire call then receives the code-like and essentially an integer ID as offset for a given "code position", which refers to a specific entry in the monitoring state array.

The ID also serves as an index into the locations table.

The Fire function looks if the monitoring state is marked as active and if so, creates and pushes an event with that offset ID to the listeners.

Broadly, yes. There are up to 8 possible listeners for events, so if the state is non-zero it represents a bit vector of active listeners.

I fail to see why you need both the pointer to the array entry and its offset in the Fire calls.

I'm assuming that the layout of the code-like struct is opaque to the VM, it is up to Cython/pybind11/mypyc, etc. how to lay it out, so we can't deduce the offset from the pointer to the monitoring state. Thus we need both.

How will listeners know the file and line number from the notification?

From the (as yet unspecified) location table and the offset ID.

The code-like struct would need a file path field as well, to map events to source files (if applicable).

Would one location per code-like object be enough?

It looks like MyCodeLike could just be zeroed out with memset() for initialisation.

MyCodeLike would include location information, but we can design the monitoring state struct so that it should be initialized to zero.

scoder commented 6 months ago

Would one location per code-like object be enough?

Probably not. If the API requires a "start scope", "work in scope", "end scope" kind of state keeping, then Cython's include statement would get in the way, which allows injecting source code from other files. And include features are probably common in templating languages etc.

While it's probably possible for user code to juggle with different code-like objects for a single code scope, it would be nicer if the API kept code scope and file position independent from each other.

iritkatriel commented 6 months ago

typedef struct _PyCodeLikeObject PyCodeLikeObject;

What is _PyCodeLikeObject and where is it defined? I'm confused by this and the later discussions on MyCodeLike (which is user-defined).

markshannon commented 6 months ago

What is _PyCodeLikeObject and where is it defined? I'm confused by this and the later discussions on MyCodeLike (which is user-defined).

It is defined by the third-party code (generator). It can be any object, but will need to support certain attributes and methods so that it looks like a Code object to coverage, profile, inspect, etc.

iritkatriel commented 6 months ago

  • The EnterScope call will look at the version, and if it was updated since the last call with these states, re-initialises the monitoring state array, marking currently audited events as active based on their event type. Otherwise, returns quickly.

How does it determine that the version changed since last time (where does it store the previous version)?

(I got an answer offline)
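
The offline answer is not recorded here. Going only by the previous_version argument in the proposed signature, one plausible reading is that the caller stores the last seen version next to its state array and the runtime compares it with a global counter. The sketch below mocks that behaviour; `global_version`, `tools_per_event`, `set_monitored_tools`, `enter_scope` and the event IDs are all hypothetical stand-ins, not CPython internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event IDs for this sketch only. */
enum { EVENT_PY_START = 0, EVENT_PY_RETURN = 1, EVENT_COUNT = 2 };

typedef struct {
    uint8_t active;   /* bit vector of tools listening at this location */
    uint8_t opaque;
} MonitoringState;

/* Stand-ins for interpreter-global state. The version starts at 1 so a
   caller-side version of 0 always triggers the first initialisation. */
static uint64_t global_version = 1;
static uint8_t tools_per_event[EVENT_COUNT];

/* Changing the set of monitored events bumps the global version. */
static void
set_monitored_tools(int event, uint8_t tools)
{
    assert(event >= 0 && event < EVENT_COUNT);
    tools_per_event[event] = tools;
    global_version++;
}

/* Returns 1 if the state array was refreshed, 0 on the fast path. */
static int
enter_scope(uint64_t *previous_version, MonitoringState *states,
            const uint8_t *event_types, uint32_t length)
{
    if (*previous_version == global_version) {
        return 0;                      /* nothing changed since last call */
    }
    for (uint32_t i = 0; i < length; i++) {
        states[i].active = tools_per_event[event_types[i]];
    }
    *previous_version = global_version;  /* caller keeps this for next time */
    return 1;
}
```

In this reading the runtime stores nothing per code-like object: the per-object version lives in caller-owned memory, so a repeated call costs a single comparison until the monitored event set changes.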

scoder commented 5 months ago

I've started working on an implementation in Cython and it seems that frames are still implied to exist for the PY_START, PY_RETURN and PY_YIELD events, according to https://docs.python.org/3/library/sys.monitoring.html#events

I understand that the description makes sense from within CPython when targeting consumers, and Cython has long faked Python frames for its functions, but for external event generators, suggesting that Python execution frames exist just because there's a function start/end event or a generator-like function running seems rather demanding. The Limited C-API doesn't expose frames at all.

Should we just update the documentation here to explain that frames only exist when events originate from the Python eval loop?

I can't really say how consumers would deal with these events. There's always a frame on the stack, just not necessarily the one of the currently running code. That might become confusing.

scoder commented 5 months ago

Regarding @encukou's question here, I agree that the states and versions fit more with the idea of a frame. Code objects are static for a given piece of code, whereas the states must be independent for each (parallel) call, and the versions are tied to them.

Also, I noticed at some point that e.g. coverage uses dicts to map code objects to collected runtime state. Thus, the code object really needs to be the identical object across calls, which rules out keeping local execution state in it. (EDIT: This could be worked around by adding hash+eq to the code-like object implementation – making it even more complex. And assuming that tools really use hash+eq and not identity matches, as would be possible for Python functions.)

gvanrossum commented 4 months ago

@iritkatriel Is this still a (deferred) release blocker?

scoder commented 3 months ago

Regarding the interaction with the existing sys.settrace() API, I tried running coverage 7.5.1 which still uses the old tracing API. I sent a PR to fix the listener arguments when generating LINE events in https://github.com/python/cpython/pull/119179

After that, I got assertion failures in CPython when generating LINE events in sys_trace_line_func, file legacy_tracing.c:

    assert(args[0] == (PyObject *)_PyFrame_GetCode(frame->f_frame));

The old API depended on frames to provide line information, the new one doesn't. The effect is that event creators now need to know whether there are listeners of the old API (which they probably can't know), or they always need to set up frames just in case, which is quite an annoyance given the new shiny monitoring API.

Any idea what we can do about this?

markshannon commented 3 months ago

We can remove the assert, nothing will crash. However, there might be an issue with the frame passed to the sys.settrace callback function, as it will be the frame of the closest Python function, not the Cython function.

How did Cython work with profiling prior to 3.12? Did you create a fake frame?

scoder commented 3 months ago

How did Cython work with profiling prior to 3.12? Did you create a fake frame?

Yes. We could keep doing that, but a) frames aren't "officially" exposed from the C-API and b) as I understand it, the goal of creating the monitoring C-API was to allow non-Python non-frame code to interact with the tracing infrastructure, so requiring frames seems counter-productive.

I now found that coverage.py was actually updated to work directly with sys.monitoring, so I tried that and ran into some more issues:

  File "/home/stefan/source/Python/cython/cython-git/venv/py314/lib/python3.14/site-packages/coverage/sysmon.py", line 337, in sysmon_py_start
    sys_monitoring.set_local_events(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self.myid,
        ^^^^^^^^^^
    ...<7 lines>...
        # | events.JUMP
        ^^^^^^^^^^^^^^^
    )
    ^
SystemError: cannot instrument shim code object 'func1'

The exception originates here and seems to trigger for essentially all code objects that do not have bytecode: https://github.com/python/cpython/blob/de19694cfbcaa1c85c3a4b7184a24ff21b1c0919/Python/instrumentation.c#L1971-L1974

set_local_events() also has this code, requiring the "code-like object" to be an actual code object: https://github.com/python/cpython/blob/de19694cfbcaa1c85c3a4b7184a24ff21b1c0919/Python/instrumentation.c#L2253-L2259

I'm afraid that we might still run into a lot of similar assumptions about what is traceable code.

scoder commented 3 months ago

Changing the code from if (code->_co_firsttraceable >= Py_SIZE(code)) { to if (code->_co_firsttraceable && code->_co_firsttraceable >= Py_SIZE(code)) { lets me get past the check and continue.
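
The effect of that change can be seen by isolating the two conditions. In the sketch below the two integer parameters are hypothetical stand-ins for code->_co_firsttraceable and Py_SIZE(code), and the assumption that a code object without bytecode has both values equal to 0 is mine, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Original condition: reject when no instruction is traceable. */
static bool
rejected_by_original_check(int first_traceable, int size)
{
    assert(first_traceable >= 0 && size >= 0);
    return first_traceable >= size;
}

/* Relaxed condition: a first-traceable index of 0 is always accepted. */
static bool
rejected_by_relaxed_check(int first_traceable, int size)
{
    assert(first_traceable >= 0 && size >= 0);
    return first_traceable != 0 && first_traceable >= size;
}
```

Under that assumption the (0, 0) case is rejected by the original check and accepted by the relaxed one, while a genuine shim with a non-zero first-traceable index at or past the end is rejected by both.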

iritkatriel commented 3 months ago

Should we make sys_trace_line_func and other functions in legacy_tracing.c create a fake frame if the frame passed in is NULL?

iritkatriel commented 3 months ago

What are the open issues here currently?

iritkatriel commented 5 days ago

I'm closing this as complete. Let's create new issues for any followup work.