ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org
Other
5.19k stars 1.06k forks source link

Fix `snprintf_os` use in `runtime_events_consumer.c` #13089

Closed eutro closed 1 month ago

eutro commented 1 month ago

This PR fixes an incorrect use of snprintf_os for formatting the runtime events ring file path in the implementation of otherlibs/runtime_events. Specifically, this changes the call to use the OS string rather than a char * copy, which both matches how it is constructed in runtime_events.c:

https://github.com/ocaml/ocaml/blob/a7afade7ad7cf1cffc21579e7ab2cd821122376b/runtime/runtime_events.c#L96 https://github.com/ocaml/ocaml/blob/a7afade7ad7cf1cffc21579e7ab2cd821122376b/runtime/runtime_events.c#L264-L265

and is actually correct. It also has the benefit of removing a single alloc/free pair on all OSes.


This bug only manifests on Windows, where char_os is not char, and thus was unlikely to have been tested by anyone (before me, I suppose[^1]). Windows has further issues with this API, in how the Unix library handles PIDs on Windows (as HANDLEs casted to int, rather than real PIDs, see a similar issue in #11021), which makes some existing uses of Runtime_events.create_cursor still broken on Windows, with no unambiguous fix.[^2]

At a glance, there seem to be further bugs in this function too. For instance, runtime_events_loc is unconditionally initialised to caml_stat_alloc_noexc(RING_FILE_NAME_MAX_LEN), but isn't freed on the code path with pid < 0, nor does it seem to be freed at all if the function completes unexceptionally. I'm also slightly weary of the use of int for the PID[^3].

I don't think a change entry is needed for this fix specifically, but it could be linked to the change entry with a bug fix for the above.

[^1]: I've successfully attached to another process using runtime events on a build of OCaml that includes this fix as well as another to fix the PID issue, but I could produce a program that attaches with just this fix, if desired.

[^2]: There are many different options: This function could perform the HANDLE to PID conversion, however that would fail if the user does provide an actual PID. The Unix library could also be changed: a pidfd type; providing a way to get the real PID; returning actual PIDs even on Windows; etc. Another option would be to sidestep the use of PIDs altogether, and add something like OCAML_RUNTIME_EVENTS_FILE to specify an exact file.

[^3]: It should work fine on Linux (PID_MAX_LIMIT is 2^22 on 64-bit systems, according to proc(5)) and macOS (according to this SO post, PIDs don't go higher than 99998). However it could be an issue on Windows, either if we interpret it as a HANDLE (which is a pointer), or as an actual PID (Raymond Chen claims to have seen PIDs in the 4 bn. range in this SO comment).

eutro commented 1 month ago

This is a bug fix, so I do think it deserves a Changes entry.

It's turning out to be harder than the bug fix itself...

By the way, how did the bug manifest for you?

I was trying to get https://github.com/tarides/runtime_events_tools/ working on Windows and when I fixed the PID issue I ran into it formatting the completely wrong path anyway.

nojb commented 1 month ago

Thanks for adding the Changes entry. I tweaked it to mention the terms that may be relevant for the ocassional Changes reader ("Windows", runtime_events library).