stackless-dev / stackless

The Stackless Python programming language
http://www.stackless.com/

New tasklet attributes tasklet.trace_function and tasklet.profile_function #43

Closed ghost closed 6 years ago

ghost commented 10 years ago

Originally reported by: Anselm Kruis (Bitbucket: akruis, GitHub: akruis)


I added two attributes to the tasklet class: tasklet.trace_function and tasklet.profile_function. These attributes are the tasklet counterparts of the standard functions sys.gettrace(), sys.settrace(), sys.getprofile() and sys.setprofile(). With these attributes it is now possible to control tracing entirely from the schedule callback. An example is given in the documentation and in Stackless/demo/tracing.py.
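For readers unfamiliar with the thread-level counterparts these attributes mirror: a minimal stdlib-only sketch of the sys.settrace()/sys.gettrace() interface (no Stackless required; per the description above, tasklet.trace_function behaves analogously per tasklet):

```python
import sys

call_log = []

def tracer(frame, event, arg):
    # Record every Python-level function call the interpreter reports.
    if event == "call":
        call_log.append(frame.f_code.co_name)
    return None  # no per-line tracing wanted

def traced_work():
    return sum(range(5))

sys.settrace(tracer)    # per-thread; tasklet.trace_function is the per-tasklet analogue
try:
    result = traced_work()
finally:
    sys.settrace(None)  # always restore, or the whole thread stays traced

# result == 10, and "traced_work" appears in call_log
```

The point of the new attributes is that this get/set pair becomes a read/write property on each tasklet, so a schedule callback can inspect or change it for tasklets other than the current one.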

The implementation also changes slp_schedule_task_prepared / slp_restore_tracing to modify tracing-related members of PyThreadState only through official API functions. This prevents a ref-counting problem and incorrect values of the global flag _Py_TracingPossible in ceval.c.


ghost commented 7 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


Removing milestone: 2.7.6-slp (automated comment)

ghost commented 10 years ago

Original comment by Anonymous:


> Yeah, it is all about existing debuggers. Their support for Stackless is quite limited. Let's change it! With the recent changes it's much simpler to add Stackless support to any debugger.

I would suggest using the post-switch callback as the debugger entry point, since it is not a place "inside" the Stackless machinery, which IMO is not a good context to attach a debugger to.

ghost commented 10 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


> So, while I can understand the need to make intermediate fixes to stackless to address issues with existing debuggers, we really need to make debuggers work better with stackless, since it will always require some knowledge of "current tasklet" in the callbacks.

Yeah, it is all about existing debuggers. Their support for Stackless is quite limited. Let's change it! With the recent changes it's much simpler to add Stackless support to any debugger.

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


> This would not be possible without the per tasklet tracing we currently have. If Python had global trace/profiling flags only, IMHO many embedded and closed source use cases would be much harder to implement.

I don't get it. How is this different from regular C profiling? I can take Microsoft Word and run it under the Visual Studio profiler. It will tell me a lot about what the application is doing, but it will be somewhat unhelpful, since I don't have debugging information for word.exe.

If the PyDev debugger (or any debugger) has problems with source files that don't have corresponding .py files, then this is, IMHO, a problem with the debugger itself...

So, while I can understand the need to make intermediate fixes to stackless to address issues with existing debuggers, we really need to make debuggers work better with stackless, since it will always require some knowledge of "current tasklet" in the callbacks.

ghost commented 10 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


I just made a final change to the demo/tracing.py example. It now uses from __future__ import print_function and avoids any assumptions about the current tasklet during the execution of the schedule callback.

ghost commented 10 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


Two final comments:

Global tracing/profiling flag

This schedule callback preserves the profile/tracing function over tasklet switches.

```python
def schedule_cb(prev, next):
    if prev and next:
        next.profile_function = prev.profile_function
        next.trace_function = prev.trace_function
```

If this impacts performance too much, you can still

Tracing in flowGuide

flowGuide is our product to control HPC computations. flowGuide is entirely written in Stackless Python 2.7, and flowGuide is closed source. Our customers get *.pyc files (except for libraries). To define an HPC computation, a customer writes Python code that must follow certain conventions. flowGuide executes this code as a tasklet.

A customer wants to trace his Python code without getting annoying messages from the debugger about unavailable source files for the closed-source parts of flowGuide. Additionally, we need remote debugging, because the code runs somewhere in a data center. Therefore we added code to flowGuide to closely control the tracing of customer-provided code. This would not be possible without the per-tasklet tracing we currently have. If Python had only global trace/profiling flags, IMHO many embedded and closed-source use cases would be much harder to implement.
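To make the "trace only the customer's code" idea concrete, here is a hypothetical sketch (the names are mine, not flowGuide's API): a wrapper that delegates to a real trace function only for frames whose source file lives under a given directory, so frames from closed-source .pyc-only modules are never reported. Fake frame objects stand in for real ones so the behavior can be shown without installing a tracer:

```python
class _FakeCode:
    """Stand-in for a code object; only co_filename matters here."""
    def __init__(self, co_filename):
        self.co_filename = co_filename

class _FakeFrame:
    """Stand-in for a frame object, for demonstration without real tracing."""
    def __init__(self, filename):
        self.f_code = _FakeCode(filename)

def make_filtered_tracer(inner, allowed_prefix):
    # Hypothetical helper: call `inner` only for frames whose source file
    # lives under `allowed_prefix`; skip closed-source frames entirely.
    def tracer(frame, event, arg):
        if not frame.f_code.co_filename.startswith(allowed_prefix):
            return None  # do not descend into this frame
        return inner(frame, event, arg)
    return tracer

# Demonstration with fake frames:
seen = []
trace = make_filtered_tracer(
    lambda frame, event, arg: seen.append(frame.f_code.co_filename),
    "/home/customer/")
trace(_FakeFrame("/home/customer/job.py"), "call", None)
trace(_FakeFrame("/opt/flowguide/core.py"), "call", None)   # filtered out
```

A tasklet's trace_function could then be set to such a wrapper instead of the raw trace function, which is presumably the kind of control the per-tasklet interface enables.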

flowGuide supports the PyDev debugger, but the addition of Stackless support to PyDev is unrelated work by Fabio Zadrozny. Of course I was very pleased, and I try to support him as well as possible.

p.s. We should document that profiling effectively disables soft-switching.

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


Yes. Python has a per-process GIL. Multiple interpreters share the GIL and some other globals. I don't know what the use case is for multiple interpreters. Certainly, for example, two independent libraries in the same process, each with its own private use of Python, cannot independently initialize Python using Py_Initialize. I think they need to use interpreter-specific APIs to start new interpreter instances. Very confusing.

ghost commented 10 years ago

Original comment by Anonymous:


In fact, the GIL is global for all interpreters.

http://stackoverflow.com/questions/1585181/is-the-python-gil-really-per-interpreter

I'm trying to find examples of using more than one interpreter, but to no avail, yet.

ghost commented 10 years ago

Original comment by Anonymous:


Ah, do interpreters share a common GIL?

I did not think about this and thought we could have a GIL for each interpreter.

I don't know of a use case; I never saw one. That is exactly what I want to explore for Gillespy, and for that the GIL must be split per interpreter, to allow them to run freely. Yeah, that requires more isolation between interpreters.

Probably it is better to have the global tracing flag per interpreter.

There may be now only a single lock, but that might change...

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


I think there are two reasons why this is a thread-local thing in cPython:

  1. The threadState is a convenient place to put this.
  2. When threads were added, getting interleaved enter/leave calls from different threads on the same callback would have broken existing profilers/tracers.

I think I could probably create a patch on the tracker making tracing global. I'm not sure whether it should be truly global or interpreter-global. In fact, I don't know what the use case is for multiple interpreters; are they supposed to be completely separate? After all, they share a common GIL...

ghost commented 10 years ago

Original comment by Anonymous:


Actually it looks reasonable to have a global tracing flag, and I too don't see why it is per thread. Maybe it evolved over the years and was not interesting enough to change for the few threads.

Tasklets are different: there you can have very many of them, and it is not clear what fits all needs.

It would be great to have a global trace thing for stackless.

It would also be great to have a special function for just one tasklet, without the need to write a handler that filters all other tasklets away.

I am a little bit influenced by the logging module. There you have a simple way to log things globally, or you can control in a fine-grained way what to log, when, and where. Maybe we could add something similar for tracing?

Tracing could be a tasklet-local function that defaults to calling the per-thread function, which in turn defaults to the global function, something like that. Such a decision is probably tri-state: I want tracing, I do not want it at all, or I do not care and ask up in the hierarchy.
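That tri-state hierarchy can be sketched in plain Python (purely illustrative; none of these names exist in Stackless):

```python
# Hypothetical sketch; none of these names exist in Stackless itself.
DEFER = object()   # tri-state "don't care": ask the next level up

def effective_trace(tasklet_setting, thread_setting, global_setting):
    """Resolve which trace function to install when a tasklet runs.

    Each level is a callable (trace with it), None (tracing explicitly
    off), or DEFER (no opinion; fall through to the next level).
    """
    for setting in (tasklet_setting, thread_setting, global_setting):
        if setting is not DEFER:
            return setting
    return None  # nobody asked for tracing

def my_tracer(frame, event, arg):
    return None

assert effective_trace(my_tracer, DEFER, None) is my_tracer   # tasklet wins
assert effective_trace(DEFER, None, my_tracer) is None        # thread says "off"
assert effective_trace(DEFER, DEFER, my_tracer) is my_tracer  # global fallback
```

The lookup would run once at each tasklet switch, so the common "nobody cares" path stays cheap.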

Still thinking about a most flexible, efficient and easy-to-use thing.

I think the lack of a Python global trace should not stop us from extending in either direction, global and local. Instead I think this could become a proposal on py-dev. Although these days I'm in favor of doing it instead of asking :-)

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


Hi Anselm.

I agree that it is unfortunate how we preserve this stuff differently for tasklets. We could possibly have an "extra state" object that can be attached to tasklets with unusual state, such as an exception state or special trace flags. We could migrate some of the tasklet state there if we wanted.

As for the per-tasklet/per-thread tracing state: cPython's API does not (as far as I know) allow modifying this for anything but the current thread. So I'm not sure why we need it for tasklets. Also, is there a use case for having different functions for different tasklets? If not, then just having an "enabled/disabled" flag for each tasklet is sufficient, and we can rely on the callback in the thread state.

Anyway, the reason I'm harping on about this is not that I inherently dislike your idea. It is that I really want it to be possible, in the future, to easily turn on profiling for an entire program with one single setting, e.g. when you want to watch a running stackless program for 10 seconds to gather profiling information, and then leave it running. Something that we have sometimes done for our live servers. With the globaltrace flag, this was easy. I just want to make sure that any changes we do don't make it more complicated to add such a feature later.
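The "watch a running program for ten seconds" workflow can be approximated with the stdlib alone. A sketch (single-threaded for brevity; a real Stackless globaltrace would cover every tasklet, and multithreaded code would also need threading.setprofile for threads started later):

```python
import sys
from collections import Counter

counts = Counter()

def profiler(frame, event, arg):
    # Count Python-level call events while the profiling window is open.
    if event == "call":
        counts[frame.f_code.co_name] += 1

def helper():
    pass

def busy():
    for _ in range(3):
        helper()

sys.setprofile(profiler)   # open the window
busy()
sys.setprofile(None)       # close it; the program keeps running

# counts["busy"] == 1 and counts["helper"] == 3
```

The appeal of a single global flag is exactly this: one call opens the window for everything, one call closes it, and no iteration over threads or tasklets is needed.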

In fact, I am not sure why cPython even bothered making the trace flags a per-thread property. When you're profiling a C program, you usually want to profile the entire program, not individual threads. Is there, in fact, a reason why you would want to profile just one thread, or just one tasklet? It will give you incomplete information, will it not? We already added the current per-tasklet extension of tracing (Stackless considers sys.settrace to be a tasklet-local function, much as it is thread-local in cPython). But I wonder if we shouldn't have gone the other way and considered sys.settrace to be interpreter-global!

ghost commented 10 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


Hi Kristján,

of course you're right. I should have discussed this topic here in more detail. And sorry for my late answer, but I'm currently in the mountains of the Bavarian Forest and have only very limited internet access.

Actually, I considered a global trace switch, but I don't think it fits the standard Python API very well. In Python, tracing/profiling is a per-thread property, and this extends naturally to tasklets. At my company, in our workflow engine, we actively use the possibility to trace only a particular tasklet. This is a special case of having different trace functions per tasklet. A global switch would be a valuable addition, but it does not replace an interface at the tasklet level.

In any case, Stackless already deals with the per-thread tracing state. The code works fairly well, but lacks an option to set tracing on a non-current tasklet. Standard Python provides the four functions sys.[get|set][trace|profile] as a per-thread interface to tracing. The natural extension of this interface to tasklets is a read/write property for tracing and one for profiling. Therefore I don't think it is a big deal to add these properties.

Of course we could decide to remove the per-tasklet tracing state, but that wouldn't be a big win; we would still need to preserve the exception state. But I like your stackless.globaltrace idea. Indeed, it simplifies the most common use cases.

In the meantime I have completed the work on the new properties, and I'm ready to push a final commit. This commit adds support for getting and setting the trace properties on hard-switched tasklets. It does so by locating the tracing variables in the cstate stack. If that is OK, I'll push this commit.

A more technical point: the current mechanics for preserving thread state across tasklet switches are really a bit arcane. For soft switching we use two different cframes, and for hard switching we have local variables in transfer_with_exc. I would like to simplify and unify this code for future versions of Stackless (i.e. 2.8 and 3.4). I propose to store the thread state of a non-current tasklet in the tasklet object itself. This would increase the size of the tasklet object a bit (1 int, 7 pointers), but we could get rid of a lot of code, and soft-switching performance would be a little better. I have first code for this, but it is not yet ready; currently it is disabled by #ifdefs.

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


(I also would really like it if we could discuss changes like these before rushing to implement them. It is not considered good practice for a developer to create tickets, assign them to themselves, and then implement them. Some coders at CCP were so notorious for using the defect tracker to sneak in their own pet features that developers are now prohibited from creating tickets; only QA can do that :) )

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


Did you take my suggestion into consideration, Anselm? I personally think that if you want to enable profiling/tracing for the entire program, using the scheduling callback is a particularly kludgy way to do it. It would be much nicer if we just told Stackless to have the trace/profile callback apply either to the entire thread or just to the current tasklet. I don't think there is a need for separate callback values per tasklet, because there really is no conceivable use case where you would want different callbacks for different tasklets.

Our CCP patch has a simple attribute, stackless.globaltrace, which when True, makes stackless not juggle the tracing callbacks when it switches context. Am I correct in thinking this will achieve what you are trying to do wrt. dev env support?

ghost commented 10 years ago

Original comment by Anonymous:


Yes, the mixture of hard and soft switching is mind-bending, and it always takes a lot of time to get into it. The perception of "where am I" is totally different when moving between the hard and soft models. It was also hard to understand when a switch really happens, and when I am the old tasklet and when the new one. Just some memories...

ghost commented 10 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


Hard switching is hard. :-) I'll commit a third version of the code soon, if I have internet access. I'll stay the next few days at the farm house of my parents in law in the Bavarian Forest. Last year the internet connection was very poor.

ghost commented 10 years ago

Original comment by Kristján Valur Jónsson (Bitbucket: krisvale, GitHub: kristjanvalur):


A local modification that we have at CCP is a stackless.globaltrace flag. It means that turning on sys.settrace etc. will affect all tasklets too.

This is very useful in our case, because what we mostly use these for (profiling) is to turn on tracing/profiling in a long-running program for a bit, to get a sense of what it is doing. We want the ability to flip the state of the entire program back and forth without having to iterate over the various threads/tasklets. This is similar to how you would profile/debug/trace a C program: either the entire program is currently being monitored, or it is not.

I never pushed that change because it seemed a bit intrusive and was also implemented in a somewhat ad-hoc manner, but I wonder whether you could see some such mechanism being possible under your new paradigm?

ghost commented 10 years ago

Original comment by Anselm Kruis (Bitbucket: akruis, GitHub: akruis):


The commit id for 2.7-slp is 92a8e005e75c.

For 3.x we should convert the example code to Python 3.