Add threadref.finalize, like weakref.finalize

mentalisttraceur commented 4 years ago

Would be very nice to add a threadref.finalize, analogous to weakref.finalize the way threadref.ref is analogous to weakref.ref.

mentalisttraceur commented 4 years ago

Initial implementation subclassing weakref.finalize is very easy, simple, and works.

However, the problem is how to best handle the portability problems.

mentalisttraceur commented 4 years ago

The "ideal" implementation, assuming no portability concerns and the latest Python features:

from weakref import finalize as _finalize

class finalize(_finalize):
    def __init__(self, func, /, *args, **kwargs):
        try:
            anchor = _threadlocal.anchor
        except AttributeError:
            anchor = _threadlocal.anchor = _Object()
            anchor._thread = _current_thread()
        super().__init__(anchor, func, *args, **kwargs)
    def __repr__(self):
        info = self.peek()
        if info is None:
            return f'<threadref.finalize {id(self)} dead>'
        thread = info[0]._thread  # info[0] is the anchor; the thread is stored on it
        return f'<threadref.finalize {id(self)} {thread!r}>'

mentalisttraceur commented 4 years ago

In one sense of "ideal", when possible, the ideal way to deal with portability issues when writing code is:

For language features like syntax, assume a transpiler exists from new to old and will be run on your code when installing that code on systems that need it.
For library features, assume a polyfill exists and will be injected when setting up the environment, deploying the code, or initializing the runtime.

In other words, ideally, developers would be free to use language constructs which best express the logic and intent of their code, and portability to older or less complete implementations would be an independent, neatly decoupled problem.

mentalisttraceur commented 4 years ago

Per the last comment, the ideal way to deal with this problem at the level of this library is to simply act as if there is no problem: just import weakref.finalize and use f-strings and positional-only arguments.

Of course, the problem is that currently, that ideal tooling doesn't exist. So I think in practice that translates to just releasing a package that only works on the latest Python versions.

mentalisttraceur commented 4 years ago

Now, obviously we don't really need positional-only arguments or f-strings. Those are mostly aesthetic touches. They make the code clearer.

(They allegedly make the code faster too, but it is easy to imagine a sufficiently advanced optimizing compiler producing comparable code from whatever portable alternative I would write instead. And if performance matters, you use something like PyPy or Cython with a modern C compiler, and those technologies are well-positioned to have that kind of optimization logic.)

Since string formatting/interpolation is only used in __repr__, not anywhere intended to be in a hot loop, we can obviously just replace that with more portable string concatenation - which is what I currently do anyway.

Positional-only arguments are only needed in __init__, but we can emulate them like this:

def __init__(*args, **kwargs):
    def __init__(self, func, *args):
        return self, func, args
    self, func, args = __init__(*args)

Using a function with the same name to split *args lets us use Python's own facilities for raising a TypeError with the right message if function arguments don't match its signature instead of manually doing it.

(If we didn't need standard error behavior, we could just do the more concise and direct self, func, args = args[0], args[1], args[2:] and get an IndexError when an argument is missing. But I think in this case, a Pythonic TypeError is worth the two lines of extra boilerplate needed to implement it.)
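To make the trick concrete, here is a minimal sketch on a hypothetical stand-in class (Finalizer is not part of threadref; it only stores what it receives). It is meant to show two things: keyword arguments named func or self pass through without colliding, and a missing func still raises a TypeError produced by Python itself.

```python
class Finalizer:
    """Hypothetical stand-in class, used only to illustrate the pattern."""

    def __init__(*args, **kwargs):
        # Inner function with the same name, so that Python itself validates
        # the positional arguments and raises TypeError on a mismatch.
        def __init__(self, func, *args):
            return self, func, args
        self, func, args = __init__(*args)
        self.func, self.args, self.kwargs = func, args, kwargs


# 'func' and 'self' given as keyword arguments are passed through untouched.
f = Finalizer(print, 1, 2, func='passthrough', self='also fine')
print(f.args, f.kwargs)

missing_func_error = None
try:
    Finalizer()  # no func given
except TypeError as error:
    missing_func_error = error
    print('TypeError:', error)
```

The exact wording of the TypeError message varies between Python versions, so the sketch only prints it.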

So we can get the portable version of f-strings and positional-only arguments easily enough.

mentalisttraceur commented 4 years ago

One notable difference between positional-only arguments and string formatting is that the method of string formatting is purely an implementation detail that has no observable side-effects within the semantics of Python, while positional-only arguments are visible to anyone who inspects the function signature, such as when calling help on the module or class or method, or in an IDE.

Instead of __init__(self, func, /, *args, **kwargs) you get just __init__(*args, **kwargs). Minor, but not ideal.

With something like compose.__call__, this was fine because the only positional-only argument implicitly hidden this way was self, which the caller does not normally pass explicitly, and so the signature __call__(*args, **kwargs) is still very intuitive and self-describing. Arguably even more intuitive and self-describing than __call__(self, /, *args, **kwargs). Unless you have that attention to detail and correctness at the right moment, so that you think "wait, where's the self argument?" or "how did they manage to get a self-less signature on a method?", you might not even notice anything off about it.

But with threadref.finalize, if we did this, the programmatically inspectable signature would be hiding a function parameter that the caller must pass in: the func argument. *args implies that all arguments are optional, but a finalizer's func argument is required. The docstring can help clarify this, but it's still rather non-obvious and takes some thinking and trust or investigation to reconcile the docstring with the signature.
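The difference in what introspection reports can be seen with inspect.signature. This is a small sketch with two hypothetical classes (Emulated and Native are illustration names, not threadref code), and the second one needs Python 3.8 or newer for the / syntax.

```python
import inspect


class Emulated:
    def __init__(*args, **kwargs):
        """Emulated positional-only 'self' and 'func'."""


class Native:
    def __init__(self, func, /, *args, **kwargs):
        """Real positional-only parameters (Python 3.8+ syntax)."""


emulated_sig = str(inspect.signature(Emulated.__init__))
native_sig = str(inspect.signature(Native.__init__))
print(emulated_sig)  # (*args, **kwargs)
print(native_sig)    # (self, func, /, *args, **kwargs)
```

Tools like help() and IDE tooltips generally build on the same signature data, which is why the required func disappears from view in the emulated variant.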

One way to solve this - and personally this is the way I would prefer - is to outright reject this pattern of taking both a callable, and arguments for that callable.

This is what closures and lambda and functools.partial are for! Those are elegantly decoupled and composable solutions to the problem of binding some arguments to a callable so that they can be passed through together.

But Python is full of interfaces that convolute themselves just so that they can pass through arbitrary arguments along with callables. No other language community I know of does this so widely, and Python is uniquely ill-suited for it because a keyword argument can become a positional argument and vice-versa, so almost uniquely to Python, argument names in your signature are part of your public interface, and collisions are likely any time someone forgets this or doesn't/can't use positional-only arguments.

We really, really ought to stop. The only reason I am even considering perpetuating this pattern in threadref.finalize is because I felt like it would be bad to surprise my users by deviating from the interface of weakref.finalize.

But if it were up to me, the signature would be __init__(self, function), and if you needed to pass arguments through you'd use partial application or a lambda expression or whatever else, and you'd like it.
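As a rough illustration of the two styles, here is a sketch using the standard library's weakref.finalize, since threadref.finalize does not exist. Resource and cleanup are hypothetical names made up for the example.

```python
import functools
import weakref


class Resource:
    """Hypothetical referent; any weak-referenceable object works."""


def cleanup(name, *, verbose=False):
    return (name, verbose)


resource = Resource()

# Passthrough style: the interface forwards extra arguments to the callable.
passthrough = weakref.finalize(resource, cleanup, 'a', verbose=True)

# Bound style: the interface only ever sees a zero-argument callable.
bound = weakref.finalize(
    resource, functools.partial(cleanup, 'b', verbose=True))

# Calling a live finalizer runs it once and returns the callable's result.
passthrough_result = passthrough()
bound_result = bound()
print(passthrough_result, bound_result)
```

With the bound style, the finalizer's own parameter names can never collide with the callable's keyword arguments, because the interface never sees those arguments at all.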

Yet another alternative is to simply fall back to __init__(self, func, *args, **kwargs), but I think this is the worst alternative because it is the only one that has a landmine of irregular behavior in the interface - it works consistently until one of the keyword arguments you want to pass through is func, and then suddenly you get different behavior. That's awful interface design, and I'm only mentioning it for completeness.
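A short sketch of that landmine, using a hypothetical class named Naive with the plain fallback signature. Passing through most keyword arguments works, but one named func collides with the parameter of the same name.

```python
class Naive:
    """Hypothetical class using the plain fallback signature."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs


ok = Naive(print, 1, key='value')  # passthrough works as expected

collision = None
try:
    Naive(print, 1, func='collides')  # 'func' is bound twice
except TypeError as error:
    collision = error
    print('TypeError:', error)
```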

mentalisttraceur commented 4 years ago

Anyway, I've already developed a workable pattern for packaging different versions depending on Python version (see for example the setup.py and Makefile in with-as-a-function), so if I wanted to use different variants of the implementation for different versions of Python in the PyPI package, I could.

So to be clear, that isn't the issue. The technical side is solved. The issue is deciding what variants to bother implementing and packaging.

mentalisttraceur commented 4 years ago

So to summarize two outstanding questions:

Package a separate Python >=3.8 version which takes advantage of positional-only arguments, or no?
Pass *args and **kwargs through, or no?

(One other possibility that I already ruled out is "yes to the first, and for the second, no only in the backwards-compatible variant that lacks positional-only arguments". I think inconsistency of interface between Python versions is very bad for users.)

mentalisttraceur commented 4 years ago

Also, since in Python 3.8 weakref.finalize made func a positional-only argument, there is no keyword-argument signature compatibility across Python versions.

Similarly, because weakref.finalize takes a referent as its first positional argument and threadref.finalize won't, the only signature compatibility between them is keyword-argument.

In other words, the only reason why someone would want us to preserve the same argument name (func) is for

passing it as a keyword argument (which is incompatible with versions that implement it as a positional-only argument), or
signature introspection and code based on it (which we arguably have no substantial obligation to support).

This doesn't really change much, but the first half of that feels freeing (and the second half, restricting) when it comes to choice of name if implementing the function such that the func argument is named and in the signature.

mentalisttraceur commented 4 years ago

But the bigger and harder question I saved for last: what to do about importing weakref.finalize?

Again, the ideal is we just unconditionally import it, and have a class that inherits from it, and move on. Then I would say "if that doesn't work for you, install or implement a polyfill that has enough functionality for your use case". And if you are only using threadref.ref and never actually use threadref.finalize, then a sufficient polyfill for your use case might be as simple as weakref.finalize = object.
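A minimal sketch of that simplest polyfill, assuming it runs before threadref is imported, in an environment whose weakref module lacks finalize and allows setting module attributes, and assuming only threadref.ref will ever be used:

```python
import weakref

# Assumption: this runs before `import threadref`, and nothing in the
# program will actually construct a finalizer.
if not hasattr(weakref, 'finalize'):
    # Placeholder base class only; it does not implement finalization.
    weakref.finalize = object
```

On any Python that already provides weakref.finalize, the guard makes this a no-op.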

This has ripples for the question about whether to support *args and **kwargs passthrough: if threadref.finalize doesn't, then any weakref.finalize polyfill doesn't have to deal with it either.

Let's sorta walk through the list of Python versions and implementations:

CPython: weakref.finalize available as of 3.4. All versions lower than that are deprecated upstream. You can reliably polyfill anything on the standard modules. There is a backport available for Python 2, which officially supports Python 2.7 but at a quick skim should support down to 2.5, and with some minor manual effort could go even lower. So I could in good conscience say "if you are on CPython, and you need threadref, then the lack of weakref.finalize is your problem".

PyPy: Committed to supporting Python 2.7 indefinitely. This is annoying, but per the last paragraph, there is a backport, and it officially supports PyPy.

I presume IronPython and Jython and so on are approximately covered by the logic that covers CPython and PyPy.

MicroPython: no weakref support, no threading support. If you wanted threadref on there, you'd have to start by implementing threading and then you might be able to bypass using weakref for the implementation anyway. One unfortunate thing is that on MicroPython (at least for built-in modules), it seems you cannot add new attributes onto a module from within the Python code. Fortunately, if you are deploying MicroPython, you should be able to patch any module as part of your implementation anyway.

Brython: not sure if threadref even works on there, but it has all the relevant imports. I'm fortunate enough to not care enough to dig into it further. I imagine that like Skulpt, ultimately either threading and weakref are dummy implementations that preserve superficial behavior, or ultimately you'd have to do something completely different there anyway.

In fact let's say that's the case for all JS implementations of Python: the actual implementations of "threads" might actually be something more JavaScripty like "web workers", and it might be possible to bypass using a weakref-based implementation entirely.

mentalisttraceur commented 4 years ago

One further consideration from a completely different angle, which the consideration of other Python versions has raised: maybe I really shouldn't be doing a subclassing implementation.

In a perfect world the implementation wouldn't really depend on a weakref to a real object. In fact, Python's standard library's pure Python fallback for thread local objects itself relies on a weakref to the threading.Thread of the current thread! This means that the current threadref implementation would add a cycle that makes threadref detection of thread stopping depend on garbage collection (which is technically optional and potentially turned off)!

CPython actually uses a different C implementation of thread local objects. Empirically, based on testing, it doesn't have the same problem. But this is an argument for either making threadref.ref not dereference to the threading.Thread object of its thread (because in order to do that, it must keep a reference to it alive somewhere), or at least making the internal "anchor" objects hold a weak reference to the thread object. (Actually a quick test suggests that when using the pure Python implementation of threading.local, the one in the _threading_local module, the threadref.ref callbacks are simply not called, at least not immediately and not when gc.collect() is manually called. So either it's even less portable than I thought, or my testing is getting something wrong, even though the same test logic worked with the default C threading.local implementation.)

Anyway, so another question this sorta opens up is: should threadref embrace its weakref implementation as part of its nature/interface (in which case, it maybe should just implement finalize as a subclass of weakref.finalize), or should it embrace being merely semantically like "weak references to running threads" (in which case, it should maybe provide its own finalize that is independent of the weakref one, so that other ways of implementing thread done-ness detection are easy to plug in)?

The latter choice requires more thought and effort and I very much want to currently avoid that for lack of time, but perhaps eventually?

mentalisttraceur commented 4 years ago

That last tangent circles back to this issue's main question (do I even add a threadref.finalize?) by bringing focus back to "what is the point of this library, specifically?"

Is the point to provide a generic "callbacks for thread doneness" interface? Or is the point to provide one specific implementation upon which such a thing can be built?

I think for me, right now, the answer is the latter. The point is to capture this tiny useful hack in a package: a thread local variable is collected when a thread finishes, so a weakref to it fires its callback when the thread finishes.

Maybe I should document this better in the package README but I think this clarifies everything for me: no threadref.finalize for now! This package is too much a low level hack "saved in case it is ever again actually needed" to be putting this much effort into it.

(But just for the record, and as a reminder to future me, while I was still feeling that I should add a threadref.finalize, I was leaning towards a single implementation variant which unconditionally imported weakref.finalize and did not do argument passthrough.)

mentalisttraceur commented 4 years ago

So for now I have decided not to implement this, but if anyone is actually using this and finds themselves wanting this feature, I guess let me know.

(Final reminder to self: reconsider if threadrefs should even reference their thread or just be opaque.)

mentalisttraceur / python-threadref

Add threadref.finalize, like weakref.finalize #1