Closed mikehearn closed 8 years ago
In general, I'm not too fond of providing flags for things that modify core behavior, but you may be right that we should simply not try to serialize TLs. My question is this, though: what's wrong with a best-effort attempt to serialize TLs, and not serializing those that fail serialization? From what I can tell (it's been a while since I touched this code) the current implementation tries to decide in advance whether the TL can be serialized or not. This, indeed, seems wrong, and maybe we should just try to serialize and give up if any exceptions are thrown (similarly in deserialization). I'm not too fond of swallowing exceptions, either, but this may be the right approach in this case.
So far, you've had the most experience with this feature. What do you think of a best-effort approach, and where do you see it failing?
Well, serialisation has been a pain it's true, but I'm also a bit concerned at the poking of the insides via Unsafe. TLS slots are performance sensitive and likely to change as the JVM is optimised. As none of my fibers care about TLS I'd rather trade it off for increased robustness and compatibility.
The current serialisation issue with TLS is actually ThreadLocal subclasses, so the existing code that tries to decide whether to serialise the contents or not doesn't work, as it's the ThreadLocal object itself that causes the problems.
Serialization aside, fibers must support TLs or too many things just won't work (worse -- they'll act very strangely b/c the TLs value would change every time a fiber executes on a different thread). Supporting TLs means using Unsafe (at least for now...)
What breaks if TLS swapping is disabled?
Plenty. People run JAX-RS and servlets on fibers. Those often rely on TLs. Also, people use Clojure's dynamic bindings. However, the experimental Continuation
class (not in the master branch), which we intend to merge along with the support for automatic instrumentation planned for Java 9, already has this option (the detached
argument). The reason for that is that not all continuations are (abstract) threads, but all fibers are, and so are expected to behave like threads.
Ah ha! That sounds like the right abstraction for me (at the moment at least). Great, well, things are working for now (sorta .... I had to just disable the errant feature when run inside gradle), so perhaps I'll wait until Continuation is available and then switch to that.
If you're curious as to why we're waiting with continuations, the reason is that 99% of the cases where people want a continuation, what they really want is a fiber, and when the more primitive continuation is really necessary, it is usually for some advanced stuff that would be annoying to do with manual annotations. For a complete explanation of the theory, read/watch this.
Sure, I do understand. The way I'm using fibers/continuations is a bit unusual, as you know (it's really about the programming model rather than performance), so for now I'm trying to simplify as much as possible.
+1 for the issue.
I'm currently trying to deal with huge memory leak due to thread-local handling in quasar.
I have an instance of non-thread-safe structure in thread-locals (per-thread cache which is not thread-safe). Quasar moves that data structure to fiber. I have hundreds of thousands of fibers. So I have hundreds of thousands of such structures (~50K each) instead of just 16.
I think, quasar could have one of or both:
BTW, is there any quick workaround for the issue?
So you want to keep the TL per-thread? Take a look at TrueThreadLocal
. It's specifically designed for striped data structures.
Thanks for the TrueThreadLocal
. Although, it is partial solution of the problem, because code that does caching is in the library which does not depend on quasar.
The best solution for striped data structures is CoreLocal
/ProcessorLocal
which has been discussed by Doug Lea, I think, for some time, but I don't know if and when it is coming.
The problem is that a ThreadLocal
and a core-local object used for striping are supposed to have very different semantics, and it's impossible to tell which is the one you want. The best solution, I think, is to let Quasar know about certain ThreadLocal
subclasses which it should replace with TrueThreadLocal
.
Another appeal for the ability to disable TLS serialisation. I'm going to fork the library to add this now, because today we hit yet another problem caused by this feature and it's just way past the cost/benefit ratio for our use case. I simply do not want my fiber to capture things I am not aware of - ever - and whilst reasoning about things on the stack and things reachable from the stack is not that hard, reasoning about what random libraries or things inside the JDK might have stashed in TLS slots is extremely hard.
This time apparently it was possible to serialise a sun.nio.ch.Util$1 class but not deserialise it, for some reason. The actual error calls the class "2222sun.nio.ch.Util$1" .... so perhaps some kind of stream corruption, but it happens when serialising fiberLocals.
If you add a flag that is only used in the serialization-related methods, I have no problem with merging it (as long as the default is to serialize TLs; as I said, we'll do more work to make this more robust, and other languages -- like Clojure -- rely on TLs quite a lot).
OK, cool, thanks.
This has been done and merged
The issue is only partially fixed: it is not possible to tell quasar to preserve thread-locals when switching fibers.
This issue was specifically about serialisation. I guess a separate issue would need to be filed for tighter control over thread local handling.
@stepancheg Could you explain exactly what problem you are facing?
I think his issue is the same as he described above, no? He wants "true" thread locals to avoid needless loss of cache efficiency.
@stepancheg Are you familiar with TrueThreadLocal?
@pron TrueTreadLocal
is not always a suitable solution, because it requires for all libraries (which use ThreadLocal
) to be dependent on quasar.
And even if thread-locals are meant to be per-fiber, for me it is safer to get incorrect program (if some library depend on thread-locals) than deal with hard to find memory leaks.
I understand. The problem is that I am hesitant to add an option requesting what is essentially incorrect behavior. More fine-grained control over TLs and scheduling would be possible when we offer continuations for Java 9. The workaround is to create a Fiber
subclass that overrides onPark
and onResume
, and then use reflection to call install
/restoreFiberLocals
to undo the fiber's default behavior.
The problem is that I am hesitant to add an option requesting what is essentially incorrect behavior.
I think it is not big deal, when the option if off by default, but I got your point.
The workaround is to create a Fiber subclass that overrides onPark and onResume, and then use reflection to call install/restoreFiberLocals to undo the fiber's default behavior.
Dependence on private API is a shot in the foot. It causes headaches for future developers.
More fine-grained control over TLs and scheduling would be possible when we offer continuations for Java 9.
How is it related to Java 9? AFAIU, only new Java 9 feature related to quasar is stack-walking API.
How is it related to Java 9?
From our experiments, directly using continuations would require lots of suspendable annotations, and so we've decided to properly add them only when annotations would no longer be necessary (thanks to the stack-walking API). Unlike fibers, continuations aren't threads, and so their API allows for choosing whether or not to manage thread-locals.
The workaround is to create a Fiber subclass that overrides onPark and onResume, and then use reflection to call install/restoreFiberLocals to undo the fiber's default behavior.
Could you at least please make these functions protected
?
Current thread-local handling in fibers is memory-leak minefields.
What's the problem with calling them reflectively? I don't want to add methods to the API until there's a more permanent solution.
What's the problem with calling them reflectively?
Because reflective code will break on some newer version of quasar.
I understand, but I don't want to expose those methods precisely because we may want them to go away when there's a better solution, so making them protected would be worse...
Just found another leak. This time because of async-http-client: https://github.com/AsyncHttpClient/async-http-client/blob/master/client/src/main/java/org/asynchttpclient/util/StringUtils.java#L21
I have no cases where I do need per-fiber locals, but I have a dozen of cases where hacked thread-locals cause excessive memory usage.
Anyway, I end up redeclaring Fiber
subclass in co.paralleluniverse.fibers
package (java 1.8 still allows it), and overriding installFiberLocals
and restoreThreadLocals
to do nothing, and it seems to work as I expect. May I suggest to declare these two functions protected
(probably with modified signatures to accept thread-local-map objects) instead of package-private until you find a better solution?
How about this - mark them as protected but deprecated. People who use them get a warning. The users are happy. The API is not committed to. They can be made private again when a better solution exists in Java 9. Win/win.
How about this - mark them as protected but deprecated ... They can be made private again when a better solution exists in Java 9.
I think that's OK if quasar developers are committed to preserving this feature in future versions of quasar with (maybe) different API.
Here's the thing. When you use fibers (as opposed to continuations), there is never reason not to want TLs, except when a particular TL is actually used not for thread-specific data but as a striping mechanism (a ProcessorLocal
, really). The workaround I described would only work if you know for certain that all of the TLs are like that. It is certain to break the semantics of arbitrray code.
Any solution -- even an inconvenient one -- must differentiate between the two different desired semantics of TLs. I see no point in marking as depracated methods that are certain to break code. I am, however, happy to consider a hook that lets you examine TLs one by one and decide if you want to to keep it on the thread or in the fiber.
When you use fibers (as opposed to continuations), there is never reason not to want TLs, except when a particular TL is actually used not for thread-specific data
In practice there is a huge reason not to use thread-locals in a way it works in quasar. Library developers tend to store large amount of data in thread-locals assuming that there are only few threads and total memory overhead would be small.
I see no point in marking as depracated methods that are certain to break code.
The point is that current implementation of fibers does not allow to use certain libraries without loosing a lot of memory (with inherent GC overhead) due to data stored in thread-locals. When you have millions of fibers that becomes a problem.
I am, however, happy to consider a hook that lets you examine TLs one by one and decide if you want to to keep it on the thread or in the fiber.
I think that would be perfect.
BTW, at the moment I'd suggest to add StrandLocal
class which is an opposite of TrueThreadLocal
, which should work even if fiber-locals based on ThreadLocal
are not saved-restored.
Library developers tend to store large amount of data in thread-locals assuming that there are only few threads and total memory overhead would be small.
I understand. But code that relies on TLs is 100% sure to break if they are not supported correctly. The only case when it won't break is when a TL is used as a "processor-local" for striping, and this is the minority of cases.
The point is that current implementation of fibers does not allow to use certain libraries without loosing a lot of memory (with inherent GC overhead) due to data stored in thread-locals.
Again, code that uses TLs cannot work at all if they're not properly supported.
BTW, at the moment I'd suggest to add StrandLocal class which is an opposite of TrueThreadLocal, which should work even if fiber-locals based on ThreadLocal are not saved-restored.
The only sensible implementation that allows code to work at all is copying thread locals by default. There will not be an option to not save TLs by default in a fiber (though maybe in a continuation), as there is no possible way this could ever work. Use of TLs for striping is the exception, not the rule.
So far this feature is a massive net negative, at least for me. Things put in ThreadLocal's are invariably not designed to be serialised, it involves poking around in JVM internals and the JVM treats ThreadLocal's specially - back when I was hitting serialisation issues the JVM would segfault due to the use of Unsafe.
Overall, I'd rather have a flag to disable this and have Quasar leave TLS slots alone.