jupyterkat closed this 11 months ago
Are you saying that polling the mpsc channel on every instruction was a bigger bottleneck than the new thread locals?
It's probably the mutex, which existed for absolutely no reason at all. Acquiring a lock is expensive when you do it on literally every instruction.
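A minimal sketch of the pattern being described, assuming a debugger thread sends requests over a std `mpsc` channel and the game thread polls it from a per-instruction hook. `Request`, `REQUESTS`, `install` and `on_instruction` are hypothetical names, not auxtools code. std's `Receiver` is `Send` but not `Sync`, so keeping one in a `static` forces it behind a `Mutex`, and every poll then pays for a lock even when the channel is empty.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

// Hypothetical request type sent from a debugger thread to the game thread.
#[derive(Debug, PartialEq)]
pub enum Request {
    Pause,
}

// std's Receiver is not Sync, so a global copy has to live behind a Mutex.
static REQUESTS: Mutex<Option<Receiver<Request>>> = Mutex::new(None);

// Hypothetical setup: store the receiving end globally and hand back the sender.
pub fn install() -> Sender<Request> {
    let (tx, rx) = channel();
    *REQUESTS.lock().unwrap() = Some(rx);
    tx
}

// Hypothetical per-instruction hook: one lock acquisition per call,
// even when there is nothing to receive.
pub fn on_instruction() -> Option<Request> {
    let guard = REQUESTS.lock().unwrap();
    match &*guard {
        Some(rx) => rx.try_recv().ok(),
        None => None,
    }
}
```

By contrast, flume's `Receiver` is `Sync` and `Clone` (it is a multi-consumer channel), so a shared one can be polled with `try_recv()` through a plain reference without wrapping it in a mutex.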
I just switched from mpsc to flume because it's the better choice overall lol
I guess the thread locals in auxtools are OK, but could we swap back the debug server ones? At least the one that is accessed on every single instruction.
Is there a specific reason why? Is it the constant initialization check?
The instruction hook function currently almost doubles init time on tg (or something like that), so it's clearly a case where we should keep it as optimized as possible.
Thread locals don't just add an init check; they effectively add extra layers of indirection.
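A small sketch of the two kinds of Rust thread locals under discussion, with hypothetical names. A lazily initialized `thread_local!` may need to check on access whether the current thread's slot has been set up; a `const { ... }` initializer lets the compiler skip that lazy-init check for simple types, but every access still goes through `.with(...)` and the platform's thread-local storage lookup, which is the indirection mentioned above. Each thread also gets its own copy, as the spawned-thread call in the test shows.

```rust
use std::cell::Cell;

thread_local! {
    // Lazily initialized: an access may first check whether this thread's
    // slot has been initialized yet.
    static LAZY_STEPS: Cell<u64> = Cell::new(0);

    // const-initialized: no lazy-init check is needed, but access still goes
    // through the thread-local storage lookup.
    static CONST_STEPS: Cell<u64> = const { Cell::new(0) };
}

// Hypothetical per-instruction hook that bumps both per-thread counters.
pub fn on_instruction() -> (u64, u64) {
    let lazy = LAZY_STEPS.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    let constant = CONST_STEPS.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    (lazy, constant)
}
```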
If anything, I want to make our global accesses simpler (post-compile), rather than turning them into TLS or any other kind of variable with indirection.
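For comparison, a sketch of the "plain global" direction, assuming the hot-path state can be expressed as atomics. `DEBUGGER_ATTACHED`, `STEPS` and `on_instruction` are hypothetical names. These are ordinary statics with constant initializers, so reading them needs no lazy-init check and no TLS lookup. Unlike thread locals they are shared across threads, which is why they are atomics; `Ordering::Relaxed` is assumed to be enough here because the flag is only a hint for the fast path.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

// Plain globals with constant initializers: no lazy-init check, no TLS lookup.
pub static DEBUGGER_ATTACHED: AtomicBool = AtomicBool::new(false);
pub static STEPS: AtomicU32 = AtomicU32::new(0);

// Hypothetical per-instruction hook: a single relaxed load on the fast path.
pub fn on_instruction() -> bool {
    if !DEBUGGER_ATTACHED.load(Ordering::Relaxed) {
        return false; // no debugger attached: bail out immediately
    }
    STEPS.fetch_add(1, Ordering::Relaxed);
    true
}
```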