preaction / Log-Any

Simple, fast Perl logging API compatible with any logging system
Other
13 stars 19 forks source link

Excessive overhead in logging methods #69

Open nrdvana opened 6 years ago

nrdvana commented 6 years ago

Back in the pre-0.9 versions of Log::Any, there was no Proxy and methods called on $log would go directly to the adapter, who could efficiently squelch them. It meant that things like

   for (1..1000000) {
      ...
      $log->trace('reached point foo');
      ...
   }

could run extremely quickly if the back-end adapter was well optimized and the log level disabled. I did implement such back-ends, like Log::Any::Adapter::TAP and Log::Any::Adapter::Daemontools which use dynamic-re-blessing to replace methods with empty subs, for the best theoretical performance possible with Log::Any.

Then, version 0.9 came out and introduced the 'Proxy', which added a layer. It wasn't that big of a deal though, since it still checked the is_ method before evaluating any strings, and it was still pretty efficient, though it made my effort less relevant.

I see now that the latest version has quite a few new features. But, while they all look useful, it is no longer remotely as efficient as the old versions when dealing with disabled log levels. I can of course write $log->foo(...) if $log->is_foo but this is very tedious and not really practical for normal use.

At this point, I'm considering writing my own Proxy for Log::Any so I can count on it to remain minimal, and probably a new front-end module to go with it, like

# equivalent to ${caller."log"}= Log::Any->get_logger(proxy_class => "Log::Any::Direct")
use Log::Any::Direct '$log';

I'm wondering if you would be interested in having it contributed to this package, or if I should release it separately on CPAN.

I would also be interested in establishing a protocol between the proxy and the adapter where the proxy could register itself with the adapter so that the adapter can tell it when logging levels change. Not anything elaborate, just a callback or something and the adapter would be responsible for maintaining the weak references and etc. (Of course, my use case would be to have the adapter re-bless the proxy any time the logging levels changed, thus restoring my original optimization.)

I could also implement that independently on CPAN, since I'd be writing both the proxy and the adapter, but if you're interested in making this a part of Log::Any itself, or just documenting the protocol for other adapters to follow, I'd be happy to collaborate, and submit a patch.

preaction commented 6 years ago

The proxy and the adapter have different goals. The new features added change the proxy's goal more: The proxy does useful things on its own. So, it can check wantarray to see if something is wanted before doing something, but the Proxy does things independently of whether the Adapter will do a thing. Even if is_debug is false, if someone is using the return value from $log->debug(), everything but shipping to the adapter must happen so that the string that results can be used.

But, if the user of the Proxy knows absolutely that they don't care about those extra things and only want to pass things to the Adapter, we could speed (in theory) it up a bit.

A new proxy class would work for that. You could write a proxy (Log::Any::Proxy::Direct or Log::Any::Proxy::Fast) that disables any/all of the new features, or checks the is_* first off no matter what. Since your code is choosing which proxy is uses, it would know that it won't get a return value.

We could also change the is_* method to accept an argument of the string to log only if is_* is true, and keeping the return value the same (true if the string was logged, false otherwise). That would change $log->debug( ... ) if $log->is_debug in to $log->is_debug( ... ).

I would accept either of these solutions, though the new proxy class would be less of a change.

As for a protocol for Adapter objects to talk back to Proxy objects, I'm not sure how that will work. Do enough of the log adapters support knowing such a thing?

There's also this project, which claims to be 1.5x faster than Log-Any: https://metacpan.org/pod/Log::ger.

nrdvana commented 6 years ago

What I really would like most of all is a way to return to the efficiency of Log::Any 0.15 API on an opt-in basis for module authors. All of my code is 0.15 compliant anyway, so I'd just change the use line everywhere. I almost want to create a "Null" proxy which actually just exposes the Adapter like used to happen, but since the contract for adapters has officially changed there's no guarantee that other back-ends still adhere to it. So yes, basically I think a custom Proxy is the way to go, and give it a minimalist set of methods that is compatible with the original API and doesn't require evaluating any part of the arguments if the back end is disabled.

I have a lot invested in Log::Any so I wouldn't really want to change logging systems, though it looks like perlancar put a lot of effort into Log::ger already, unfortunately. I think it's better not to fragment the logging ecosystem further, if possible.

For the protocol, I haven't really investigated how difficult it would be for other logging back-ends to take advantage of this, though my main concern is to fix mine, and leave a path that other back-ends can use if they decide to. Sort of a "hey, if you want to optimize things, here's how to officially do it". You can see the mechanism I use between my Daemontools adapter and its Config objects, and portions of that functionality that could even be moved to this new Proxy so that other modules don't have to re-implement it. I would basically declare the main class passing through to the adapter:

package Log::Any::Proxy::Fast;
sub _set_squelch {
    my ($self, $level)= @_;
    bless $self, __PACKAGE__.'::L'.$level;
}
sub trace { ... $adapter->trace(...) ... }    sub is_trace { ... $adapter->is_trace ... }
sub debug { ... $adapter->debug(...) ... }  sub is_debug { ... $adapter->is_debug ... }
....

and then write out subclasses with various levels disabled:

package Log::Any::Proxy::Fast::L7;
@ISA= 'Log::Any::Proxy::Fast';
sub trace {}   sub is_trace { 0 }

package Log::Any::Proxy::Fast::L6;
@ISA= 'Log::Any::Proxy::Fast::L7';
sub debug {}   sub is_debug { 0 }
....

and then add an optional hook between the adapter and the proxy:

package Log::Any::Proxy::Fast;
sub init {
    ...
    $adapter->register_proxy($self) if $adapter->can('register_proxy');
    ...
}

Presuming the adapter knew when its log level was changing, it would just call "_set_squelch" on each proxy that it knew about, if the proxy supports that method.

So, the Adapter opts-in to know about proxies using it, and the Proxy opts-in to the ability to change squelch level, and the producer module opts-in to wanting this new Proxy API.

mephinet commented 6 years ago

Michael, I understand that logging performance is of high importance for you - so that you went through all the hoops and whistles you've done in https://metacpan.org/source/NERDVANA/Log-Any-Adapter-Daemontools-0.101/lib/Log/Any/Adapter/Daemontools.pm. However, I don't see much benefit in having different Proxy implementations on CPAN, and would rather prefer that we try to improve the existing one instead. As I did some of the changes to the Log::Any::Proxy this year, I'm partly to blame for the slow-down. Performance was not one of my top priorities (the adapter I was testing with has all levels turned on), so there should be room for improvement. I've profiled your logging example, and when looking at the Log::Any::Proxy code now, I think that we should add an early-return after checking the adapter's log level and the caller's wantarray - just as we do in the f versions of the logging methods. This gives me a speed-up of 3.7 on the $log->trace calls (when run 100.000 times). Each call (against Log::Any::Adapter::Stderr) is now down to 0.7µs on my notebook, without having to add the is_trace guard in the caller.

Does this satisfy your goal? Does anyone see a downside in doing this optimization?

preaction commented 6 years ago

That optimization seems fine to me (especially if it passes the test suite, which is our best guess at "working") and can probably be applied whether or not this solves @nrdvana's problem satisfactorily.

As for the proxy keeping track of its adapter's log level (the adapter calling a method on each proxy when the log level changes), I'm still not convinced it can be done well, and I really don't want to have you do it and then have me not like how it's done and have you have wasted your time on it. So, keeping that in mind:

Whether reblessing or having the proxy have its own log level before delegating to the adapter, it's the proxy keeping some state that has previously been only in the adapter. This state must also always be in the adapter: The adapter is the source of truth for the consumer's log level for the given category. There's all kinds of fun bugs that result from trying (and failing) to keep multiple copies of the same state in sync (since each adapter instance and each proxy instance will need to know the state). That's assuming that knowing the state at all times is even possible (which is likely only those adapters that directly write logs and so keep track of the log level themselves, not adapters for other logging systems).

Having an API for adapters to talk to proxies would also be a new thing: Right now a proxy can read an adapter's state, but the adapter itself can never talk to the proxy, cannot even know a proxy exists (except that it's getting messages from something). There's nothing really preventing it, but no proxy method exists that an adapter would find useful. Adding one would turn a unidirectional message-passing API (Producer -> Proxy -> Adapter -> Consumer) into a bidirectional one (Producer -> Proxy <-> Adapter <-> Consumer), greatly increasing the complexity possible.

There's nothing wrong with doing these things provided it gains enough to justify the complexity. But, I don't think adding this feature would improve enough over alternatives like just writing a proxy that always does the level checks and doesn't do any processing to the logged strings (so, basically, the 1.040 version of Log::Any::Proxy). Another option is constant-folding away the logging entirely using something like use constant DEBUG => $ENV{DEBUG} || !$ENV{PRODUCTION}; $log->debug( ) if $DEBUG; (or something like https://metacpan.org/pod/Keyword::DEVELOPMENT).

nrdvana commented 6 years ago

Yeah I think that's a good idea. It satisfies my actual production needs, so the ticket can be closed on that.

But, it doesn't scratch my itch that makes me want to write a Proxy ;-) so I'd like to continue the conversation if you don't mind. The real problem is that there's a lot of different sensible APIs for the producer that people have proposed, and you can't roll them all into the same class without a whole lot of if statements and control knobs, and adding all of that adds bloat to a module that originally advertised itself on extreme minimalism for the sake of getting everyone to adopt it.

In order to get there, the only two possibilities I see are to make the Adapter double as a Proxy API like it used to, or to provide a linkage between them so that the adapter and proxy can cooperate. I described the linkage scenario above, but here's a proposal for the first one (which again, I could publish separate on CPAN, but would like some consensus ...)

package Log::Any0;
sub import {
    ...
    my $adapter= Log::Any::Manager->get_adapter(...);
    unless ($adapter->can("tracef")) {
        ... # do something clever to give default versions of those methods to the adapter
    }
    ${$caller.'::log'}= $adapter;
}
nrdvana commented 6 years ago

Just to give a little more background on my use cases, any time I write something complex I try to add a lot of ->trace and ->debug methods. Later on, when I've forgotten how it all works and something in production has suddenly broken on me, I just enable trace logging and now I can see the deep behavior. Sometimes I only want to enable trace on a single category. Other times I only want to enable trace on a single web request. (I actually have this in production, where I set a special browser cookie and then all requests from that browser produce debug or trace level of logging in the server logs)

If the log level is enabled, then I don't really care how efficient or how many levels it has to hop through to be seen. But if it's disabled, I want to get performance as if it doesn't exist. But I also don't want to have to type a lot when I'm writing it, or restart the process to change log levels.

I also like to push the limits of what can be done with perl. I'm the DeLorean digital dashboard guy, and yes I do actually have Log::Any trace messages in the middle of my graphics rendering loop. So maybe I should be using a special-purpose XS logging system for that stuff (though calls into XS aren't really much less overhead than calling sub { 0 } ) but when I originally adopted Log::Any, it was perfectly suitable for this since it didn't add any overhead, and I wrote my own adapter.

nrdvana commented 6 years ago

As a sidenote, the internal check for wantarray still leaves me with a possible problem where I might have written

$condition eq 'usual' or $log->debugf("condition is unusual %s", $massive_data_structure);

which after an upgrade of Log::Any could severely impact page-serving times. I don't usually write logging statements that way though, but it is something I have to be careful of now.

preaction commented 6 years ago

I mean, like I said, I'd totally accept into core a minimalist proxy that did only the bare minimum to forward requests to its adapter (the proxy object as it was in 1.040). Producers are allowed to choose their proxy, and all proxies know how to talk to all adapters.

The structured logging feature basically does what you just proposed: If the adapter supports a structured method, that gets used. Otherwise, the proxy uses the level methods and turns the data structure into a string. But, like you said, it would be better for producers to opt-in to performance reduction rather than opt-out (which knowledge I am coming to too late, so the only way back is to opt-in to performance improvement and prevent further performance degradation through additional proxies in the core distribution).

There's also nothing stopping anyone from using an adapter directly except that using a proxy means being able to reliably support more than one adapter. If there are a set of adapters that all have the same API, then it's possible to use those without a proxy. If it's not easy to use adapters directly, I'd support adding method like my $log = Log::Any::Adapter->build( $class, @args ) or something. It's not quite in the spirit of the framework, but I see no reason to forbid excising the proxy from the workflow if one wants.

But for the main Log::Any workflow, the proxy is a necessary component: Not all adapters support the same API and/or logging concepts (categories, contexts, or even log levels). There's no real way to make the project like it was before the proxy object as it was in 1.040 (before the proxy returned the formatted string that was logged) still have it work like it does.

The thinner the proxy is, the more tightly-coupled a proxy is to a specific adapter's API, the less "any" the proxy is. If the proxy doesn't support the alias methods that a producer expects it to (for example, is_critical, a syslog level, gets mapped to something for the log4perl adapter), if it just does sub AUTOLOAD { my ( $method ) = $AUTOLOAD =~ /::([^:]+)$/; shift->{adapter}->$method( @_ ); }, then it isn't a proxy and you'd be better off (and faster) just using the adapter directly (or excising even the adapter and using the consumer directly). Once you add the mapping, it's a proxy (and that's what I imagine the proxy started out as).

nrdvana commented 6 years ago

IIRC, the original contract for the producer API was "logging methods take exactly one string", "if you call the *f method, it will call sprintf and dumper() any references, then pass that one string to the logging method". The contract for an adapter was "you must derive from Log::Any::Adapter::Base and implement all the ->logging_methods() and optionally ->detection_methods".

So for Log4perl, the adapter inherits from ::Base and then iterates ->logging_methods and applies the level mapping where needed, since it doesn't have all the same names on its own loggers. I just checked around (random sampling on cpan) and they all still inherit from base, and of course they all still have to accept a single string argument to the main logging method.

So, I think the only thing that prevents the adapter from being used like the original API is the lack of the "*f" methods and aliases on Log::Any::Adapter::Base. Looking back, it appears David Golden removed them ( 86476919d1220380e20a77b660c51b83d27fe81a, f6a91e907f3d0cc284c86f9e0ea1586cdd331a65 ) just as cleanup since they weren't needed anymore. It probably wouldn't hurt anything to add them back.

preaction commented 6 years ago

If you want to use an individual adapter's methods directly, you are free to, but the *f functions are part of the proxy, not the adapter. If an individual adapter wants to also implement a proxy API, it is free to, but I won't be making it a requirement, nor will I ship a proxy that uses *f functions an adapter has available: The proxy has the formatter callback, not the adapter, and checking adapters to see if they implement something like ->formatter_methods() (to match ->logging_methods()) sounds like it would exacerbate the problem you're here regarding to no actual benefit for people who use Log::Any to integrate with multiple logging systems (since they use the proxy).

I again reiterate that there can be a proxy that is exactly the proxy that was in 1.040 and I will accept it into the core Log::Any distribution. A proxy with those features could probably even be made faster with expanded application of level detection. We could even do other things to improve performance on a new proxy that could minimally break the proxy API (such small breaks could be made with a new proxy, since it doesn't require backwards compatibility).

Additional performance improvements will be judged based on their effectiveness and added complexity. This project already has quite a lot of indirection (I didn't even know about the Manager class until I tried adding the Null proxy), and I would be wary of adding even more complexity. The project still needs to have maintainers, and using clever tricks that only the most experienced Perl devs will understand is a sure route towards stagnation and rot.

nrdvana commented 6 years ago

How about just making get_adapter($category) public? Then other cpan modules could make whatever front-end they wanted and continue using all the log::any routing to the back-end, including the adapter configs that the consumer sets up . Modules that care about performance could just directly log to it, and problem solved. The only hesitation would be if you were planning any future compatibility breaks on the adapter API, or planning to implement category routing in the Proxy rather than via the adapter like is done now.

re: the *f methods on Adapter::Base; I wouldn't suggest the proxy forwarding those to the adapter now, but to recap history, for five years the Log::Any system was exposing adapters directly which did have *f methods on each of them, by way of inheriting them from the base class. Then in 2015 dgolden released the rewrite where those had been moved to a proxy (and breaking various features of both my logging adapters in the process, so hopefully that was the last back-compat-breaking change) Since the methods have been there for longer than they haven't been, I was suggesting it wouldn't hurt anything to restore them. But I admit it would just be clutter for most users at this point. Though mostly harmless clutter.

I might submit a minimalist Proxy as well, but I'm actually toying with some different ideas and might not decide exactly what I want right away. Having get_adapter officially available would get me back to identical performance of 0.15 without needing to finish those ideas first.

mephinet commented 6 years ago

Pull request #70 created.

@nrdvana regarding your sidenote in https://github.com/preaction/Log-Any/issues/69#issuecomment-350360906 - I think you're mistaken: if you check my example code in https://gist.github.com/mephinet/fbdeff254e00d3fa4a8df6beb90f0064 you can see that wantarray returns undef even if the method is called as right-hand-side of an or, so my performance improvement should work in this case, too.

nrdvana commented 6 years ago

@mephinet ah nice, perl is smarter than i gave it credit for. In that case there's probably a negligable number of cases where this changed on anyone. (would have to be last statement in a function)