google / flogger

A Fluent Logging API for Java
Apache License 2.0
1.45k stars 123 forks source link

Benchmarks and String Creation #316

Open adligo opened 2 years ago

adligo commented 2 years ago

Doesn't this style of logging api incur the cost of string creation, even when the logger is off ?

I haven't looked at the byte code, but I'm wondering what exactly is ment by 'essentially free'.

I remember reading in the initial log4j that the whole reason for the usage of the imperative paradigms if, i.e.; if (log.isDebugEnabled()) { log.debug("foo"); }

Was to avoid the cost of the creation of the string "foo". The impartive if statement would create branching bytecode, with GOTO statements. IIt would seem to me that this functional stream log API style would result in a lot of extra string creation and garbage collection. Has anyone benchmarked it ? Where are the benchmarks ?

lowasser commented 2 years ago

Not exactly. Here's a useful breakdown:

In the specific code you give, "foo" is free, because it's a class-level constant. When the class gets loaded, all constant strings in the source code get loaded, too, so you really don't pay anything extra for this; even if you have the if as you describe, "foo" would still get loaded as soon as the class gets loaded.

Flogger's format arguments are constant, and therefore free as well -- see e.g. the first example

logger.atInfo().withCause(exception).log("Log message with: %s", argument);

"Log message with: %s" is a constant and therefore free.

The expensive part is replacing %s with argument.toString(), and Flogger only bothers to do that when the log statement is enabled, which is what you'd want.

(I'm also looking right now at very positive benchmarks for Flogger, but I don't know if there is anything published...)

hagbard commented 2 years ago

As Louis says, there should be no cost to "string creation" because you express that as "literal format string plus arguments" and let Flogger format it only when the log statement is enabled (and Flogger doesn't even have to format anything if the backend can do structured logging).

If what you're wondering is how: logger.atFine().log("message: %d", myIntValue);

Avoids things like the overhead of varargs and auto-boxing, that's all explained in depth in: https://google.github.io/flogger/anatomy.html

TL;DR: Flogger is completely zero-allocation for disabled log statements up to 2 arguments of any type (it has overloads to cover all necessary combinations of autoboxing for 1 or 2 arguments) and it's free from varargs up to 10 arguments.

Extensive analysis of many thousands of log statements in Google code showed that this would cover the vast majority of cases (2 arguments is the most common case, and above 5 arguments becomes extremely rare).

It also supports lazy arguments via simple lambda closures: logger.atFine().log("message: %s", lazy(() -> myExpensiveCall())); and while this can incur a small allocation for the capturing lambda, it's still very fast (and fast enough to be preferable to the error prone use of guards around log statements).

At some point we should probably publish our benchmarks, though we're also happy if people want to benchmark it for themselves (though you need to be quite careful to mimic real log statements, e.g. at representative stack depths and with a mix of lengths of format strings and types of argument ... it's quite fiddly).

adligo commented 2 years ago

Well all the bytecode I have ever looked at was doing new String("foo"); for the string literal. Not sure when or if that changed. But if you are both convinced I will get some bytecode from Eclipse and post here ...

cslee00 commented 2 years ago

That's already been addressed above:

In the specific code you give, "foo" is free, because it's a class-level constant. When the class gets loaded, all constant strings in the source code get loaded, too, so you really don't pay anything extra for this; even if you have the if as you describe, "foo" would still get loaded as soon as the class gets loaded.

The referenced link (https://google.github.io/flogger/anatomy.html) shows the constant string handling (no "new String").

Chris.

On Wed, 1 Jun 2022 at 15:08, Adligo Inc @.***> wrote:

Well all the bytecode I have ever looked at was doing new String("foo"); for the string literal. Not sure when or if that changed. But if you are both convinced I will get some bytecode from Eclipse and post here ...

— Reply to this email directly, view it on GitHub https://github.com/google/flogger/issues/316#issuecomment-1144190463, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCCXPQ7Z2M5PNQ4CEXNGMTVM7NMVANCNFSM5XSLJQJA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

adligo commented 2 years ago

/ Compiled from LogByteCode.java (version 11 : 55.0, super bit) public class org.adligo.test.LogByteCode {

// Method descriptor #11 ([Ljava/lang/String;)V // Stack: 2, Locals: 2 public static void main(java.lang.String[] args); 0 invokestatic com.google.common.flogger.FluentLogger.forEnclosingClass() : com.google.common.flogger.FluentLogger [1] 3 astore_1 [f] 4 aload_1 [f] 5 invokevirtual com.google.common.flogger.FluentLogger.atInfo() : com.google.common.flogger.LoggingApi [2] 8 checkcast com.google.common.flogger.FluentLogger$Api [3] 11 ldc <String "my main message"> [4] 13 invokeinterface com.google.common.flogger.FluentLogger$Api.log(java.lang.String) : void [5] [nargs: 2] 18 return Line numbers: [pc: 0, line: 8] [pc: 4, line: 9] [pc: 18, line: 10] Local variable table: [pc: 0, pc: 19] local: args index: 0 type: java.lang.String[] [pc: 4, pc: 19] local: f index: 1 type: com.google.common.flogger.FluentLogger

// Method descriptor #20 ()V // Stack: 2, Locals: 2 public LogByteCode(); 0 aload_0 [this] 1 invokespecial java.lang.Object() [6] 4 invokestatic com.google.common.flogger.FluentLogger.forEnclosingClass() : com.google.common.flogger.FluentLogger [1] 7 astore_1 [f] 8 aload_1 [f] 9 invokevirtual com.google.common.flogger.FluentLogger.atInfo() : com.google.common.flogger.LoggingApi [2] 12 checkcast com.google.common.flogger.FluentLogger$Api [3] 15 ldc <String "my LogByteCode constructor message"> [7] 17 invokeinterface com.google.common.flogger.FluentLogger$Api.log(java.lang.String) : void [5] [nargs: 2] 22 return Line numbers: [pc: 0, line: 12] [pc: 4, line: 13] [pc: 8, line: 14] [pc: 22, line: 15] Local variable table: [pc: 0, pc: 23] local: this index: 0 type: org.adligo.test.LogByteCode [pc: 8, pc: 23] local: f index: 1 type: com.google.common.flogger.FluentLogger

Inner classes: [inner class info: #3 com/google/common/flogger/FluentLogger$Api, outer class info: #25 com/google/common/flogger/FluentLogger inner name: #29 Api, accessflags: 1545 public abstract static] }

hagbard commented 2 years ago

Well all the bytecode I have ever looked at was doing new String("foo"); for the string literal. Not sure when or if that changed. But if you are both convinced I will get some bytecode from Eclipse and post here ...

I've been using Java for 20 years and literals have always been loaded from the constant pool. It's required by the language specification: https://docs.oracle.com/javase/specs/jls/se18/html/jls-3.html#jls-3.10.5 and (2005) https://docs.oracle.com/javase/specs/jls/se6/html/lexical.html#3.10.5

It's possible some JVM implementations don't do this, but it's incorrect (and there's even code in Flogger that assumes literals are interned so we can test via == for a fast test for known special cases).

adligo commented 2 years ago

Ok thanks, perhaps I was remembering the String concatenation instead of construction.

hagbard commented 2 years ago

11 ldc <String "my main message"> [4]

That's the "load constant" which gets the "my main message" literal and puts a reference to the interned String on the stack. There is no allocation here (allocation and mapping from constant pool index to a heap allocated instance is done during class loading).

adligo commented 2 years ago

public static void main(String [] args) { FluentLogger f = FluentLogger.forEnclosingClass(); f.atInfo().log("my main message"); }

public LogByteCode() { FluentLogger f = FluentLogger.forEnclosingClass(); f.atInfo().log("my LogByteCode constructor message");
} }

hagbard commented 2 years ago

Tip (in case it wasn't clear in the docs): You really want to be setting up the logger in a private static final field in the class rather than doing all the work to obtain it many times. And assuming it's the first static field in the class this also lets you use it in any static method and during other static initialization.

adligo commented 2 years ago

I'm still wondering if there is a noticeable difference between the imperative style to this fluent / functional / chaning style. It seems like things are happening on the stack that wouldn't need to happen.

adligo commented 2 years ago

Ya, I know you advise the static logger style. Having done that for years (i_log.adligo.org), I'm actually starting to prefer a C# style of passing around a context to get the loggers. I think it's easier to mock. I'm thinking of writing yet another logging api (i_log2.adligo.org?), which is probably why I'm so interested in this minutiae. THANKS FOR ALL THE RESPONSES :)

hagbard commented 2 years ago

The hot-spot compiler is good at folding stuff like that into efficient code. CPU is cheap compared to allocations and GC pressure. Yes, there are 2 (or more) calls to the logging API instead of 1, but the advantages you get from having a modular, extensible API (and thus being able to get rid of boxing and varargs without combinatorial explosion of many hundreds of method overloads) is well worth it.

In Java there's no need to obtain logger contexts and pass them around and no need to ever mock loggers for testing. You test logging by using Handlers which can be injected behind the API facade, not by mocking the API itself (and if you're using a different underlying backend to output logs, you should make sure you select one that's equally testable without mocking the user API itself).

Needing some "logger provider" context is a serious downside to any logging API since now you cannot add logging to public static methods without changing the API, and you cannot add logging during static initialization of a class.

adligo commented 2 years ago

As I mentioned I have implemented a logging API. I didn't say it was necessary, I said I preferred it. You could always create your own singleton LogFactory or something similar if you really wanted to wrap a context that could be passed around to get things initialized statically.

adligo commented 2 years ago

So I guess no one has benchmarked the differences in the extra stack calls anywhere ?

hagbard commented 2 years ago

Oh yes, we certainly have, at least inside Google.

After a week of attempting to pull out any reliable measure of the potential overhead, I gave up. Under realistic conditions there was just no observable difference between a "single method with many parameters" API and a "fluent logging API". The hotspot compiler was just figuring everything out as far as we could tell.

Remember that the only case which matters is for the "disabled" log statements, and then only under the sort of load that would trigger the hot-spot compiler. Since Flogger has only 2 concrete implementations of the fluent API, the hot-spot compiler can do all sorts of tricks by spotting places where bi-morphic dispatching occurs.

Flogger's used everywhere in Google's Java code, including many mission critical systems, so we've measured and checked as much as we can think of with regard to things like this quite carefully.

What we haven't done is tidy up the benchmark code, remove internal dependencies, document it and publish it (and the results) to github. We probably should do that at some point.

hagbard commented 2 years ago

As I mentioned I have implemented a logging API. I didn't say it was necessary, I said I preferred it. You could always create your own singleton LogFactory or something similar if you really wanted to wrap a context that could be passed around to get things initialized statically.

I suspect you'd run into cyclic static initialization issues pretty fast. Flogger sets up its "Platform" singleton on first use rather than as an explicit singleton exactly to avoid issues of cyclic dependencies during class loading. You also end up almost back at square one if you make a singleton to inject the factory since the singleton removes the choice of implementation and makes the logging in the code hard to test without explicit knowledge of what you need to do to inject a mock.

Since JDK Logger and other common backends all provide ways to capture logs for test purposes without having different implementations of the actual user facing API, I see no reason to try and solve the testability issue using mocks like this.

adligo commented 2 years ago

Looking forward to the benchmarks. On the mocking issue, I think my preference is due to my work on tests4j.adligo.org , which is based on JUnit's API but intensely concurrent. You can back a Logging facade with Handlers that use tread locals to get it done, but I've become fond of simply passing around a context. It's old fashoned (pre the Spring Declaritive Style popularity), but since I'm also trying to transpile a lot of Java to JavaScript (and TypeScript) and make the APIs work the same in the different environments, it seems to be the way to go.

adligo commented 2 years ago

Also, you don't need to test the log API in most cases. But you MAY want to test the usage of the log API. There are certain situations where you need to audit things (i.e. don't log the user's passwords, etc).