google / flogger

A Fluent Logging API for Java
Apache License 2.0

Log4j2 - Request support for ThreadContext (MDC) #145

Open MosheElisha opened 4 years ago

MosheElisha commented 4 years ago

Hi,

Log4j2 provides a useful ThreadContext class and even more useful CloseableThreadContext.

try (CloseableThreadContext.Instance ignored = CloseableThreadContext.put("key", "value")) {
  log.atInfo().log("My message");
}

The entries will be logged if the pattern contains %X or %X{key}.
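For example, a PatternLayout along these lines (illustrative only; a similar pattern appears later in this thread):

<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg %X{key}%n"/>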

I read in the code comment that you don't want to encourage people to use Log4j2-specific features (https://github.com/google/flogger/blob/master/log4j2/src/main/java/com/google/common/flogger/backend/log4j2/Log4j2SimpleLogEvent.java#L96).

I would like to ask you to reconsider. If someone chooses Flogger with a Log4j2 backend, it usually means, IMO, that they prefer your API but would like to take advantage of the Log4j2 backend.

If you still prefer not to support ThreadContext by default, please expose an easy way to extend and create a custom Log4j2 backend that allows it.

BTW - the comment mentions that using Flogger tags is preferred, but I didn't find any documentation or examples.

Thanks.

hagbard commented 4 years ago

The reason you've not found any tags stuff is that it's not supported in open source yet. Some of the code and comments relate to features used only inside Google. I'm looking at the gRPC-based context stuff at the moment, but can't say how soon it will be ready.

You raise an interesting point. The problem is that if we allow MDC then people cannot switch backends without losing functionality, but maybe that's just something we have to live with. Using Flogger's tags will be equivalent and work with any backend, but maybe it's fine to also use MDC.

It'll mean merging MDC tags and Flogger tags on every log statement though, which will get messy and slow things down.
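For reference, backend-neutral tags end up looking roughly like this with the ScopedLoggingContext API that comes up later in this thread (a sketch only; the tag names are taken from a later example, and logger is an ordinary FluentLogger):

try (ScopedLoggingContext.LoggingContextCloseable ctx =
    ContextDataProvider.getInstance().getContextApiSingleton().newContext()
        .withTags(Tags.builder().addTag("foo").addTag("baz", "bar").build())
        .install()) {
  // Any backend can render these tags; nothing here is tied to Log4j2's ThreadContext.
  logger.atInfo().log("My message");
}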

Just to set expectations, unless people volunteer to help with things, they won't get done quickly. I'm the only person really working on this project and I get about 2 days a month of time to do it.

MosheElisha commented 4 years ago

The problem is that if we allow MDC then people cannot switch backends without losing functionality, but maybe that's just something we have to live with.

I think so. In addition, without it, people won't be able to migrate to the new API in large existing projects.

Just to set expectations, unless people volunteer to help with things, they won't get done quickly. I'm the only person really working on this project and I get about 2 days a month of time to do it.

I appreciate the effort and I don't take it for granted. I don't mind trying to contribute but after I cloned the project, I realized that I am unfamiliar with this workspace structure and build mechanism.

Can you please share instructions on how I can build and test?

Thanks.

himanshukandwal commented 4 years ago

I am also interested in contributing and in having the context data in the LogContext, rather than appended to the message as [CONTEXT ...].

hagbard commented 4 years ago

Where the context data goes is entirely up to the backend. You can already write a different backend to move the data elsewhere, it's just that the backend currently uses the "SimpleMessageFormatter", and that is, well, simple.

This is a different question to whether we merge the MDC context data into the tags (or vice versa).

Since the log4j2 backend is community supported, you're welcome to propose changes to it via PRs.

sekaiser commented 3 years ago

I am also interested in contributing and in having the context data in the LogContext, rather than appended to the message as [CONTEXT ...].

I stumbled upon the same issue and implemented my own log4j2-backend.

As I wanted to append the metadata as new fields to the log entry, I provided my own implementation of the SimpleMessageFormatter, in which I removed the appendContext method.

In Log4j2SimpleLogEvent#asLoggingEvent() I replaced

Map<String, String> mdcProperties = Collections.emptyMap();

with

Map<String, String> mdcProperties = new HashMap<>();
Metadata metadata = logData.getMetadata();
final int size = metadata.size();
// Copy each log-site metadata entry into the MDC map. Note that entries for a
// repeated key share the same label, so later values overwrite earlier ones here.
for (int i = 0; i < size; i++) {
  mdcProperties.put(metadata.getKey(i).getLabel(), metadata.getValue(i).toString());
}

So far I do not really understand why a distinction between Metadata and Tags is needed. Despite my lack of knowledge, I wonder if the approach above is worth investigating further.

hagbard commented 3 years ago

The difference between Tags and Metadata is about efficiency. Since Metadata values can execute code in the key, they cannot be cached, whereas tags can have a known string representation created and cached when the tags are installed in the scope. Soon it will be possible to add some metadata to scopes as well (slightly blurring the distinction), but tags can be emitted with far less work for each log statement.

At one level, tags and metadata feel the same (they both seem like key/value pairs), but at another level they are quite different (a custom metadata key could emit several values into the log statement, including things like current system state, but tags are completely immutable). Since most people just want "tag-like" behaviour, keeping these separate is good, since the majority of cases are then easily cached and efficient.
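To make the distinction concrete, a small sketch (the key and tag names are taken from examples elsewhere in this thread; with(...) is the fluent call for attaching metadata to a single log statement):

// Metadata: strongly typed keys; a single-valued key can be deliberately replaced,
// while a repeated key accumulates an ordered list of values.
private static final MetadataKey<Integer> COUNT_KEY = MetadataKey.single("count", Integer.class);

logger.atInfo().with(COUNT_KEY, 23).log("My message");

// Tags: untyped, set-like key/value pairs installed on a context; immutable once built,
// so their formatted representation can be computed once and cached.
Tags tags = Tags.builder().addTag("foo").addTag("baz", "bar").build();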

hagbard commented 3 years ago

I am also interested in contributing and in having the context data in the LogContext, rather than appended to the message as [CONTEXT ...].

I stumbled upon the same issue and implemented my own log4j2-backend.

Would you mind sharing that? Since I have essentially no experience of Log4J2, your code might just be better for everyone (since you probably know what Log4J2 users want better than I do).

sekaiser commented 3 years ago

I would not go so far as to claim to know what Log4j2 users want, but maybe this implementation can aid the conversation. I converted my approach into a project and you can find it here: https://github.com/sekaiser/flogger-log4j2-backend

Basically, I stripped off [CONTEXT ...] from the message and moved the content into the MDC (see the logMetadata test). By doing so, there is no need to introduce an abstraction for ThreadContext, MDC, and NDC (though I am probably misusing Flogger's metadata). For my small use case this is not an issue. Also, I have to admit that my primary concern was simply to make it work.

paladox commented 3 years ago

https://github.com/sekaiser/flogger-log4j2-backend

@sekaiser would you be able to submit a pull for this project to incorporate your changes? :)

sekaiser commented 3 years ago

Sure. I will take care of it this weekend.

hagbard commented 3 years ago

Thanks for looking at this.

I see in the PR that it doesn't explicitly attempt to merge the Flogger metadata with any other metadata from the Log4J2 ThreadContext. Is that merging handled inside the Log4jLogEvent, or is it not being done?

https://logging.apache.org/log4j/log4j-2.2/manual/thread-context.html

Not merging is the simplest way to do this (and the one which is most well defined), but it should be made clear in the docs what happens. Personally I'm fine with not merging, and prefer "well defined and a bit limited" to "clever and hard to understand".

There are a few things that would need to be done before this sort of change would be acceptable in Flogger though (since how Metadata is handled is now changing with the new MetadataProcessor and ScopedLoggingContext). That being said, it wouldn't be a huge amount of work to make this process scope and log-site metadata via a MetadataProcessor to achieve the same thing.

I have to say I'm a little confused about why there's special code to handle fundamental types when adding the StringMap. I am also wondering whether you should be "freezing" the context map before adding it to the log event.

Please add comments explaining the assumptions and intent of the code to make it easier for me to review. For example if you are merging with data from the ThreadContext, what do you do if there's a duplicate key? Also be aware that MetadataKeys can have repeated values, so any map of metadata values needs to account for this somehow.

I recognize most of what's there as copied from the original backend, so please highlight where you're making changes.

Cheers, David


sekaiser commented 3 years ago

Hello,

Thank you very much for investing the time to check the proposal, even though it was already outdated given the recent changes in Flogger. I updated the code and replaced the old context data handling with a MetadataProcessor and MetadataHandler. By doing so, I believe I addressed a fair number of issues, such as handling MetadataKeys with repeated values. I am not 100% certain that I am using the MetadataProcessor and the handler correctly. Basically, I created a custom metadata handler that can handle a Log4j2 StringMap:

MetadataHandler<Log4j2KeyValueHandler> DEFAULT_HANDLER = MetadataHandler
    .builder(Log4j2MetadataKeyValueHandlers.getDefaultValueHandler())
    .setDefaultRepeatedHandler(Log4j2MetadataKeyValueHandlers.getDefaultRepeatedValueHandler())
    .build();

It seems to work though.
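(For context, the sketch below shows roughly how such a handler is driven via the MetadataProcessor mentioned above; scopeMetadata and keyValueHandler are placeholders for backend-supplied objects, and the exact MetadataProcessor calls are my assumption, not taken from the proposal.)

// Sketch only: combine scope metadata and log-site metadata, then emit each entry
// through the handler above. 'scopeMetadata' and 'keyValueHandler' are placeholders.
MetadataProcessor processor =
    MetadataProcessor.forScopeAndLogSite(scopeMetadata, logData.getMetadata());
processor.process(DEFAULT_HANDLER, keyValueHandler);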

// some keys
private static final MetadataKey<Integer> COUNT_KEY = MetadataKey.single("count", Integer.class);
private static final MetadataKey<String> ID_KEY = MetadataKey.single("id", String.class);
private static final MetadataKey<String> REPEATABLE_KEY = MetadataKey.repeated("rep", String.class);

// log it
backend.log(
    FakeLogData.withPrintfStyle("Foo='%s'", "bar")
        .addMetadata(COUNT_KEY, 23)
        .addMetadata(ID_KEY, "test_ID")
        .addMetadata(REPEATABLE_KEY, "foo")
        .addMetadata(REPEATABLE_KEY, "bar")
        .addMetadata(REPEATABLE_KEY, "baz")
);

will generate a log entry like (using log4j default pattern layout): com.google.common.flogger.backend.log4j2.Log4j2Test_01:TRACE in 55f96302 - Foo='bar' count=23 id=test_ID rep=[foo, bar, baz]

I also tested the approach with a custom log4j2 JSON layout: {"logger-name":"com.google.common.flogger.backend.log4j2.DefaultLoggerTest_01:TRACE in 55f96302","level":"INFO","timestamp":1618179561072,"thread-id":"Test worker","msg":"test","foo":["bar","baz"],"log-type":"default"}

When it comes to supporting the Thread Context Stack (NDC), I am not sure what the best approach might be. I wonder if simple tags (those without a value) should be stored in the thread context stack, while tags with a value are stored in the thread context map. While this approach should work with different log4j2 layouts, it may create some confusion, as tags would be accessed differently in log4j2 depending on whether they have a value or not. Consequently, you would use different specifiers to include tags in log4j logs:

Use %X by itself to include the full contents of the Map.
Use %X{key} to include the specified key.
Use %x to include the full contents of the Stack.

I did not find the time to explore this yet.

With the introduction of the ScopedLoggingContext, I wonder if the usage of the log4j2 API should be discouraged (except for the layouts), and we should instead ensure that the LogEvent is complete using the Flogger API only. As you wrote, not merging is the simplest way to do this. Therefore, I started to investigate how to build and use a log4j2-specific context data provider so that I can use the ScopedLoggingContext.

I think it might be better to implement a context data provider first and create the pull request afterwards (including proper comments). I am very sorry to keep you waiting @paladox :-(

hagbard commented 3 years ago

I don't think you need to build a Log4j2-specific context data provider. Since ScopedLoggingContext is designed to work with any backend, there should be no need to entangle its implementation with backend specifics.

Just using the gRPC based context data provider should be enough, since people should be using the ScopedLoggingContext API to manage scoped metadata and tags.

What you've done so far sounds like it should be all you need to do (give or take some polishing).

Cheers, David


sekaiser commented 3 years ago

Just using the gRPC based context data provider should be enough, since people should be using the ScopedLoggingContext API to manage scoped metadata and tags.

Thanks for pointing this out. Clearly, I was overthinking it.

Regarding the handling of tags, I changed the approach slightly and kept it as simple as possible. Instead of filtering tags based on whether they have a value or not, I simply write all of them to a ThreadContext.ContextStack. By doing so, I treat key/value pairs as String values, e.g.:

MetadataKey<Integer> COUNT_KEY = MetadataKey.single("count", Integer.class);

try (ScopedLoggingContext.LoggingContextCloseable ctx = GrpcContextDataProvider.getInstance()
    .getContextApiSingleton()
    .newContext()
    .withMetadata(COUNT_KEY, 23)
    .withTags(Tags.builder().addTag("foo").addTag("baz", "bar").addTag("baz", "bar2").build())
    .install()
) {
  LogContext<?, ?> logContext = (LogContext) defaultLogger.atInfo();
  logContext.log("test");
  backend.log(logContext); // this event will be caught
}

using standard PatternLayout (<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg count=%X{count} id=%X{id} %x%n"/>):
=> 11:27:15.178 [Test worker] INFO  c.g.c.f.b.l.Log4j2Test_01:TRACE in 55f96302 - Foo='bar' count=23 id= [foo=[], baz=[bar, bar2]]

using custom JsonLayout:
=> {"logger-name":"c.g.c.f.b.l.DefaultLoggerTest_01:TRACE in 55f96302","level":"INFO","timestamp":1618651069410,"thread-id":"Test worker","msg":"test","count":"23","log-type":"default","ndc":["foo=[]","baz=[bar, bar2]"]}

With this change in mind, I'll start preparing a pull request.

hagbard commented 3 years ago

Thanks for explaining your approach.

For a bit of background, tags and metadata differ in a few subtle, but important ways.

Tags are key-value mappings which cannot be modified or replaced. If you add the tag mapping "foo" -> true and later add the mapping "foo" -> false, you get the value "foo" mapped to both true and false. This is very deliberate since the key space for tags is global and the risk of two bits of code accidentally using the same tag name is real (e.g. you get "id=abcd" but you added "id=xyz" so you think this isn't your log entry, but someone else added "id=abcd" in a context you weren't aware of).
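In code, that looks roughly like this (a sketch; how the merged values are rendered depends on the formatter):

// The second addTag() does not replace the first; both mappings are kept.
Tags tags = Tags.builder()
    .addTag("foo", true)
    .addTag("foo", false)
    .build();
// A log statement in a context with these tags will show "foo" with both values.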

Metadata keys are strongly typed instances which cannot be accidentally overwritten, so they are allowed to have their values replaced (if using a single-valued key) since that has to be deliberate. They also get to inherit multiple values (if using repeated keys), and unlike tags, the values are kept in an insertion ordered list which can contain duplicates (tag values are an ordered set which is deduplicated).

Part of the reason for making tags set-like is also so that as contexts are opened, the new set of merged tags can be efficiently calculated once, rather than processing all tags for every log statement. This really cuts down on the cost of using tags vs metadata. In theory you could cache the "toString()" representation of tags in your backend and key off the "Tags" instance.
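A minimal sketch of that kind of cache (the names are invented; whether toString() is the representation you want is up to the backend):

// Hypothetical cache: Tags are immutable, so the formatted form only needs to be
// computed once per instance. A weak-keyed map avoids pinning contexts that have gone away.
private static final Map<Tags, String> FORMATTED_TAGS =
    Collections.synchronizedMap(new WeakHashMap<>());

static String formatTags(Tags tags) {
  return FORMATTED_TAGS.computeIfAbsent(tags, Tags::toString);
}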

The existence of "valueless" tags is also deliberate and a tag without a value just means "this tag exists". It can also be added again with a value, and that does not change the fact it exists. So I agree that it would not be a good idea to filter out tags without values.

So overall, tags and metadata both give some form of key/value mapping, but their differing semantics make them best suited to different situations.

None of this is to say I prefer using tags as a single entry, or splitting them into their key/value pairs. However the advantage of Flogger's default formatting (which splits each key/value pair out separately) is that when searching for a tag mapping, you can always find it in the form "key=value" via a simple text search and it's all encoded in a JSON-lite format which could be re-parsed fairly reliably.

In your current output, finding whether the tag "baz" has a value of "bar2" isn't a simple text search because it appears as "baz=[bar, bar2]", not "baz=bar, baz=bar2" and re-parsing the key/value mappings from the text logs would be harder.

However, as we don't use the log4j2 backend in Google, I've got no objection to doing it either way (though maybe it's worth documenting somewhere that this isn't 100% stable and might get tweaked in the future).

Finally, you shouldn't use "GrpcContextDataProvider.getInstance()" and should instead prefer using "ContextDataProvider.getInstance()". This isolates you from having a different context provider installed.
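Applied to the snippet above, that is just a change to the first line (a sketch; the body is unchanged apart from being trimmed here):

try (ScopedLoggingContext.LoggingContextCloseable ctx = ContextDataProvider.getInstance()
    .getContextApiSingleton()
    .newContext()
    .withMetadata(COUNT_KEY, 23)
    .withTags(Tags.builder().addTag("foo").addTag("baz", "bar").addTag("baz", "bar2").build())
    .install()
) {
  defaultLogger.atInfo().log("test"); // same logging as in the earlier example
}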

Cheers, David


HenrikSkriver commented 2 years ago

Hi @sekaiser @hagbard

I have created the following issue, which I think is partly solved after having looked into your pull request for this issue and the subsequent merge: Mapping MetadataKey and Tags to separate log objects fields and out of the log statement/message #309. Although I am using Slf4J/Logback and yours uses the Log4J2 framework, I think the principle is the same - I do not mind switching to Log4J2 though. :) However, the issue, as I see it, seems to be present in both the Slf4J and the Log4J backend implementations in Flogger - namely the "missing" use of LogData.getMetadata() being mapped to a metadata field when the LogEvent gets created in the scenario where there is no Exception being handled.

In your flogger-log4j2 backend implementation, the relevant part is in Log4j2SimpleLogEvent.toString(), which calls Log4j2LogDataFormatter.appendLogData(), which - it would appear - adds the Metadata from LogData as a metadata field at the same level as the logged message. This approach seems to achieve exactly what I want, so I was hoping that you could give my issue a quick read-through and give your opinion/input.

The PR https://github.com/google/flogger/pull/242 got merged as part of https://github.com/google/flogger/pull/291, but the merge did not include the above-mentioned functionality. Instead, Log4j2LogEventUtil.appendLogData(), which populates the metadata field, is now only invoked from the Log4j2LogEventUtil.formatBadLogData() method.

I did not go through all the messages in #242, and I only noted the mention of the appendLogData() method in a comment which linked to issue #272 - but the merge also does not call appendLogData() except from the formatBadLogData() method.

What is the reason for not injecting the metadata as a separate field in the log object?

Thanks in advance.