hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Logging Architecture - Base Logger #5424

hendrikebbers opened this issue 1 year ago (status: Open)

hendrikebbers commented 1 year ago

Custom logging facade

The design outlined is based on the development of a bespoke logging facade, tailored specifically to meet our unique requirements. Recognizing that components from this repository may eventually be incorporated into different software, we must consider this during development. Consequently, it is essential to implement an adapter capable of redirecting all events from our custom facade to SLF4J. While this adapter may not be utilized directly within this repository, it will be available for any projects that rely on modules from our repository.

Why we need a custom logging facade

To effectively aggregate logging events and messages from both our modules and third-party libraries, the use of a logging facade is necessary. In the Java ecosystem, SLF4J is widely recognized as the predominant logging facade, primarily due to its ability to consolidate logs from various logging libraries. Another crucial aspect of a logging facade is the distinction between its API and its implementation. In this context, Log4J2's API module stands out as it can function as a facade with a custom logging implementation. Log4J2 has certain advantages over SLF4J, including additional features. Additionally, Java offers a basic logging facade through java.lang.System.Logger.

However, each of these options comes with its own set of limitations. Considering these factors, a tailor-made logging facade emerges as the optimal approach, ensuring that our specific needs are addressed effectively.

Collecting all logging messages and events

We must collect all logging messages from all loggers on the classpath at runtime. Several patterns support collecting log messages.

The following diagram shows a concrete idea of how the logging architecture could look in the future:

[Diagram: proposed logging architecture (drawio attachment)]

The next diagram shows a simpler version:

[Diagram: simplified logging architecture (drawio attachment)]

API of our logging facade

In the realm of logging, it's pivotal to broaden our perspective from merely logging messages to logging events. A logging event encompasses much more than just a message; it's a rich source of information. Below are the key features that any modern logging system or facade should incorporate:

  1. Support for Message Placeholders: Our custom facade should allow messages with {} placeholders, similar to SLF4J. This should include support for varargs (Object...) and handle various data types, including all primitives, String, null, Supplier, and other object types.

  2. Event Origin Tracking: It's essential to trace the class from where a logging event originates, a standard practice in most logging frameworks. Further details like the method, line number, etc., are worth considering, although their extraction can be time-consuming and should be discussed further.

  3. Timestamp Inclusion: Each logging event must be timestamped to mark its creation time accurately.

  4. Thread Identification: The system should be able to identify and record the thread executing the logging call.

  5. Throwable Cause Handling: The facade must accommodate a Throwable object to represent the cause of a logging event.

  6. Log Levels: It should support various logging levels, including ERROR, WARN, INFO, DEBUG, and TRACE.

  7. Marker Support: The facade should utilize markers (identified by a String name) judiciously. Markers are optional parameters and should be used only when appropriate. For instance, an ERROR marker is redundant and can be managed by the event's log level.

  8. Key-Value Based Metadata: The system should allow for key-value metadata, where both key and value are String-based. This feature is crucial for providing additional context to a logging event, such as the current system, software version, or transaction ID. Like message arguments, Supplier should be supported for metadata values.

  9. Mapped Diagnostic Context (MDC) Support: Our facade should integrate MDC on both a global and thread-local level. Any data stored in the MDC at the time of a logging event's creation should be automatically included in the event's metadata.

These features collectively ensure that our custom logging facade not only captures detailed information about each logging event but also remains versatile and efficient in diverse logging scenarios.
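To make these requirements concrete, here is a minimal, hypothetical sketch of what a call site against such a facade might look like. The Loggers factory, the fluent withMarker/withContext style, and the exact signatures are assumptions for illustration, not the final API:

import java.util.function.Supplier;

public class TransactionHandler {

    // Hypothetical factory call; the real facade may differ
    private static final Logger logger = Loggers.getLogger(TransactionHandler.class);

    public void handle(final String transactionId) {
        // (1) {} placeholders with varargs; a Supplier defers expensive formatting
        final Supplier<String> details = () -> expensiveDetails(transactionId);
        logger.info("Handling transaction {} with details {}", transactionId, details);

        // (7) optional marker and (8) key-value metadata attached to a single event
        logger.withMarker("TRANSACTION")
              .withContext("transactionId", transactionId)
              .debug("Validation passed for {}", transactionId);

        try {
            process(transactionId);
        } catch (final RuntimeException e) {
            // (5) a Throwable as the cause of the logging event
            logger.error("Processing of {} failed", e, transactionId);
        }
    }

    private String expensiveDetails(final String transactionId) {
        return "...";
    }

    private void process(final String transactionId) {
        // business logic elided
    }
}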

Based on these definitions, a logging event looks like this:

public enum Level {
    ERROR, WARN, INFO, DEBUG, TRACE
}

public record DefaultLogEvent(
        @NonNull Level level,
        @NonNull String loggerName,
        @NonNull String threadName,
        long timestamp,
        @NonNull LogMessage message,
        @Nullable Throwable throwable,
        @Nullable Marker marker,
        @NonNull Map<String, String> context)
        implements LogEvent {}

You'll find the API in the platform-sdk swirlds-logging module.

Logging output

The presented architecture intentionally omits specifics regarding the appearance of the logging output. This design choice is a significant advantage, allowing for a high degree of flexibility. Ultimately, it's the responsibility of the chosen logging implementation—log4j2 is provided as an example in the diagram—to determine the nature of the output based on its configuration. This architecture is versatile enough to support a wide range of output options, ranging from simple console outputs to integration with sophisticated centralized systems like Graylog or Kibana.

It's important to note that the specifics of the output formats are beyond the scope of this issue. The architecture is designed to be agnostic to output formats, ensuring that it can adapt to various logging requirements and preferences. This approach allows users to customize the logging output to their specific needs without being constrained by the architecture's design. Such flexibility is key to ensuring the architecture's broad applicability and ease of integration into diverse environments.

Possible feature in future: Using logs for tracing

In microservice-based architectures, tracing is a widely adopted practice that enables tracking the flow of requests across multiple services. This tracking is typically implemented by incorporating tracing metadata into the request and response messages of the systems, often within HTTP headers. Leveraging a logging provider that supports Mapped Diagnostic Context (MDC) can further enhance tracing. It allows for the continuation of tracing within a service by appending the tracing metadata to all related log events for a specific request.

Adopting this approach in our system offers numerous benefits. By integrating tracing information into our log events, we can analyze and understand the flow of interactions within our system more effectively. For a practical guide on simple tracing implementation in logging, you can refer to Datadog's documentation. Additionally, it's noteworthy that the W3C is developing a standard for tracing HTTP headers (Trace Context), which is anticipated to be utilized by frameworks like OpenTelemetry, as outlined in their logging specification.

[!NOTE]
The concept of tracing is mentioned here to provide a comprehensive understanding of the potential enhancements to our system. While there are no immediate plans to implement this feature, it remains a consideration for future specialized use cases. Further details can be explored in this GitHub issue.

As we progress with our local Prometheus and Grafana setup, exploring the possibility of integrating tracing views directly into Grafana for developers is an exciting prospect. This integration could provide a more intuitive and visual means of monitoring and analyzing system flows, similar to the example shown in the provided screenshot.

[Screenshot: Grafana trace view example]

Such an integration would not only enhance the developer experience but also offer valuable insights for system optimization and troubleshooting.

Configuration

Our system requires a straightforward file-based configuration for logging, akin to what is seen in Spring Boot. This configuration should be simpler and less extensive than the XML-based configuration available in Log4J. Essentially, it needs to provide options to enable or disable file and console logging. Additionally, setting logging levels for specific packages or classes should be straightforward and intuitive.

We recommend maintaining a separate logging.properties file alongside the main settings file. This separation has several advantages:

  1. Frequent Changes: Logging settings often change during development, unlike other properties which are relatively static.
  2. Runtime Reloadability: It's crucial that the logging configuration can be reloaded at runtime, enabling adjustments to logging levels for specific classes or packages without the need to reload the entire settings.
  3. Isolation of Configurations: By keeping the logging configuration separate, we ensure that changes in logging do not affect other system configurations.

For practicality, the logging.properties file should be set to reload every second, ensuring up-to-date logging configurations at all times.
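As a sketch of how that one-second reload could be implemented (the class name and the reload callback are assumptions; any equivalent mechanism would do), polling the file's modification time is usually sufficient:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public final class LoggingConfigWatcher {

    private final Path configFile;
    private final Consumer<Path> reloadCallback; // hypothetical: re-parses logging.properties
    private volatile FileTime lastModified;

    public LoggingConfigWatcher(final Path configFile, final Consumer<Path> reloadCallback)
            throws IOException {
        this.configFile = configFile;
        this.reloadCallback = reloadCallback;
        this.lastModified = Files.getLastModifiedTime(configFile);
    }

    public void start() {
        final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    final Thread thread = new Thread(runnable, "logging-config-watcher");
                    thread.setDaemon(true); // never keep the JVM alive just for config polling
                    return thread;
                });
        // Check the file once per second and trigger a reload when it has changed
        executor.scheduleAtFixedRate(this::checkForChange, 1, 1, TimeUnit.SECONDS);
    }

    private void checkForChange() {
        try {
            final FileTime current = Files.getLastModifiedTime(configFile);
            if (current.compareTo(lastModified) > 0) {
                lastModified = current;
                reloadCallback.accept(configFile);
            }
        } catch (final IOException e) {
            // The file may be missing briefly during an atomic replace; retry on the next tick
        }
    }
}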

Regarding the format of the logging configuration, it should be user-friendly and self-explanatory. The syntax for setting logging levels could look like this:

# Global default logging level
logging.level = INFO

# Specific logging levels for packages or classes
logging.level.com.swirlds.common.crypto = DEBUG
logging.level.com.swirlds.common.crypto.Signature = WARN
logging.level.com.hashgraph = WARN

This configuration demonstrates how to set a global default logging level (INFO). It also illustrates how to specify logging levels for packages and classes. For example, everything under com.swirlds.common.crypto is set to DEBUG, except for com.swirlds.common.crypto.Signature, which is explicitly set to WARN. Similarly, all loggers within com.hashgraph default to WARN. This approach ensures that loggers inherit the most specific level defined in the configuration, providing both flexibility and precision in logging management.
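This "most specific level wins" rule can be implemented as a longest-prefix match over the configured names. Below is a minimal sketch; the class and method names are assumptions for illustration:

import java.util.Map;

public final class LevelResolver {

    public enum Level { ERROR, WARN, INFO, DEBUG, TRACE }

    private final Level defaultLevel;
    private final Map<String, Level> configuredLevels; // e.g. "com.swirlds.common.crypto" -> DEBUG

    public LevelResolver(final Level defaultLevel, final Map<String, Level> configuredLevels) {
        this.defaultLevel = defaultLevel;
        this.configuredLevels = configuredLevels;
    }

    // Walk from the full logger name up the dot hierarchy and take the most specific match
    public Level resolve(final String loggerName) {
        String name = loggerName;
        while (!name.isEmpty()) {
            final Level level = configuredLevels.get(name);
            if (level != null) {
                return level;
            }
            final int lastDot = name.lastIndexOf('.');
            name = lastDot < 0 ? "" : name.substring(0, lastDot);
        }
        return defaultLevel;
    }
}

With the configuration above, resolve("com.swirlds.common.crypto.Signature") returns WARN, while resolve("com.swirlds.common.crypto.Hash") falls back to DEBUG.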

To provide developers with more control and flexibility in logging, we propose introducing filters for markers. This feature would allow developers to focus on log messages associated with specific markers, regardless of the logger's log level. If there's demand for such functionality among developers, the configuration for marker filters could be structured as follows:

# Marker filter configuration
logging.marker.CONFIG = ENABLED
logging.marker.CRYPTO = DISABLED
logging.marker.OTHER = DEFAULT

In this configuration, the logging.marker.NAME pattern is used, where NAME represents the marker's name, and the value can be set to ENABLED, DISABLED, or DEFAULT. For example, logging.marker.CONFIG = ENABLED would ensure that all log messages tagged with the CONFIG marker are displayed, irrespective of their log level.

A typical logging configuration file, incorporating these marker filters, might look like this:

# General logging level configuration
logging.level = INFO
logging.level.com.swirlds.common.crypto = DEBUG
logging.level.com.swirlds.common.crypto.Signature = WARN
logging.level.com.hashgraph = WARN

# Marker-specific logging configuration
logging.marker.CONFIG = ENABLED

# Additional handler configuration
logging.handler.NAME.level = WARN

To further refine our logging system, we have incorporated the concept of handlers. These handlers allow for more granular control over logging behavior. Each handler can be distinctly named and configured using the prefix logging.handler.NAME, where NAME is a unique identifier for the handler. This structure enables us to apply handler-specific settings. For instance, logging.handler.NAME.level can be used to set the logging level for a specific handler. All previously discussed features, such as marker filters and log level settings, are also applicable to these handlers.

A particularly useful feature of handlers is the inheritLevels property. This boolean property determines whether the handler should inherit the default level configurations set globally. By default, inheritLevels is set to true. However, it can be turned off to create a handler that focuses exclusively on a specific aspect of logging, such as a particular marker. For example, if you need a handler that only logs entries marked with the CRYPTO marker, your configuration would look like this:

# Handler specific for CRYPTO marker
logging.handler.CRYPTO_FILE.level = OFF
logging.handler.CRYPTO_FILE.marker.CRYPTO = ENABLED

In this configuration, the CRYPTO_FILE handler is set to ignore the global log level settings (level = OFF) but is specifically enabled to log messages tagged with the CRYPTO marker (marker.CRYPTO = ENABLED). This setup allows for the creation of dedicated log files or outputs for specific types of log messages, providing a focused view that can be particularly useful in complex systems or during specific types of analysis.
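One way to read such a configuration is that a handler first consults its marker overrides and only falls back to the level check when no override applies. A minimal decision sketch, with the class name and parameters assumed for illustration:

import java.util.Map;

public final class HandlerFilter {

    public enum MarkerState { ENABLED, DISABLED, DEFAULT }

    // Parsed from the logging.handler.NAME.marker.* entries
    private final Map<String, MarkerState> markerStates;

    public HandlerFilter(final Map<String, MarkerState> markerStates) {
        this.markerStates = markerStates;
    }

    // markerName is the event's optional marker (null if absent); passesLevelCheck is
    // computed by the caller from the handler's own level, or from the inherited global
    // levels when inheritLevels = true
    public boolean accepts(final String markerName, final boolean passesLevelCheck) {
        if (markerName != null) {
            final MarkerState state = markerStates.getOrDefault(markerName, MarkerState.DEFAULT);
            if (state == MarkerState.ENABLED) {
                return true; // a marker override wins regardless of level, as for CRYPTO above
            }
            if (state == MarkerState.DISABLED) {
                return false;
            }
        }
        return passesLevelCheck;
    }
}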

Overall, the introduction of handlers adds a layer of customization to our logging framework, enabling developers to tailor the logging system to their specific needs, whether it's for general application logging or for monitoring specialized aspects of the system.

This approach provides a comprehensive yet straightforward configuration model, allowing developers to easily tailor the logging system to meet their current requirements. The combination of general log levels, marker filters, and handler-specific settings ensures a highly customizable and efficient logging setup.

Performance

Different logging targets (file logging, etc.) have an impact on performance. Therefore we should take a general look at the performance impact and check whether we can define "priorities" for targets: in general, file logging is good enough, and we should guarantee that file logging is up to date. Loki logging, on the other hand, might work with a queue and a dedicated thread that sends log events to a Loki endpoint. That way the logging might be slower (by some milliseconds) but consumes less performance.

The repository https://github.com/OpenElements/java-logger-benchmark contains several JMH benchmarks for all the logger libraries that are currently part of our discussion. Here are some test results:

The following table contains the results of the benchmark for logging a simple "hello world" message:

| Logger | Logging Appender | Operations per second |
| --- | --- | --- |
| Chronicle Logger | FILE_ASYNC | 2224759 |
| Log4J2 | FILE_ASYNC | 902715 |
| SLF4J Simple | FILE | 300924 |
| Log4J2 | FILE | 163218 |
| Java Util Logging | FILE | 103076 |
| Log4J2 | FILE_AND_CONSOLE | 89460 |
| Java Util Logging | CONSOLE | 83442 |
| Log4J2 | CONSOLE | 72365 |
| Log4J2 | FILE_ASYNC_AND_CONSOLE | 64143 |
| Java Util Logging | FILE_AND_CONSOLE | 49268 |

The following table contains the results of the benchmark for executing the LogLikeHell Runnable that contains all possible logging operations:

| Logger | Logging Appender | Operations per second |
| --- | --- | --- |
| Chronicle Logger | FILE_ASYNC | 57270 |
| Log4J2 | FILE_ASYNC | 33770 |
| Log4J2 | FILE | 9880 |
| Java Util Logging | FILE | 6373 |
| SLF4J Simple | FILE | 6091 |
| Java Util Logging | FILE_AND_CONSOLE | 1918 |
| Log4J2 | FILE_ASYNC_AND_CONSOLE | 1610 |
| Java Util Logging | CONSOLE | 1539 |
| Log4J2 | FILE_AND_CONSOLE | 1436 |
| Log4J2 | CONSOLE | 985 |

The results show that async file logging is the fastest option and, depending on the scenario, much faster than synchronous logging, but it introduces additional transitive dependencies. Console logging should be deactivated for production.

While Chronicle Logger is the fastest logger, it produces a binary output that needs to be parsed in a special way. Based on that, Log4J2 with async support (based on the Disruptor library) might be the best option.

Test support

To enhance our testing capabilities, we've introduced the WithLoggingMirror annotation in JUnit 5. This annotation injects a LoggingMirror instance into a test method or test class, facilitating the inspection and verification of logging events generated during a test.
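A hypothetical test using the annotation might look like the sketch below; the LoggingMirror method names (filterByLevel, getEventCount) are assumptions for illustration:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

@WithLoggingMirror
class TransactionHandlerTest {

    // Injected by the JUnit 5 extension; isolated per test to avoid cross-test pollution
    private LoggingMirror loggingMirror;

    @Test
    void testLogsWarningOnInvalidTransaction() {
        // given
        final TransactionHandler handler = new TransactionHandler();

        // when
        handler.handle("invalid-tx");

        // then: inspect only the events recorded during this test
        assertEquals(1, loggingMirror.filterByLevel(Level.WARN).getEventCount());
    }
}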

Open tasks:

### Tasks
- [ ] #5457
- [ ] https://github.com/hashgraph/hedera-services/issues/9097
- [ ] https://github.com/hashgraph/hedera-services/issues/10907
- [ ] https://github.com/hashgraph/hedera-services/issues/10908
- [ ] https://github.com/hashgraph/hedera-services/issues/10909
- [ ] https://github.com/hashgraph/hedera-services/issues/9373
- [ ] #5458
- [ ] #5459
- [ ] #5460
- [ ] #5461
- [ ] #5462
- [ ] https://github.com/hashgraph/hedera-services/issues/7993
- [ ] https://github.com/hashgraph/hedera-services/issues/8146
- [ ] https://github.com/hashgraph/hedera-services/issues/8148
- [ ] https://github.com/hashgraph/hedera-services/issues/8422
- [ ] https://github.com/hashgraph/hedera-services/issues/8786
- [ ] #9000
- [ ] #9001
- [ ] https://github.com/hashgraph/hedera-services/issues/9374
- [ ] https://github.com/hashgraph/hedera-services/issues/9376
- [ ] https://github.com/hashgraph/hedera-services/issues/9375
- [ ] https://github.com/hashgraph/hedera-services/issues/9377
- [ ] #9573
- [ ] https://github.com/hashgraph/hedera-services/issues/9631
- [ ] https://github.com/hashgraph/hedera-services/issues/10910
- [ ] https://github.com/hashgraph/hedera-services/issues/11161
- [ ] https://github.com/hashgraph/hedera-services/issues/11740
- [ ] https://github.com/hashgraph/hedera-services/issues/11741
- [ ] https://github.com/hashgraph/hedera-services/issues/11868
- [ ] https://github.com/hashgraph/hedera-services/issues/11869
- [ ] #11830
- [ ] https://github.com/hashgraph/hedera-services/issues/11925
- [ ] #12337
- [ ] #12015
- [ ] https://github.com/hashgraph/hedera-services/issues/12346
- [ ] https://github.com/hashgraph/hedera-services/issues/12176
- [ ] https://github.com/hashgraph/hedera-services/issues/12226
- [ ] https://github.com/hashgraph/hedera-services/issues/12507
- [ ] https://github.com/hashgraph/hedera-services/issues/12175
- [ ] https://github.com/hashgraph/hedera-services/issues/12637
- [ ] https://github.com/hashgraph/hedera-services/issues/12638
- [ ] https://github.com/hashgraph/hedera-services/issues/12921
- [ ] #13081
- [ ] https://github.com/hashgraph/hedera-services/issues/13548
hendrikebbers commented 1 year ago

Open Telemetry Logging API: https://opentelemetry.io/docs/reference/specification/logs/

david-bakin-sl commented 1 year ago

👍

The key/value metadata ability is a great plus! At point of adoption there should also be documentation/policy specifying preferred key names for common data types/frequent Hedera concepts - perhaps even an enum type for these names - also perhaps given functors for emitting certain types in a standard way.

Having common key names will help when groveling logs. We have for example multiple accounts that can be logged for events - it would be nice to use the same spelling for a given account type over all log records. Also, there are multiple "addresses" in different forms as well (esp. considering contracts which map between hedera address-like things (ids of entities) to evm addresses) and for those common functors to produce metadata in standardized formats would be good.

Question: Will we be able to configure logging by level+marker? IOW, overall level is INFO but also log Sidecars:DEBUG. (Actually I don't know if we can already do this or not.)

cody-littley commented 1 year ago

I'm ok using a static logging implementation in production environments, but I'd really like to avoid the pattern of building the loggers using a static log manager.

The main reason for this is testing.

hendrikebbers commented 1 year ago

@cody-littley if you really want to use log events as test checks you could, for example, use the MDC for the current thread and add specific metadata for the test. You can then filter events based on that metadata. Multi-node tests are a scenario we need to discuss. My first guess would be to have an individual class loader for each node. But exactly because of those points I mentioned it in the doc.

hendrikebbers commented 1 year ago

@david-bakin-sl the configuration of the output (level+marker) depends on the logging implementation that will be used. I would like to exclude that at the moment since it is more relevant to DevOps. As far as I know log4j2 can do all that, and in the worst case you need to write your own handler to add such functionality. But that should be true for any logging framework.

cody-littley commented 1 year ago

Most logs probably don't need to be unit tested. But there are a small number of logs that devops keys off of, and changing how those lines appear has the potential to break downstream pipelines. I'm less worried about how we do it (your suggestion might end up being sufficient), but I think we should eventually write these kinds of tests one way or another.

A classloader per local node might do the trick. Perhaps it's worth discussing plans for this sort of testing as a larger group, as our end goal could change our current design choices.

david-bakin-sl commented 1 year ago

Perhaps it's worth discussing plans for this sort of testing as a larger group, as our end goal could change our current design choices

yes, worth talking about (logs should be tested, if not at debug/trace level, because otherwise they can be broken when you need them: I've seen important log messages in production that are totally borked - wrong/missing/badly formatted information that can't be used. Because: never tested).

but i would also mention that the services repo now contains a nice log4j "appender" that can be used in unit tests - you "attach" it to a logger and then it gets the log messages too - that kind of thing should be supported (though with less gratuitous flexibility)

hendrikebbers commented 11 months ago

Datamodel of OpenTelemetry Logging: https://opentelemetry.io/docs/specs/otel/logs/data-model/

hendrikebbers commented 11 months ago

OpenTelemetry Logging uses numbers to define the log level (a short would work): https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-severitynumber

hendrikebbers commented 11 months ago

We need to have a look at https://github.com/OpenHFT/Chronicle-Logger for a possible logging implementation

rbair23 commented 11 months ago

One discussion I didn't find in the description above was about using SLF4J directly vs. a custom logging facade. I have been a champion of the idea of a custom logging facade, but I have doubts as to whether it is worth the effort, and whether we should instead use SLF4J with our own logging implementation.

I am certain we do not want to use log4j2. I'm also reasonably certain that we want to integrate logging with our configuration system. I'm also reasonably certain that we will have performance requirements for our logging system that are different from those of other systems.

My original suspicion was that we would require fewer features than SLF4J offered, and thus, should have a different facade. I also went and looked at the SLF4J sources and was surprised at how complicated the implementation of the facade was. I expected it to be all interfaces, but it isn't. And actually, I recently learned that the JVM does not effectively inline through interfaces, which makes me wonder whether a high performance logging system should be defined via interfaces or classes.

I am also interested in removing as many 3rd party dependencies as we can.

So now that we have a long list of wants, many if not most of which are handled by SLF4J, I want to revisit whether it makes sense to have our own facade or not. Perhaps part of that criteria is some analysis of the performance implications. I think it would be good if we could say definitively "yes, we need our own facade, and the reason why is [fill in the blank]".

hendrikebbers commented 11 months ago

Performance of system.out:

hendrikebbers commented 11 months ago

@rbair23 the idea is to have our own facade (API). SLF4J might come into play to have a facade for all the logging frameworks that are used by our 3rd-party dependencies. As a future step we should provide direct adapters for our own facade. But by having SLF4J in place at the beginning we only need an SLF4J->Swirlds-Logging adapter at the start and can concentrate on the logging API, the implementation (chronicle-logging?), and how the logging is used (console, file, Loki).

hendrikebbers commented 11 months ago

colors on console: https://github.com/jhannes/logevents

cody-littley commented 11 months ago

The custom logging facade should support markers (that are identified by a name that is defined as a String). This is currently heavily used at least in the platform codebase. A marker should be an optional parameter that is only used if it makes sense. An ERROR marker does not make sense for example since this can be handled by the level of the logging event.

This might be worth discussing. In my opinion, the platform should stop using markers the way we currently use them. I think the net effect of markers, as used today, is more of a hindrance than a help.

lpetrovic05 commented 11 months ago

On the topic of static loggers:

On the topic of markers:

cody-littley commented 11 months ago

Log4j allows you to name a logger instance with a string. We mostly just use the class name, but technically it accepts a string too.

If the new framework had this capability, the platform would be capable of creating its own scoped loggers with a few small wrapper classes. There would be no need for services or other users to use non-static loggers, and the platform team would be free to do fancy things internally to get better tests.
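A sketch of such a wrapper, assuming the factory accepts arbitrary string names (the class and method names here are hypothetical):

public final class NodeScopedLoggerFactory {

    private final String nodeId;

    public NodeScopedLoggerFactory(final String nodeId) {
        this.nodeId = nodeId;
    }

    // Derives a distinct logger name per node, e.g. "node-3.com.swirlds.platform.Foo",
    // so log output from different in-process nodes can be told apart
    public Logger getLogger(final Class<?> clazz) {
        return StaticLoggerFactory.get(nodeId + "." + clazz.getName());
    }
}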

hendrikebbers commented 11 months ago

@cody-littley I did some benchmark tests on logger creation and it is always super fast. I still want to measure the memory overhead of non-static loggers. But I have not forgotten that wish ;) [while still assuming that we can do it in a different way....]

hendrikebbers commented 11 months ago

@cody-littley some questions to better understand your needs:

cody-littley commented 11 months ago

Can you define what different kind of logger configuration you need for unit tests?

Mostly, we just need the ability to intercept the logs using java and do validation on them.

Do you need a custom logger for all unit tests or only for specific ones?

Ideally, I'd like to set it up so that if something of WARNING level or higher is logged in any unit test, the test fails by default. For tests where we expect errors or warnings, we'd have to explicitly allow them.

This isn't something we expect the logging framework to do for us. If we can intercept the logs (as opposed to having to attempt to parse them from a file), then we would handle all the rest.

Should a logger that is configured for a unit test then be used by all logger calls that happen in the test?

Yes. The idea is to make it simple to examine logs for individual tests without cross-test pollution.

How do you assume the logger config looks like / what can be configured?

I imagine for production use cases there would be some setup that tells the logger where to write all of its output files, what filters to apply, etc. For unit tests, there would probably be a simple method that just passes each log message to a custom handler or something similar.

Do you need a specific config for individual loggers (like the logger for the class Foo) or a specific config for the logger system?

As long as we can get a callback for logs written during a test, I imagine 99.9999% of tests are going to have the exact same log configuration.

jsync-swirlds commented 11 months ago

I have a lot of thoughts on diagnostics and diagnostic frameworks (calling it logs and logging isn't really accurate), far too much to put in a comment, but here are some highlights.

  1. The default/built-in appenders/handlers/etc... for frameworks are rarely the highest performing, JUL is notorious that the default handlers are slow without good reason (and performance comparisons are highly misleading as a result). Any back-end framework we use we should probably write our own output classes.
  2. Key/Value information is extremely useful, but forcing things to String imposes a surprisingly high cost (toString is often quite expensive). A framework that will allow arbitrary Objects (much as JUL does) and defers creating strings (as JUL did before Java 17) can be surprisingly valuable. I would propose that we follow that approach, to avoid turning anything into a String as long as possible, so that fewer things actually become strings.
    • We might allow a Map instead of an Array of Objects, or we could stick with a simple Array.
    • The parameter entries in message strings should follow the newer Java StringFormat style, rather than the older MessageFormat style. This allows for easier reordering and type specification, as well as possible precision and length limits. It is also much more familiar to community developers.
  3. If we can manage efficient binary serialization, then moving the actual "log" generation off the main node to a "sidecar" node, or similar, can be a very good performance option. The sooner diagnostic data leaves the active machine, the more likely we are to have it in a failure scenario, and the more flexible we can be with how it is processed.
    • If we use a Protocol Buffer format for diagnostic records internally (and encourage any Key/Value entries to be protobuf encodable), this may be a good option.
  4. Static "loggers" are a bit of a headache for lots of reasons. If we can avoid statics (but retain singleton semantics, much as Log4J or JUL do), then we can improve both code maintenance and testability.
    • On testability, one very effective option is to have a "VerifyingTestFilter" class that is added as a filter on all "loggers" during unit tests only; this filter then throws AssertionError for any event that matches a configured threshold (e.g. anything above "WARNING" or, if we use a more flexible named-numbers level approach, anything above a particular numeric threshold). That would fail any test that produced "significant" diagnostics (a sketch follows after this list).
    • The hierarchical configuration possible for JUL and Log4J works well here, apply the unit testing filter to the "root logger" and it then applies to everything (except when specifically excluded, e.g. for misbehaving external libraries).
    • If we are pushing diagnostics out of the system early (right after basic filtering, for example), it isn't too hard to have configurations for testing environments that push data to short-term storage, files on local machines, etc... for immediate examination without a lot of extra effort or cost needed.
  5. For facades, there are quite a lot of these, it turns out. We might look at Log5J, Log6J, Logback, jboss-logging, etc... for additional inspiration. Most "Java Enterprise" systems include a custom logging facade as well; it seems to be a common problem with no "universal" solution, though clearly not for lack of effort.
  6. Overhead of things like lambda suppliers can be very high, and easily surprise an engineer using a facade. If we can avoid these (e.g. via static type handlers rather than a lambda supplier), I suspect we will avoid a lot of unexpected and untestable performance surprises down the road.
  7. APIs with varargs semantics (even if simulated, c.f. Map.of) tend to be the most popular. If we can provide at least some level of varargs support, we may see better uptake of things like detail objects and key/value parameters.
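As a sketch of the "VerifyingTestFilter" idea from point 4, expressed in JUL terms since the hierarchical configuration mentioned there makes attachment to the root logger easy (the class name is hypothetical):

import java.util.logging.Filter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public final class VerifyingTestFilter implements Filter {

    private final Level threshold;

    public VerifyingTestFilter(final Level threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean isLoggable(final LogRecord record) {
        if (record.getLevel().intValue() >= threshold.intValue()) {
            // Fail the test that produced a "significant" diagnostic
            throw new AssertionError(
                    "Unexpected " + record.getLevel() + " log: " + record.getMessage());
        }
        return true;
    }
}

Attached once to the root logger during unit tests, e.g. Logger.getLogger("").setFilter(new VerifyingTestFilter(Level.WARNING));, it applies to everything unless specifically excluded.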
hendrikebbers commented 11 months ago

@cody-littley ok, now I understand your use case. Yes, we should provide such a specific logging handler for tests.

Questions:

hendrikebbers commented 11 months ago

@jsync-swirlds

The default/built-in appenders/handlers/etc... for frameworks are rarely the highest performing, JUL is notorious that the default handlers are slow without good reason (and performance comparisons are highly misleading as a result). Any back-end framework we use we should probably write our own output classes.

yes, but if we start with a fast one we can postpone that task ;) In the end I assume that we will have custom handlers that, for example, have a hard-coded message pattern, since we want to parse the messages to get them into Grafana Loki.

Key/Value information is extremely useful, but forcing things to String imposes a surprisingly high cost (toString is often quite expensive).

We plan to always create file-based logging and not forward the logging to any logging tool directly. That will be done by letting the tools parse the log files. By doing so we need to write all information at the end as UTF-8 strings. Based on that I do not see any benefit in allowing anything other than String and primitives. If we allow Object then people will just add everything without thinking about the possible performance. If we only allow String, people need to call toString() and maybe think about the performance impact.

If we can manage efficient binary serialization, then moving the actual "log" generation off the main node to a "sidecar" node, or similar, can be a very good performance option.

Already thought about that one. In the last version the diagrams had a "binary output" instead of a file output. I assume that we need to decide how much more performance such a change brings and if it is really worth the effort. In general I like the idea but assume that we can move that to the next iteration once the general details are defined.

Static "loggers" are a bit of a headache for lots of reasons. If we can avoid statics (but retain singleton semantics, much as Log4J or JUL do), then we can improve both code maintenance and testability.

I still do not see that point. If the underlying config changes between tests I do not see any problem with static loggers as long as the new config is used. Since @cody-littley already mentioned the same point I will do another round and think about that topic. Additional thoughts are welcome :)

Overhead of things like lambda suppliers can be very high, and easily surprise an engineer using a facade.

So your idea is to not add lambda support?

APIs with varargs semantics (even if simulated, c.f. Map.of) tend to be the most popular.

That's why I added such methods to the API draft: https://github.com/hashgraph/hedera-services/pull/8145/files#diff-5995d829016d5b9dc1087bacd648a2817cf6697cadbbe0aa6403fb1cdd456dd6 Maybe you have not seen it or do I miss something?

Example methods from API:

void error(String message, Throwable throwable, Object... args);

void error(String message, Throwable throwable, Object arg1);

void error(String message, Throwable throwable, Object arg1, Object arg2);
hendrikebbers commented 11 months ago

My thoughts on static vs. non-static loggers:

In general we have four patterns for how we can use a logger (please correct me if you see more general options). I created a small sample for each pattern that shows how the logger is created, used, and reconfigured (as is needed for unit tests).

Option 1: One non-static logger per logger call with static factory

public class MyService1 {

    public void doSomething() {
        final Logger logger = StaticLoggerFactory.get(MyService1.class);
        logger.info("MyService1 is doing something");
    }

    public void execute() {
        final Logger logger = StaticLoggerFactory.get(MyService1.class);
        logger.info("MyService1 is executing");
    }

    public void reconfigureLoggerForSpecialTestCase() {
        StaticLoggerFactory.updateLevel(MyService1.class, Level.TRACE);
        StaticLoggerFactory.updateLevel("some.other.package", Level.TRACE);
    }

}

The service creates a new logger whenever it needs to log something. The loggers are configured and created by a static factory. Since the logger is created every time it is needed, a configuration change does not need to update any existing logger; it only needs to affect loggers that will be created in the future.

Option 2: One non-static logger per instance with static factory

public class MyService2 {

    private final Logger logger = StaticLoggerFactory.get(MyService2.class);

    public void doSomething() {
        logger.info("MyService2 is doing something");
    }

    public void execute() {
        logger.info("MyService2 is executing");
    }

    public void reconfigureLoggerForSpecialTestCase() {
        logger.setLevel(Level.TRACE); // this could be an option to only update the logger for one service instance
        StaticLoggerFactory.updateLevel(MyService2.class, Level.TRACE);
        StaticLoggerFactory.updateLevel("some.other.package", Level.TRACE);
    }
}

The service creates a new logger for each instance of the service. The loggers are configured and created by a static factory. The factory can be configured to update the log level for all loggers created by the factory. We have two different possibilities for how the configuration is defined:

Both options can be implemented.

Beyond that, having a non-static logger would allow us to set the level for each logger individually. Currently I do not see any strong reason for that use case.

Option 3: One non-static logger per instance with non-static factory

public class MyService3 {

    private final Logger logger;

    public MyService3(final LoggerFactory loggerFactory) {
        this.logger = loggerFactory.get(MyService3.class.getName());
    }

    public void doSomething() {
        logger.info("MyService3 is doing something");
    }

    public void execute() {
        logger.info("MyService3 is executing");
    }

    public void reconfigureLoggerForSpecialTestCase(final LoggerFactory loggerFactory) {
        logger.setLevel(Level.TRACE); // this could be an option to only update the logger for one service instance
        loggerFactory.updateLevel(MyService3.class, Level.TRACE);
        loggerFactory.updateLevel("some.other.package", Level.TRACE);
    }
}

The service creates a new logger for each instance of the service. The loggers are configured and created by a non-static factory. By doing so we need to pass the factory to every class that needs logging. In the platform we already have the PlatformContext that we pass to a lot of classes but logging might even be used by utility functions for example. Passing the factory to each class really creates a lot of overhead.

The factory can be configured to update the log level for all loggers created by the factory. Again, we need access to the factory instance in that case. We have two different possibilities for how the configuration is defined:

Both options can be implemented.

Beyond that, having a non-static logger would allow us to set the level for each logger individually. Currently I do not see any strong reason for that use case.

Option 4: A static logger per type with static factory

public class MyService4 {

    private static final Logger logger = StaticLoggerFactory.get(MyService4.class);

    public void doSomething() {
        logger.info("MyService4 is doing something");
    }

    public void execute() {
        logger.info("MyService4 is executing");
    }

    public void reconfigureLoggerForSpecialTestCase() {
        StaticLoggerFactory.updateLevel(MyService4.class, Level.TRACE);
        StaticLoggerFactory.updateLevel("some.other.package", Level.TRACE);
    }
}

The common approach for how loggers are used in Java: a logger is defined as a static field in the class. The logger is created by calling a static factory method.

The loggers are configured and created by a static factory. An update call on that factory will always affect all loggers.

Conclusion

I do not see any benefit in an approach that uses a non-static logger. Especially the option with non-static factories creates a lot of overhead that we should try to avoid. Based on that I do not get the benefit that @cody-littley @lpetrovic05 and @jsync-swirlds see in non-static loggers. I assume that I am missing something here and hope that the given options provide a good base for future discussions.

hendrikebbers commented 11 months ago

I assume that the problem that @cody-littley sees in a static logger is not really the static state of the logger but the static state of the factory / logger config. If I understand it correctly @cody-littley wants to write tests like this:

public class MyTest {

    @BeforeEach
    void init() {
        //With this, each logging call that has WARN or ERROR as its level will fail the test
        LoggingConfig.append("", new FailOnWarnAppender());
    }

    @Test
    void testServiceExecution() {
        //given
        final MyService4 service = new MyService4();

        //when
        final Result result = service.execute();

        //then
        assertThat(result).isNotNull();
    }
}

In that case any test fails if a message is logged at WARN or ERROR level. If that is not the case for all tests of a module, we will have a problem executing such tests in parallel. Luckily we can use a much easier solution, like we did for config, for example. Since we will have the FailOnWarnAppender used in something like 99% of all tests, we can simply provide a JUnit extension that sets the handler automatically. The class will look like this:

@FailOnWarnLog
public class MyTest {

    @Test
    void testServiceExecution() {
        //given
        final MyService4 service = new MyService4();

        //when
        final Result result = service.execute();

        //then
        assertThat(result).isNotNull();
    }
}

Now we can configure JUnit to execute all tests that are annotated with the annotation in parallel, and all tests that do not have the annotation in parallel, but not mix those groups.

hendrikebbers commented 11 months ago

I had a chat with @timo0 to discuss other test scenarios. Let's assume you want to check that a method logs a specific message. I assume that is what is needed by some people. While I hope that such a test is a very untypical scenario for testing, I do not see why that should lead to changing anything in the testable code, like passing a logger to the class or method so that it can be mocked. For such an uncommon scenario we can add functionality to the logger API that helps to create such unit tests. When discussing it with Timo I named the feature logging mirror, since it mirrors the logging of a function or class to the test. A test that uses such functionality might look like this:

LogMirror mirror = LogFactory.createMirror(ClassUnderTest.class); // Create a mirror for the logger of the class (classname)

ClassUnderTest.callMethodUnderTest();

assertEquals(1, mirror.warnings().size());
assertEquals("Content of message", mirror.warnings().get(0).getMessage());

LogFactory.dispose(mirror);

The given code is not thread safe, but I assume that we do not have so many tests for such a scenario that we cannot easily define such tests as single-threaded.

hendrikebbers commented 11 months ago

A lib that does exactly what I named LogMirror: https://github.com/Hakky54/log-captor

lpetrovic05 commented 11 months ago

I had a chat with @timo0 to discuss other test scenarios. [...] While I hope that such a test is a very untypical scenario for testing, I do not see why that should lead to changing anything in the testable code [...] The given code is not thread safe, but I assume that we do not have so many tests for such a scenario that we cannot easily define such tests as single-threaded.

Considering the fact that we look at logs in the integration tests, this would not be untypical. This also doesn't work well if we have multiple instances of the same class in a test.

hendrikebbers commented 11 months ago

If we do not want to have that one in the logging API, we could provide a JUnit extension that sets System.out / System.err and provides similar functionality. The only difference here is that the logger must log to the console in that case.

hendrikebbers commented 11 months ago

I had a call with @lpetrovic05 about a special use case:

Let's assume you write a kind of integration test that starts 4 platform instances in the same JVM. Today, integration tests like that always analyze the logging to get parts of the results that must be tested. Multiple platforms are not yet started in the same JVM for tests, but it might become an option in the future.

In that case all 4 platforms would log to the same output (file/console/...) and the problem that Lazar sees is that you can not extract the logging for a single platform anymore.

From my point of view that is not really a problem. Since we want to add a context / metadata to the logging I assume that each log message that is generated contains the id of the parent platform as metadata. By doing so the log can easily be filtered.
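As a sketch, assuming the facade exposes the thread-local MDC described earlier (the LoggingMdc name is hypothetical), each platform instance could tag its threads at startup:

public final class PlatformNode implements Runnable {

    private final String nodeId;

    public PlatformNode(final String nodeId) {
        this.nodeId = nodeId;
    }

    @Override
    public void run() {
        // Every log event created on this thread now carries the node id as metadata,
        // so a shared output (file/console/...) can be filtered per node afterwards
        LoggingMdc.put("nodeId", nodeId); // hypothetical thread-local MDC API of the facade
        startPlatformServices();
    }

    private void startPlatformServices() {
        // platform startup logic elided
    }
}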

hendrikebbers commented 11 months ago

[!IMPORTANT]
When talking about static VS non-static logger we sometimes mean the logger and sometimes mean the logger factory. For our discussions we need to be more concrete here :)

hendrikebbers commented 11 months ago

[!IMPORTANT]
The Logging API must be usable at least by all modules in the repo. That includes base / hashgraph & data / services. Since not all of the modules have access to the platform context, it is not a solution to add the logger factory to the platform context. That would not work for all the other modules and would end up passing the platform context or a logger factory even to static methods that need logger access.

jsync-swirlds commented 11 months ago

Important When talking about static VS non-static logger we sometimes mean the logger and sometimes mean the logger factory. For our discussions we need to be more concrete here :)

Speaking for myself, the logger factory should always be static. The loggers themselves, however, should not be. It's not just testing; it also reduces code maintenance, because static definitions have to be changed in many more scenarios than non-static ones, so having a non-static logger (e.g. private final Logger logger = LoggerFactory.get(getClass().getCanonicalName());) leads to a simpler and more consistent usage pattern which almost never needs to be updated.

cody-littley commented 11 months ago

@hendrikebbers

Should the FailOnWarnHandler be activated by default for each test, and need to be deactivated by config if it should not be used?

Ideally yes, although we will probably have to fix a large number of tests to make that possible.

I assume that we can provide such a handler in our test-framework module and have no need to create custom handlers for tests in general, correct? Having 1 or 2 custom ones is fine, but we should avoid making the setup more complex.

Most tests can probably use the exact same validation code, so it makes sense to have something in the framework.

cody-littley commented 11 months ago

In addition to unit testing, there is one other use case where it would be helpful to have non-static loggers.

When coding, a major tool we use is running a small four node cluster inside a single JVM. It's super simple and fast, and we can do things like halt all four nodes using a debugger.

When testing this way, it would be really nice if we could have logging that is scoped to a node. When scoped to a class, log statements from different nodes are, for all intents and purposes, indistinguishable from each other.

Eventually, we even hope to write junit tests that spin up multiple nodes. We could replace 30 minute JRS tests with 30 second, single machine tests with the exact same coverage as before. If we have no way of separating one node's logs from another's, then this style of testing is basically impossible.

jsync-swirlds commented 11 months ago

The default/built-in appenders/handlers/etc... for frameworks are rarely the highest performing, JUL is notorious that the default handlers are slow without good reason (and performance comparisons are highly misleading as a result). Any back-end framework we use we should probably write our own output classes.

yes, but if we start with a fast one we can postpone that task ;) In the end I assume that we will have custom handlers that, for example, have a hard-coded message pattern, since we want to parse the messages to get them into Grafana Loki.

The issue comes in that the "performance testing" only tests the performance of the output classes, and very thoroughly hides underlying platform performance. For instance, on Java 8, with a "Handler" that performs equally to the Log4J appender, JUL outperforms Log4J by quite a lot. Also, the use of Suppliers vs. type interceptors can also change performance by a huge amount. What I'm really pointing out is that performance metrics and comparisons for most "logging" systems is very nearly useless unless you dive very deep and work out what part of each system is fast or slow and how that interacts with your own current and planned usage. It gets even worse in that, as your usage patterns change, one choice can go from the best to the worst choice without warning or any simple mechanism to change it.

Key/Value information is extremely useful, but forcing things to String imposes a surprisingly high cost (toString is often quite expensive).

We plan to always create file-based logging and not forward the logging to any logging tool directly. That will be done by letting the tools parse the log files. By doing so we need to write all information at the end as UTF-8 strings. Based on that I do not see any benefit in allowing anything other than String and primitives. If we allow Object then people will just add everything without thinking about the possible performance. If we only allow String, people need to call toString() and maybe think about the performance impact.

File based logging is about the worst thing we could do from a performance perspective. Translating to strings is costly to begin with. Parsing those strings is even more wasted processing. Furthermore, every file write takes away from the filesystem writes or reads available to the main node software, and we're about to become extremely sensitive to filesystem performance. In my opinion writing files and then parsing that output again is the very worst approach we can take. Further, creating a slower and more costly framework to support the "files forever" assumption is tying our legs together before starting a footrace. I advocate that we should view local file logging as a last resort or development environment option, and build on the assumption we cannot rely on local files in almost all situations. The expectation to send diagnostic data off-node at an extremely early stage, with efficient binary serialization, should be the primary design goal. As to people adding Objects, that's exactly what we should support efficiently. Yes, people will just add Objects and not think much about cost, and (within some limits, c.f. type interceptors noted elsewhere) that's exactly what we should encourage. Again, it isn't that people add Objects, it's that we are inappropriately turning everything into strings when we should avoid creating any strings at all.

Fundamentally this is a design philosophy issue. Everyone tries to rebuild "syslog" with their logger because the words "logger" and "logging" innately carry the assumption of line-oriented text output. Part of why we have so many "logging" frameworks (almost everyone does one) is because of this fundamental error in concept. If we follow that path and build a system optimized for line-oriented text output, however, we will also, inevitably, fail to accomplish what is actually needed. Our goal should be to produce diagnostic data efficiently and in a manner that can be delivered efficiently to systems that manage those diagnostics (note, Grafana is Swirlds Labs' choice currently, but is not the only choice node operators will use). A binary output delivered via local network (out of band and on different links from the Hashgraph traffic) to a separate system for initial processing, compression, and redelivery to remote management systems is the first path we will need if we want this to work (text files can be added as an alternate approach later).

If we can manage efficient binary serialization, then moving the actual "log" generation off the main node to a "sidecar" node, or similar, can be a very good performance option.

Already thought about that one. In the last version the diagrams had a "binary output" instead of a file output. I assume that we need to decide how much more performance such a change brings and if it is really worth the effort. In general I like the idea but assume that we can move that to the next iteration once the general details are defined.

The primary note here is that binary output should be primary and network based rather than an afterthought or add-on. File-based outputs should be a separate component that intercepts the binary output and serializes that to a local text file.

Static "loggers" are a bit of a headache for lots of reasons. If we can avoid statics (but retain singleton semantics, much as Log4J or JUL do), then we can improve both code maintenance and testability.

I still do not see that point. If the underlying config changes between tests I do not see any problem with static loggers as long as the new config is used. Since @cody-littley already mentioned the same point I will do another round and think about that topic. Additional thoughts are welcome :)

Static "logger" instances (the factory should remain static, by the way) require more maintenance and are much harder to intercept or spy. The use of static versus non-static "logger" instances is an usage style concern, however, not really a concern of the framework (except that the framework should ensure both options work correctly and neither is difficult to accomplish).

Overhead of things like lambda suppliers can be very high, and easily surprise an engineer using a facade.

So your idea is to not add lambda support?

Correct, I would recommend we defer lambdas and Suppliers to later iterations, and clearly mark those options as discouraged in any code that might be performance critical. I would also argue that we should not include Suppliers in the log record. The metadata and arguments should just be Object (and might be a Supplier, but the transmitters and formatters can deal with that). I would further recommend the ability to add type interceptors (adding classes in the configuration to handle specific types in args and metadata so those types can be used "natively" without a supplier) as a preferred alternative to Suppliers. Further, I believe we must support adding arbitrary "args" to "logger" calls that are not intended to be in the "message" (i.e. will not transform into inline inserts in a single "log line") but are retained and transmitted as additional detail to the diagnostic management system of choice. e.g.

ClassLogger.log(Level.INFO, "The merge process encountered %3$d pause events, which exceeds threshold of %2$d.", mergeConfig, pauseThreshold, pauseCount, mergeEventTracker);

The example should retain all 4 objects, as close to the original binary content as possible, until delivered to the final destination system (Grafana, ELK, GCP monitoring, HDFS, etc...) which will store that data in whichever form it prefers (text lines, JSON, custom binary, etc...) for analysis and presentation. Forcing them to string (particularly if toString is used, and worse if they must be stuffed in the message content to be retained) is counterproductive and will lead to yet another framework that is not meaningfully better than the hundred other similar frameworks.

APIs with varargs semantics (even if simulated, cf. Map.of) tend to be the most popular.

That's why I added such methods to the API draft: https://github.com/hashgraph/hedera-services/pull/8145/files#diff-5995d829016d5b9dc1087bacd648a2817cf6697cadbbe0aa6403fb1cdd456dd6 Maybe you have not seen it, or am I missing something?

Example methods from API:

void error(String message, Throwable throwable, Object... args);
void error(String message, Throwable throwable, Object arg1);
void error(String message, Throwable throwable, Object arg1, Object arg2);

I haven't had time to review the full code details yet (and it's been changing a good bit). The above options are roughly what I would hope to see (although I'm not a huge fan of level-named "convenience" methods; that's just preference).

artemananiev commented 11 months ago

A few random thoughts from my side.

I don't quite understand why / what we use markers for today. As Cody mentioned above, there is no need to have logger names match package names. If multiple classes create a logger with the same name, like EVENT_STREAM or MERKLE_DB, it would work just fine. Configuration would be simplified, too. Today we have both the ability to set levels/appenders/etc. per package and the ability to do the same per marker. I've always found it confusing. Is there really a need to do something like this: log all events for this marker at level INFO, but for this package (which may have classes that use the same marker) at DEBUG?

The ability to have a global / per-thread context looks very useful. Today this context is usually just a part of the message, which makes messages hard to read. It would be better if it looked like Prometheus metrics (but with logs instead of metrics):

Something happened {nodeId = "foo", round = "bar", ...}

These dimensions can be specified directly for a message, and everything from the context is appended automatically, too. What about events that span multiple threads? For example, we need to hash a tree. Hashing is multi-threaded. It would be very helpful to have a round ID in the context, but it can't be local to a thread, nor global to the process.

Testing. Two aspects here. First, it would be great to test log messages to make sure they are usable (contain critical info) in production. Second, sometimes the only way to check if everything is right or wrong is to check the logs. There should be a way to access log messages from test code.

hendrikebbers commented 11 months ago

@jsync-swirlds

For instance, on Java 8, with a "Handler" that performs equally to the Log4J appender, JUL outperforms Log4J by quite a lot.

Good to know. Let's define it like that: today we use Log4J, and that could be what we use as a first impl of our facade. Once that is done, a deep dive into the performance of logging systems and common / custom handlers is needed.

File based logging is about the worst thing we could do from a performance perspective.

No, the worst thing is console logging :D Having file logging is currently a request that comes from DevOps. My idea was a "push to LOKI" appender before I talked to DevOps. But again, that is something that can be changed later, since the appender should be an implementation detail.

As to people adding Objects, that's exactly what we should support efficiently.

But then we need custom converters for all the objects, or we call toString() at the end anyway. Independent of the appender, we need to serialize the objects. I do not see the benefit.

Static "logger" instances (the factory should remain static, by the way) require more maintenance and are much harder to intercept or spy.

As long as the factory is static I'm fine. But I still do not see how a static logger requires more maintenance, or how you want to spy on it (without adding code that we do not want, like a setter for the logger). Maybe I need a code example for that.
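(For illustration, a minimal sketch of the interception concern; Logger and Loggers below are hypothetical placeholders, not the actual API draft. With a static final field a test cannot substitute the instance, while an injected logger can be replaced by a capturing fake even though the factory itself stays static.)

interface Logger { void info(String message, Object... args); }

final class Loggers {
    static Logger getLogger(Class<?> clazz) { return (message, args) -> { }; } // static factory
}

// Static field: the only way to intercept LOG in a test is bytecode or reflection tricks.
class MergeService {
    private static final Logger LOG = Loggers.getLogger(MergeService.class);
    void merge() { LOG.info("merge started"); }
}

// Injected instance: a test passes a spy/fake; production code uses the default constructor.
class MergeServiceInjectable {
    private final Logger log;
    MergeServiceInjectable(Logger log) { this.log = log; }
    MergeServiceInjectable() { this(Loggers.getLogger(MergeServiceInjectable.class)); }
    void merge() { log.info("merge started"); }
}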

Correct, I would recommend we defer lambdas and Suppliers to later iterations, and clearly mark those options as discouraged in any code that might be performance critical.

Fine for me to not have that method in version 1 if the performance is really a problem. Do you have a good reference with benchmark results about that?

The metadata and arguments should just be Object (and might be a Supplier, but the transmitters and formatters can deal with that).

From my point of view it makes the code less readable if some of the args are suppliers. In addition, you would end up with the same performance problem as when the full message is provided by a Supplier.
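(For comparison, the call styles under discussion look roughly like this; logger, Level, and isEnabled are assumed names for the sketch, not confirmed API. The explicit guard achieves laziness without putting a Supplier in the log record.)

logger.debug("state = {}", expensiveRender(state)); // eager: always pays the rendering cost

logger.debug("state = {}", (java.util.function.Supplier<Object>) () -> expensiveRender(state)); // lazy, but noisy and allocates a lambda

if (logger.isEnabled(Level.DEBUG)) { // explicit guard: lazy, and no Supplier ends up in the event
    logger.debug("state = {}", expensiveRender(state));
}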

Further, I believe we must support adding arbitrary "args" to "logger" calls that are not intended to be in the "message" (i.e. will not transform into inline inserts in a single "log line") but are retained and transmitted as additional detail to the diagnostic management system of choice.

That is already part of the API. Instead of adding them just to the message, you add them as context to the log:

logger.withContext("name", "value").log("Hello {}", name);

Forcing them to strings (particularly if toString is used, and worse if they must be stuffed into the message content to be retained) is counterproductive and will lead to yet another framework that is not meaningfully better than the hundred other similar frameworks.

Again: How will you serialize them? Please give a more concrete example / idea.

hendrikebbers commented 11 months ago

@artemananiev

I don't quite understand why / what for we use markers today.

Because we use markers in a way they should not be used :D Adding a marker to just every call makes zero sense.

As Cody mentioned above, there is no need to have logger names match package names. If multiple classes create a logger with the same name, like EVENT_STREAM or MERKLE_DB, it would work just fine.

I do not know if we want to change the concept used by all other loggers here. We have modules that all have a unique package name; I would say that we can easily use that name. By doing so we can still change the logging for one specific class. Think about a util class that is called thousands of times and produces logging; maybe you want to deactivate that. Being able to define log levels based on logger names (that are based on the class) and on markers is a really flexible system from my point of view. But it's a good point that we can discuss in the meeting today.

These dimensions can be specified directly for a message, and everything from the context is appended automatically, too.

That is exactly how my demo works. The logging output of my super simple logger impl currently looks like this:

2023-08-24 08:39:56 INFO [main] com.swirlds.logging.v2.demo.LoggingDemo - Hello world! - C:{app=demo, context=value, transaction=17}

These dimensions can be specified directly for a message, and everything from the context is appended automatically, too. What about events that span multiple threads?

While that is an open problem today, we can solve it once we have a resource manager that is the official place to create executors. We can easily provide APIs that forward the context of a calling thread to a pool thread while that thread is executing a task (similar to how tracing works in microservices).
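(A sketch of what such an API could look like; ThreadContext here is a hypothetical per-thread context holder in the spirit of MDC, not an existing class. The wrapper captures the submitting thread's context and installs it on the pool thread for the duration of the task.)

import java.util.Map;
import java.util.concurrent.Executor;

final class ThreadContext {
    private static final ThreadLocal<Map<String, String>> CTX = ThreadLocal.withInitial(Map::of);
    static Map<String, String> copyOfCurrent() { return CTX.get(); }
    static void set(Map<String, String> ctx) { CTX.set(ctx); }
    static void clear() { CTX.remove(); }
}

final class ContextForwardingExecutor implements Executor {
    private final Executor delegate;
    ContextForwardingExecutor(Executor delegate) { this.delegate = delegate; }

    @Override public void execute(Runnable task) {
        Map<String, String> captured = ThreadContext.copyOfCurrent(); // caller's context
        delegate.execute(() -> {
            ThreadContext.set(captured); // install on the pool thread
            try {
                task.run();
            } finally {
                ThreadContext.clear();   // do not leak context into the next task
            }
        });
    }
}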

Testing.

The Logger API will provide functionality for testing. Currently I really like the idea of the Logger Mirror and hope that it fits all common use cases.
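(As a rough sketch of the testing style such a mirror would enable; LoggerMirror and its methods are purely hypothetical and not the actual design.)

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class MergeLoggingTest {
    @Test
    void logsMergeStart() {
        LoggerMirror mirror = LoggerMirror.attach(); // hypothetical test hook capturing events
        new MergeService().merge();
        assertTrue(mirror.events().stream()
                .anyMatch(e -> e.message().contains("merge started")));
    }
}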

hendrikebbers commented 11 months ago

@cody-littley

When coding, a major tool we use is running a small four node cluster inside a single JVM.

I did a Zoom call with @lpetrovic05 and we talked about exactly that. Today that would be a problem. But once we have a resource manager, I assume that each thread only belongs to one platform (ignoring the main thread, etc.). In that case the platform id will automatically be added to the context of the thread and therefore will be part of each log message. That should solve the problem.

jasperpotts commented 11 months ago

Bunch of thoughts.

Static

I like the simplicity of static, but I share Cody's concern about being able to run more than one node in a JVM. Maybe it is OK for those to have mixed logs, or maybe we can be clever: at the point where the static loggers send data back to the central logger, we can look at the calling threads and, based on which node a thread belongs to, send the data to the appropriate logs.

Custom Facade

I like the idea of our own facade as it gives us the opportunity to have our own backend implementation. Also, if we can remove 3rd-party libs that need logging, then we can drop SLF4J, Log4J, etc. from the classpath. Also, there are plans to have our own very fast disk access APIs, and we can use those in our own implementation's writer code.

Markers

We need to make sure these can be implemented fast; Log4J markers come at a horrible performance cost.
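(One cheap scheme, as a sketch only and not an existing API: intern markers as enum constants so an "is this marker enabled?" check is an O(1) array lookup, with no string matching or parent-chain walking as in Log4J.)

enum Marker { EVENT_STREAM, MERKLE_DB, RECONNECT } // illustrative marker names

final class MarkerFilter {
    private final boolean[] enabled = new boolean[Marker.values().length];
    void enable(Marker marker) { enabled[marker.ordinal()] = true; }
    boolean isEnabled(Marker marker) { return enabled[marker.ordinal()]; } // O(1), allocation-free
}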