NcsaRequestLogger enhancements

huxi commented 8 years ago

I’d like to enhance NcsaRequestLogger a bit:

change the DateTimeFormatter used to always use Locale.ENGLISH (not Zone). This is a trivial change, simply adding .withLocale(Locale.ENGLISH); to the private static FORMATTER.

A timestamp like [14/Mär/2016:15:01:10 +0100] (the output of the current implementation in Germany) is problematic to parse and also a highly uncommon date format. I consider this one a bug.
add status-code-based Markers to the logger calls, e.g. “status-2xx”, “status-4xx” etc.) to enable better filtering options in the logger configuration. I mean general markers for all status codes of a given class (Informational 1xx, Successful 2xx, etc.), not markers for specific status codes. "status-4xx" but not "status-404".
Use slf4j message pattern for formatting instead of the currently used String.format.

This provides better performance, formatting is performed lazily and it also opens the opportunity to filter log-events based on the message pattern. I'm aiming for two distinct patterns: one including the optional id and one without.
this one might be a bit controversial...

I’d suggest to log at WARN level in case of 4xx and 5xx or if status code is not in the range 100-599...
write related tests.

Current spec only checks if info is called twice for two requests to Ratpack.

I'd start implementing those changes and preparing a PR after the core devs all had a chance to comment.

danhyun commented 8 years ago

As far as log level for error codes, I'm fine with WARN but I do know that there are other use cases (hyperic) where WARNs will start cascading error alerts.

I'd be happy to see this implemented

@ratpack/committers thoughts?

ldaley commented 8 years ago

Forcing the locale to English makes sense.

Wouldn't the status code markers then make us not be using NCSA format?

Slf4j message pattern… yes please.

Regarding the warning logging, I don't think that should be the default. I could be convinced of an opt in though.

More tests… yes please.

huxi commented 8 years ago

The markers are invisible if not explicitly enabled in the message pattern.

In log4j2, a config making use of markers would look like this

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO">

  <Appenders>
    <List name="requestList" />
    <List name="serverErrorRequestList">
      <MarkerFilter marker="status-5xx" onMatch="ACCEPT" onMismatch="DENY"/>
    </List>
    <List name="errorRequestList">
      <Filters>
        <MarkerFilter marker="status-5xx" onMatch="ACCEPT" onMismatch="NEUTRAL"/>
        <MarkerFilter marker="status-4xx" onMatch="ACCEPT" onMismatch="DENY"/>
      </Filters>
    </List>
  </Appenders>

  <Loggers>
    <Logger name="ratpack.request" level="INFO" additivity="false">
      <AppenderRef ref="requestList"/>
      <AppenderRef ref="serverErrorRequestList"/>
      <AppenderRef ref="errorRequestList"/>
    </Logger>

    <Root level="INFO"/>
  </Loggers>

</Configuration>

(List appenders are only used for testing...) The above serverErrorRequestList would only receive events with a 5xx status code while errorRequestList would receive both 4xx and 5xx.

My personal use-case is writing 4xx and 5xx events into two separate files while ignoring all other events so performance isn't degraded unnecessarily.

Making the WARN optional sounds like a good plan. I was a bit uneasy about this part myself since it would change the current behavior while all the other changes are "under the hood".

ldaley commented 8 years ago

@huxi sounds good.

huxi commented 8 years ago

Any preference how this option should be exposed in the public API?

Current API is RequestLogger.ncsa() and RequestLogger.ncsa(Logger). This would still generate a RequestLogger that is logging at INFO level regardless of the status code.

One way would be to add RequestLogger.ncsa(boolean loggingErrorStatusCodesAsWarn) and RequestLogger.ncsa(Logger, boolean loggingErrorStatusCodesAsWarn) but I think it would be nicer/cleaner to use a different method names instead of a boolean toggle.

How should I name those methods?

johnrengelman commented 8 years ago

@huxi I would suggest adding a configuration type object to NcsaRequestLogger that would handle these types of configurations. Then in the API, would we would retain RequestLogger.ncsa() (must keep this for compatibility) and add RequestLogger.ncsa(Action<? extends NcsaRequestLogger.Config> configure)

huxi commented 8 years ago

@johnrengelman But that would leak the internal class ratpack.handling.internal.NcsaRequestLogger into the public API, right?

johnrengelman commented 8 years ago

You can make the config class to be part of RequestLogger...just would need to be generic enough... Or It could be a NcsaLoggerConfig that's part of the public API.

ldaley commented 8 years ago

Fundamentally, I think we need to allow the user to specify a function that decides how the request should be logged.

I think we should add this to the RequestLogger class…

interface Router {
  // Receives the formatted log message, and logs it somewhere (e.g. slf4j logger)
  void log(RequestOutcome requestOutcome, String msg, Object[] args)
}

static RequestLogger ncsa(Router router)

The router msg and args params will be in the Slf4j message format. If the logging is being routed directly to slf4j, then they can just be passed through. If they are going elsewhere, users can use Slf4j's MessageFormatter.

huxi commented 8 years ago

@alkemist The problem regarding the Router interface is that logging itself is just one part of the problem. The other part is checking whether logging is actually necessary via isInfoEnabled(Marker) or isWarnEnabled(Marker) respectively.

ldaley commented 8 years ago

The router can do whatever it wants, including these checks.

This is a more general solution that allows the user to log the message however they want.

huxi commented 8 years ago

I was wondering whether logger.info(...)/logger.warn(...) calls should be put into a Blocking block. Logging can be asynchronous but doesn't have to be...

ldaley commented 8 years ago

That gets messy very quickly. I think it's getter to just recommend that people use async logging.

huxi commented 8 years ago

@alkemist That's true re:"router can do whatever it wants" , but the pattern and parameters would have to be created before the Router.log is even called while they won't if isXxxEnabled(Marker) already declines. Only status code is evaluated at that point.

ldaley commented 8 years ago

Let's add a method to RequestLogger …

static String ncsaFormat(RequestOutcome requestOutcome) {
  …
}

That way people can create custom implementations of RequestLogger, and use this method to generate the log line when/if they need to and log however they wish. I don't think it's worth complicating the API by returning a format/args tuple here. Instead we should rely on the user to only call this when it is actually going to be logged.

huxi commented 8 years ago

What would be the advantage vs. current RequestLogger.log and a custom implementation of the interface? I thought ncsa() was just meant to hide the internal class from the public API.

(I have to admit that I really need to take a nap. I might miss something...)

ldaley commented 8 years ago

The useful thing about ncsa() is that it gives you the log line in the right format. What this change is proposing is separating that from how it is logged. So, if you require something more sophisticated than the current strategy you can impl a custom logger but still get the right formatting.

huxi commented 8 years ago

This would probably be a good use-case for logback-access...

I wanted to implement a lb-access RequestLogger after this thingy. The problem is that I would first have to refactor some parts of Logback (see LOGBACK-1142).

ldaley commented 8 years ago

@huxi I've rearranged things to detangle the formatting from the routing, see here: https://github.com/ratpack/ratpack/commit/454e191645481c632e4d54e525a1c8e39f2a1352

WDYT we should do now? We could either add sugar for the marker scheme that you proposed, or leave it as it is and recommend users use a combination of RequestLogger.of() and RequestLogger.ncsaFormat() if they need something more sophisticated.

huxi commented 8 years ago

@alkemist I honestly don't know what to do next.

I initially proposed:

Use slf4j message pattern for formatting instead of the currently used String.format.

This provides better performance, formatting is performed lazily and it also opens the opportunity to filter log-events based on the message pattern. I'm aiming for two distinct patterns: one including the optional id and one without.

to which you replied:

Slf4j message pattern… yes please.

Then I implemented everything as previously discussed, including tests, including fixes to tests that are mocking the logging (Making them fragile to internal changes, naturally. Those test fixes wouldn't have been necessary if the tests had used e.g. the log4j2 test support classes instead of mocking the logging framework) and sent a pull request.

And then you proposed - and implemented - something that is essentially the opposite of my original suggestion, eagerly formatting the log messages in a Formatter instance to always log using a String instead of ever using a message pattern that can be used for filtering and lazy message evaluation.

I obviously failed to explain the rationale behind my proposed changes and don't understand what made you change your mind about it.

ldaley commented 8 years ago

@huxi this isn't necessarily done. One thing that I want to ensure about the end solution is that the what to log is detangled from the where and when.

The performance argument for creating the string doesn't wash. There's no way that parsing out the pattern is faster than building the string. If it has function though, we should propagate it in that fashion. Let me change the formatter to pass the pattern and args.

huxi commented 8 years ago

@alkemist It's faster if it's not done at all because a filter declined creation of the logging output based on Marker, message pattern or arbitrary other rule. In that case, only an Object[] containing the already existing message arguments is created and later dropped.

With my proposal, the actual log message is only ever actually created if the logging framework decides it needs it. With your proposal, the message is always created and then dropped if the logging framework decides it's not interested after all.

In my actual use case, your proposal would format log messages for 1xx, 2xx, 3xx events (i.e. the vast majority) even though I'm actually only interested in 4xx and 5xx messages. And I express my interest in 4xx and 5xx in the logging configuration, not the code of the app.

This "expression of interest" can be arbitrarily complex (Groovy code in the logging configuration vs. just using Level or Marker) and the only way to prevent unnecessary creation of strings is to use message patterns and creating the message itself only if needed, lazily.

It's not just about the immediate performance of formatting the message (slf4j is pretty efficient in that regard and way faster than String.format) but also about additional GC pressure introduced by the creation of actually unnecessary strings.

huxi commented 8 years ago

To give another example: a SocketAppenderwouldn't actually format the message at all since events are transferred using message pattern and argument strings.

ldaley commented 8 years ago

That's all fine. I'm happy to pass the pattern and args around.

That doesn't obviate the need to separate the formatting from the routing though. What I think we'll do is change the formatter to close over the router, so it effectively acts as an inline transform. This is the cheapest arrangement I can think of.

huxi commented 8 years ago

I don't see how putting pattern and args into the interface would help. Logging with arguments ["foo", "bar", "foobar"] will always result in a message like "*foo*bar*foobar*" since the order of the arguments is fixed. One wouldn't be able to create a result like "foobar foo bar" in that way. The only thing one could achieve would be to replace the * above with some other strings. Is that actually a use case? It definitely wouldn't be an ncsaentry anymore.

I think I see what you would like to achieve and separation of concerns is usually a good idea but it directly prevents achieving maximum logging efficiency in this case. The concerns are already separated and formatting the log message is supposed to be the concern of the logging framework.

huxi commented 8 years ago

Let me tackle this from a different perspective since I don't really understand that point myself.

What would a String Formatter.format(String messagePattern, Object[] messageArguments) interface be trying to achieve?

Assumption A

Implementations should perform formatting of slf4j-like message patterns. format("{} {} {}", ["foo", "bar", "foobar"] should always result in the string "foo bar foobar".

Assumption B

Implementations should perform formatting according to different formatter styles. One implementation would understand slf4j-like message patterns while another implementation would format using printf-style placeholders.

If Assumption A is the case then I could only imagine exactly one reasonable implementation of the interface: using org.slf4j.helpers.MessageFormatter, since that implementation is already on the classpath.

I see no benefit of using the Assumption A formatter versus just calling the logger method with the exact same arguments as the formatter and let the logging framework perform the formatting lazily.

If Assumption B is the case then the formatter abstraction would be leaky since formatters wouldn't be exchangeable. A request logger would have to know exactly what kind of pattern the used formatter is expecting and would have to provide messagePattern and messageArguments in the "correct" way.

Neither Assumption A nor B would enable us to use some other more sophisticated message formatting techniques like "{requestId}" pattern with messageArguments being a Map<String,Object>.

ldaley commented 8 years ago

What we want to end up with is something like this:

RequestLogger has-a RequestLogger.Formatter has-a RequestLogger.Router


interface RequestLogger {

  interface Router {
    void log(RequestOutcome requestOutcome, String pattern, Object[] values)
  }

  interface Format {
    RequestLogger logger(Router router)
  }
  …
  Format ncsaFormat()
  …
}

We'd then bust up your change into a Format implementation, and a Router implementation or two. We'd at least ship a simple one that just routes to an slf4j logger at INFO level. We would need to then consider whether the extra functionality you have proposed should be merged into this, or should be a separate Router implementation.

For completeness, a trivial routing implementation would look like this…

RequestLogger.ncsaFormat().logger((requestOutcome, pattern, args) -> {
  if (requestOutcome.getStatus().is5xx()) {
    RequestLogger.LOGGER.error(pattern, args);
  } else {
    RequestLogger.LOGGER.info(pattern, args);
  }
})

huxi commented 8 years ago

But that still wouldn't work for the two different patterns MESSAGE_PATTERN_WITHOUT_ID and MESSAGE_PATTERN_WITH_ID that are used depending on the existence of a requestId.

huxi commented 8 years ago

Ah, just took a look at your latest changes. It will create the args[] regardless of any quick-exit checks like isInfoEnabled(Marker)/isWarnEnabled(Marker).

huxi commented 8 years ago

Such a Router interface would need two methods:

interface Router {
    boolean isProbablyLogging(RequestOutcome requestOutcome);
    void log(RequestOutcome requestOutcome);
}

but even then it would have to resolve the used Markertwice to keep the interface implementation agnostic.

ldaley commented 8 years ago

I think that might just be the cost to pay. If someone really wants to do that, they can put something in front of the formatter and “router”. We have to trade something off here.

huxi commented 8 years ago

The thing is that I don't see the big advantage of the router interface that would justify a performance impact. My implementation of the RequestLogger interface is ideal (to my knowledge*) regarding performance and reduced garbage collection pressure. One could as well argue that someone would just have to implement the RequestLogger interface if they want something else. It's not a complicated interface at all to begin with.

While I see one use case beside WARN/INFO logging (Blocking logging call vs. just logging) I have a feeling that we are currently overengineering a simple, compact piece of code.

(* benchmarking might find out that a lookup on a Map<Integer,Marker> with 599 entries is even faster than my int comparisons to resolve the correct Marker - but I doubt it.)

ldaley commented 8 years ago

Performance is only a part of this. We have to trade that off against convenience, but leave enough space in the pieces that someone can take complete control if they require that level of control.

The key thing that my changes are trying to address is decoupling the formatting from the opinions about how it is to be logged. It's reasonable to expect that someone would want to log in NCSA format (and whatever other formats there end up being) but using a different scheme. That should be possible. It wasn't before.

huxi commented 8 years ago

But then you'd need something like this:

interface Router {
    // not functional.
    // could be split into two functional interfaces, but that would allow invalid combinations.
    boolean isProbablyLogging(RequestOutcome requestOutcome)
    void log(Message message)
}

interface MessageProducer {
  Message produceMessage(RequestOutcome requestOutcome)
}

class Message {
  String pattern
  Object[] arguments
}

Because you wouldn't want to make routing interchangeable while leaving the generation of the message hardcoded. And you'd like to mix and match different message formats with different routing strategies. Why shouldn't they be decoupled? Nothing would prevent this.

I can only guess what a "convenient" builder method in RequestLogger would have to look like.

RequestLogger.with(LoggerFactory.getLogger("my.request.logger")).
    andFormatter(new NcsaFormatter()).
    andRouter(new InfoLevelRouting()).
    build()

I don't think that this is a good idea. I'm just walking down this path to the end.

As I said previously:

This would probably be a good use-case for logback-access...

In that case, the whole routing and message formatting would be handled in the logback-access configuration (i.e. a logback-access.xml file) since it's working directly on an IAccessEvent abstraction.

Such a logback-access RequestLogger implementation wouldn't use the above abstractions at all. Neither would some imaginary SNMP implementation use it. They are strictly limited to implementations using an slf4j (or log4j2) logger as their backend and thus shouldn't be part of any public RequestLogger API. They are implementation details leaking to the surface.

My suggestion was: Deliver one "perfect" implementation that people would receive by calling the ncsa() method as before, an implementation that would suit 99.99% of the users "as is", while not changing the public API or the perceived behavior at all. They would simply gain better performance in Ratpack 1.3 vs. previous versions. Especially, but not exclusively, in the case where logging isn't actually performed. I need exactly this (minimal performance impact in case of no-op) in production and wanted to give those benefits back to the community.

This is what my pull request was about and this is what my pull request achieved, including exhaustive tests. It's not necessarily the end of the story but it was a worthy incremental improvement that could have been merged with no damage done whatsoever.

When I asked

Any preference how this option should be exposed in the public API?

25 days ago, I was already done with all changes except the exposure of the new functionality (logging 4xx and 5xx as WARN instead of INFO, optionally). Seeing that there was no immediate consensus about how to expose the new functionality, I decided not to expose it in the public API at all, for the time being, and just created the pull request.

And now we are discussing API additions for the 0.01% users that aren't happy with the default behavior, under the assumption that implementing a single method interface (RequestLogger) wouldn't be convenient enough for them. An API that would directly interfere with my specific need (me being one of those 0.01%, i.e. the target audience of that API) for maximum no-op performance, therefore forcing me to implement the single method interface in my own code base all over again.

Isn't that kind of ironic?

ratpack / ratpack

NcsaRequestLogger enhancements #937

Assumption A

Assumption B