spring-attic / spring-cloud-gcp

Integration for Google Cloud Platform APIs with Spring
Apache License 2.0
705 stars 693 forks source link

Stackdriver logging POC #191

Closed wyaeld closed 6 years ago

wyaeld commented 7 years ago

This is a starter for how my project has addressed Spring Boot and Stackdriver Logging, at @meltsufin request on https://github.com/spring-projects/spring-boot/issues/8933#issuecomment-335305580

The challenge is that Stackdriver logging really needs JSON structured outputs, for things like correct timestamping, and multiline records. However for development we just wanted OOTB Spring logback to the console since we are familiar with it.

  1. Spring profile values to switch between application.properties and application-production.properties
  2. set logging.config between different logback.xml and logback-production.xml
  3. have the production config use a custom layout pattern that emits Json structured for stackdriver.
  4. Excluding spring-starter-logging but including all the X-to-SLF4J libs

logback.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <include resource="org/springframework/boot/logging/logback/base.xml"/>

    <root level="info">
        <appender-ref ref="CONSOLE"/>
    </root>

</configuration>

logback-production.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>
                %d{dd-MM-yyyy HH:mm:ss.SSS} %magenta([%thread]) %highlight(%-5level) %logger{36}.%M - %msg%n
            </pattern>
        </encoder>
    </appender>

    <appender name="CLOUD" class="ch.qos.logback.core.ConsoleAppender">

        <!--https://stackoverflow.com/questions/44164730/gke-stackdriver-java-logback-logging-format-->
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="cne.cinet.admin.GCPCloudLoggingJSONLayout">
                <pattern>%-4relative [%thread] %-5level %logger{35} - %msg</pattern>
            </layout>
        </encoder>

    </appender>

    <root level="info">
        <appender-ref ref="CLOUD"/>
    </root>

</configuration>

Layout class - its Kotlin but you get the idea, port of prior work discussed here https://stackoverflow.com/questions/44164730/gke-stackdriver-java-logback-logging-format

package cne.cinet.admin

import ch.qos.logback.classic.Level.DEBUG_INT
import ch.qos.logback.classic.Level.ERROR_INT
import ch.qos.logback.classic.Level.INFO_INT
import ch.qos.logback.classic.Level.TRACE_INT
import ch.qos.logback.classic.Level.WARN_INT

import org.json.JSONObject

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.PatternLayout
import ch.qos.logback.classic.spi.ILoggingEvent

/**
 * Format a LoggingEvent as a single line JSON object
 *
 * <br></br>https://cloud.google.com/appengine/docs/flexible/java/writing-application-logs
 *
 * <br></br>From https://cloud.google.com/appengine/articles/logging
 * <quote>
 * Applications using the flexible environment should write custom log files to the VM's log directory at
 * /var/log/app_engine/custom_logs. These files are automatically collected and made available in the Logs Viewer.
 * Custom log files must have the suffix .log or .log.json. If the suffix is .log.json, the logs must be in JSON
 * format with one JSON object per line. If the suffix is .log, log entries are treated as plain text.
</quote> *
 *
 * Nathan: I can't find a reference to this format on the google pages but I do remember getting the format from some
 * GO code that a googler on the community slack channel referred me to.
 */
/* Must be public for JSON marshalling logic */
data class GCPCloudLoggingTimestamp(var seconds: Long,
                                    var nanos: Int)

/* Must be public for JSON marshalling logic */
data class GCPCloudLoggingEvent(var message: String,
                                var timestamp: GCPCloudLoggingTimestamp,
                                var severity: String?,
                                var traceId: String?)

class GCPCloudLoggingJSONLayout : PatternLayout() {

    override fun doLayout(event: ILoggingEvent): String {
        val formattedMessage = super.doLayout(event)
        return doLayoutInternal(formattedMessage, event)
    }

    /* for testing without having to deal wth the complexity of super.doLayout()
     * Uses formattedMessage instead of event.getMessage() */
    private fun doLayoutInternal(formattedMessage: String, event: ILoggingEvent): String {
        val gcpLogEvent = GCPCloudLoggingEvent(formattedMessage,
                convertTimestampToGCPLogTimestamp(event.timeStamp),
                mapLevelToGCPLevel(event.level),
                null)
        val jsonObj = JSONObject(gcpLogEvent)
        /* Add a newline so that each JSON log entry is on its own line.
         * Note that it is also important that the JSON log entry does not span multiple lines.
         */
        return jsonObj.toString() + "\n"
    }

    override fun getDefaultConverterMap(): Map<String, String> {
        return PatternLayout.defaultConverterMap
    }

    private fun convertTimestampToGCPLogTimestamp(millisSinceEpoch: Long): GCPCloudLoggingTimestamp {
        val nanos = (millisSinceEpoch % 1000).toInt() * 1000000 // strip out just the milliseconds and convert to nanoseconds
        val seconds = millisSinceEpoch / 1000L // remove the milliseconds
        return GCPCloudLoggingTimestamp(seconds, nanos)
    }

    private fun mapLevelToGCPLevel(level: Level): String? {
        return when (level.toInt()) {
            TRACE_INT -> "TRACE"
            DEBUG_INT -> "DEBUG"
            INFO_INT -> "INFO"
            WARN_INT -> "WARN"
            ERROR_INT -> "ERROR"
            else -> null /* This should map to no level in GCP Cloud Logging */
        }
    }

}

application.properties

logging.config=classpath:logback.xml
logging.level.cne.cinet.admin=DEBUG

application-production.properties

logging.config=classpath:logback-production.xml

We are running on GKE, so launch pods with SPRING_PROFILES_ACTIVE=production in order for the right files and overrides to go in.

gradle.build the relevant parts

configurations {
    compile.exclude module : 'spring-boot-starter-logging'
}
dependencies {
    compile('org.springframework.boot:spring-boot-starter-web')

//  Logging configuration to work with Gcloud
    compile 'org.slf4j:slf4j-api:1.7.25'
    compile 'ch.qos.logback:logback-classic:1.2.3'
    compile 'org.slf4j:jul-to-slf4j:1.7.25'
    compile 'org.slf4j:jcl-over-slf4j:1.7.25'
    compile 'org.slf4j:log4j-over-slf4j:1.7.25'
    compile 'org.json:json:20170516'
}
wyaeld commented 7 years ago

Our goals are to have Stackdriver Logging, and also the Stackdriver error reporting integrated into all services. I haven't started on the Error reporting yet.

The tracing mechanisms in Spring Cloud are not of much use to us because we are using Linkerd in front of the service, and its already doing all the tracing we require.

meltsufin commented 7 years ago

Thank you for the detailed explanation of the setup you ended up with. This will be very useful as we start work on #154.

wyaeld commented 7 years ago

@meltsufin you're welcome.

I'm still quite new to spring. Any chance you can point me where to look to setup an error handler/interceptor so I can get the Stackdriver error reporting to work?

meltsufin commented 7 years ago

We don't really have any support for it in the project at the moment. You would basically have to use the Error Reporting client lib.

AFAIK Stackdriver Logging will automatically create error reports when it detects a stack trace in the log stream. However, this only works for non-JSON formatted logs.

We should probably provide some kind of error handler in this project to make it easier to use Stackdriver Error reporting. Would you mind submitting an issue for that? Thanks!

wyaeld commented 7 years ago

I ran some testing trying to see if Error Reports would be created. Didn't seem to work at all when we were using default spring logging. Maybe that only acts if they are correctly mapped to SEVERE.

wyaeld commented 7 years ago

I see this note:

Note: Error logs written to stderr are processed automatically by Stackdriver Error Reporting, without needing to use the Stackdriver Error Reporting package for Java directly.

And the GKE logging page states that both stderr and stdout are being collected.

I'm left wondering if something if my logback configuration - either default or custom behaviour I've introduced, is causing the SEVERE level entries to hit STDOUT when I really want them in STDERR to be picked up. It seems a reasonable assumption that Stackdriver Events can parse the json, and the docs imply its just the stream they come out on, rather than any smart detection of a stacktrace. Logging in java is really not my forte :-(

wyaeld commented 7 years ago

On further investigation, that seems to be the case. Logback ConsoleAppender will write to System.out unless overridden, and Stackdriver Errors will appear to ignore that stream if I'm readin the docs correct. I think this means to get the desired behaviour requires 2 appenders, with mutually exclusive filtering. Something like the config below.

<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <target>System.out</target>

        <!-- accept anything that is not  SEVERE -->
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
            <level>SEVERE</level>
            <onMatch>DENY</onMatch>
            <onMismatch>ACCEPT</onMismatch>
        </filter>

        <!--https://stackoverflow.com/questions/44164730/gke-stackdriver-java-logback-logging-format-->
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="cne.cinet.admin.GCPCloudLoggingJSONLayout">
                <pattern>%-4relative [%thread] %-5level %logger{35} - %msg</pattern>
            </layout>
        </encoder>

    </appender>

    <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
        <target>System.err</target>

        <!--https://stackoverflow.com/questions/25935326/how-can-i-configure-logback-conf-to-send-all-messages-to-stderr-->
        <!-- deny all events with a level below SEVERE -->
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>SEVERE</level>
        </filter>
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="cne.cinet.admin.GCPCloudLoggingJSONLayout">
                <pattern>%-4relative [%thread] %-5level %logger{35} - %msg</pattern>
            </layout>
        </encoder>
    </appender>

    <root level="info">
        <appender-ref ref="STDOUT"/>
        <appender-ref ref="STDERR"/>
    </root>

I will try and test later today

wyaeld commented 7 years ago

well, something still doesn't work.

This is my error, and if you check the labels in the json payload you can see stackdriver shows it coming in on stderr now. I compared to the same error an hour ago and confirmed it used to come over stdout.

Unfortunately Stackdriver errors is still blank. Unsure what the next step is.

11:18:42.615
45649 [http-nio-8080-exec-10] ERROR c.c.a.CustomRestExceptionHandler - Service call failed to ServletWebRequest: uri=/capacity;client=10.16.0.1;session=9D69C56AC265BF5335595B77DE2881D0;user=user
{
 insertId:  "1ach8wufgu8nkm"  
 jsonPayload: {
  message:  "45649 [http-nio-8080-exec-10] ERROR c.c.a.CustomRestExceptionHandler - Service call failed to ServletWebRequest: uri=/capacity;client=10.16.0.1;session=9D69C56AC265BF5335595B77DE2881D0;user=user"   
 }
 labels: {
  compute.googleapis.com/resource_name:  "gke-staging-cluster-2-default-pool-56e59b49-p1w0"   
  container.googleapis.com/namespace_name:  "cinet"   
  container.googleapis.com/pod_name:  "cne-admin-5zrq1"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/cinet-staging/logs/cne-admin"  
 receiveTimestamp:  "2017-10-11T22:18:47.406625199Z"  
 resource: {
  labels: {…}   
  type:  "container"   
 }
 severity:  "ERROR"  
 timestamp:  "2017-10-11T22:18:42.615Z"  
}
meltsufin commented 7 years ago

I think the fluentd exception detector might not be matching the lines from Logback. I suspect the log line prefix is throwing it off. See the detector here.

@igorpeshansky, any suggestions?

wyaeld commented 7 years ago

I adjusted the pattern to remove the logline prefix, but still no luck.

Am considering just dropping the autodetection efforts altogether, and dealing with error handling manually.

Feels like the systems are trying to be too clever. A formatted json payload with multiline entries, with severity=ERROR and coming in on stderr isn't good enough?

I've tried using the Google logback appender, but not seeing any content from it at all. Similar reason why used the custom solution above.

saturnism commented 7 years ago

fwiw, most error log entries will be picked up by error reporting automatically. we shouldn't need to call the client lib explicility in most cases.

wyaeld commented 7 years ago

I'd love to see an actual working example of that. Haven't been diving into this for the sake of it ;-)

Attempting to use the Google Logback appender continually results in nothing showing up. My production config currently has the 2 custom json formatted appenders, going to stdout and stderr respectively, and the OOTB google appender, with no filtering. All running at root level.

That should result in both json and normal formatted lines coming through, and error events at the very least being picked up from the fluend detector over the unformatted google appender's output.

But no

igorpeshansky commented 7 years ago

Let's be clear here. The fluentd exception detector simply finds exception stack traces that span multiple lines (and thus may end up as multiple separate log messages) and joins them into single multiline messages. This is to enable the server-side integration between error reporting and logging. Unless that integration knows how to recognize the errors you're looking for, enabling the exception detection plugin won't help. That plugin also works only for line-level log messages — structured logs will be ignored by the plugin. Hope this explains things.

meltsufin commented 7 years ago

The Google Stackdriver Logback appender will not send anything to Error Reporting. I actually think that Stackdriver Logging might want to just automatically create error reports for anything logged at a very high severity level, but that's a separate discussion.

What should work is when an exception stack trace is printed to stdout as in ex.printStackTrace(), it should be automatically collapsed into a single log message and also create an entry in Error Reporting. I've seen this work on App Engine Flex, and I believe it should work on GKE the same.

Here's an example to try on App Engine Flex (and can be adapted to GKE):

  1. git clone https://github.com/GoogleCloudPlatform/getting-started-java
  2. cd getting-started-java/helloworld-servlet
  3. Modify HelloServlet to look like this:

    @SuppressWarnings("serial")
    @WebServlet(name = "helloworld", value = "/" )
    public class HelloServlet extends HttpServlet {
    
    private final static Logger logger = Logger.getLogger(HelloServlet.class.getName());
    
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    PrintWriter out = resp.getWriter();
    out.println("Hello, world - Flex Servlet");
    
    logger.info("Test INFO level log");
    logger.warning("Test WARN level log");
    logger.severe("Test SEVERE level log");
    
    throw new RuntimeException("Testing exception.");
    }
    }
  4. Deploy to App Engine with mvn clean appegine:deploy
igorpeshansky commented 7 years ago

I actually think that Stackdriver Logging might want to just automatically create error reports for anything logged at a very high severity level, but that's a separate discussion.

That's exactly what I meant by "server-side integration between error reporting and logging". It's already happening.

What should work is when an exception stack trace is printed to stdout...

Even if you enable the exception detector for stdout, the integration only looks at stderr.

I think what happened in this comment was that there was no stack trace and the format of the exception line was, erm, unexceptional, so it wasn't detected.

wyaeld commented 7 years ago

I'm unsure why the presence of stacktraces is meaningful for Error Reporting. If something is logged to stderr, at severity=error, why should Stackdriver care about the message content? I might want notifcation on something that triggers on business logic failures, rather than actual exceptions. Is that not reasonable?

igorpeshansky commented 7 years ago

Most users don't want to be notified about every line they log to stderr (severity==error is the default for stderr). So Error Reporting detects things that are likely to actually be errors (namely, stack traces) automatically, and lets users do more if they wish. Details here.