openfaas / of-watchdog

Reverse proxy for STDIO and HTTP microservices
MIT License

Support request for massive log lines #100

Closed everesio closed 1 year ago

everesio commented 4 years ago

When a log line is longer than MaxScanTokenSize (64 * 1024), the error "Error scanning stderr: bufio.Scanner: token too long" is reported. All subsequent requests (when using the logger) are blocked.

Expected Behaviour

Long lines should be logged

Current Behaviour

bufio.NewScanner is initialized with MaxScanTokenSize = 64 * 1024. If a log line is longer than this limit, bufio.Scanner: token too long is returned. On that error the scanner.Scan loop in bindLoggingPipe exits and there is no pipe reader anymore.
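
For illustration, a minimal sketch of that pattern (my own simplification, not the exact code in executor/logging.go): a goroutine scans the pipe line by line, and once Scan fails the goroutine returns, so nothing drains the pipe anymore.

// Illustrative sketch only (imports: bufio, io, log).
func bindLoggingPipe(name string, pipe io.Reader) {
    go func() {
        scanner := bufio.NewScanner(pipe) // default max token size: 64 * 1024 bytes
        for scanner.Scan() {
            log.Printf("%s: %s", name, scanner.Text())
        }
        if err := scanner.Err(); err != nil {
            // "bufio.Scanner: token too long" lands here; the goroutine exits,
            // the pipe is never read again, and the function process eventually
            // blocks writing to stdout/stderr.
            log.Printf("Error scanning %s: %s", name, err)
        }
    }()
}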

Possible Solution

The scanner can be replaced with a bufio.Reader (bufio.NewReaderSize), which does not impose a maximum line length. I will rework https://github.com/openfaas-incubator/of-watchdog/pull/99
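
A rough sketch of that direction (again my own illustration, not the code from the PR): bufio.Reader.ReadString grows its result as needed, so long lines are logged instead of terminating the reader.

// Illustrative sketch only (imports: bufio, io, log, strings).
func logPipe(name string, pipe io.Reader) {
    reader := bufio.NewReaderSize(pipe, 64*1024)
    for {
        line, err := reader.ReadString('\n') // grows past 64 KiB when needed
        if len(line) > 0 {
            log.Printf("%s: %s", name, strings.TrimRight(line, "\n"))
        }
        if err != nil {
            if err != io.EOF {
                log.Printf("Error reading %s: %s", name, err)
            }
            return
        }
    }
}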

Steps to Reproduce (for bugs)

token-too-long.zip

  1. unzip token-too-long.zip
  2. make build run
  3. make test
  4. make LENGTH=66000 test

All subsequent requests are not processed, e.g.

  1. make test

Context

This behavior is a blocker, as in my use case the logged events can be longer than 64 KB.

Your Environment

alexellis commented 4 years ago

Hi, can you put your project on GitHub in its own repo so that we can check it out and see the source code?

I also wanted to know which mode of the watchdog you are using and why you feel it necessary to stream a massive file into the container logs. There may be alternatives which do not require this proposed solution, so I wanted to get a better idea of the exact use case. Can you help us out with that?

Regards,

Alex

everesio commented 4 years ago

Hi Alex, I attached the token-too-long.zip with the test function and Makefile for building and testing, but if you want I can create a repo for this.

Please check out the code:

  1. template python3-http-debian with openfaas/of-watchdog:0.7.7 as the watchdog
  2. test handler:
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handle(event, context):

    if event.body:
        logger.info('event body length %d', len(event.body))
        logger.info('body %s', str(event.body, "utf-8"))

    return {
        "statusCode": 200,
        "body": "Hello from OpenFaaS!"
    } 
  3. POST an event which breaks the function:
LENGTH=66000 
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${LENGTH} | head -n 1 | curl -X POST --data "@-" localhost:8080

Kind regards, Michal

ghostbody commented 3 years ago

+1

When trying to log something longer than 65536 bytes, "bufio.Scanner: token too long" occurs.

Here is my problem:


In scan.go

   144                  // Is the buffer full? If so, resize.
   145                  if s.end == len(s.buf) {
   146                          if len(s.buf) >= s.maxTokenSize {
   147                                  s.setErr(ErrTooLong)
   148                                  return false
   149                          }
   150                          newSize := len(s.buf) * 2
   151                          if newSize > s.maxTokenSize {
   152                                  newSize = s.maxTokenSize
   153                          }
   154                          newBuf := make([]byte, newSize)
   155                          copy(newBuf, s.buf[s.start:s.end])
   156                          s.buf = newBuf
   157                          s.end -= s.start
   158                          s.start = 0
   159                          continue
   160                  }

What's more, the daemon process is dead after this error.
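
For reference, a tiny standalone program (my own reproduction, not part of the original comment) showing the stdlib behaviour quoted above:

package main

import (
    "bufio"
    "fmt"
    "strings"
)

func main() {
    // One line longer than bufio.MaxScanTokenSize (64 * 1024 = 65536 bytes).
    input := strings.Repeat("a", 70000) + "\n"

    scanner := bufio.NewScanner(strings.NewReader(input))
    for scanner.Scan() {
        fmt.Println("line length:", len(scanner.Text()))
    }
    fmt.Println("scan error:", scanner.Err()) // bufio.Scanner: token too long
}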

alexellis commented 3 years ago

/set title: Support request for massive log lines

alexellis commented 3 years ago

Can I ask why you are logging such huge amounts of data to the container logs? This doesn't seem like a normal use case.

ghostbody commented 3 years ago

@alexellis

Yes, you are right. This is not a normal use case.

But there are several scenarios:

  1. Debugging: some users need to debug their functions on the OpenFaaS platform, and they turn on debug logs which can be huge.

  2. Error tracking with logs in ELK (Elasticsearch, Logstash, Kibana): some users prefer to send logs to ELK through the log stream. The best practice is to use one-line logs, and a Java/Python traceback or exception can produce a very large stack.

  3. Faults or bugs in handlers: some logs are not intentionally huge; the handler just receives a big parameter.

The huge log works fine in the handler function, but the error occurs in the watchdog and then the handling process dies. It seems better to take care of this problem.

LucasRoesler commented 3 years ago

I think we can consider increasing the buffer size, or even allowing it to be controlled via an env variable. Even Logstash (in ELK) has a max buffer size: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-udp.html#plugins-inputs-udp-buffer_size

Looking through several versions of these tools, each defines a default buffer size, so I suspect all log systems will have some kind of limit.
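
A minimal sketch of that idea (the function name here is hypothetical; it assumes an env variable called log_buffer_size, which is the name that later appears in the README per the comments below):

// Illustrative sketch only (imports: bufio, io, log, os, strconv).
func bindLoggingPipeWithEnv(name string, pipe io.Reader) {
    size := bufio.MaxScanTokenSize // 64 * 1024 bytes by default
    if v, ok := os.LookupEnv("log_buffer_size"); ok {
        if n, err := strconv.Atoi(v); err == nil && n > 0 {
            size = n
        }
    }

    scanner := bufio.NewScanner(pipe)
    scanner.Buffer(make([]byte, 0, size), size) // raise the maximum token size
    for scanner.Scan() {
        log.Printf("%s: %s", name, scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        log.Printf("Error scanning %s: %s", name, err)
    }
}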

If your request body is very big, I would not recommend logging it. An absurd case like a 10 GB binary file as log output is not going to be a good idea, and it is certainly not something that I would consider a "log".

everesio commented 3 years ago

If a log line longer than 64 KB stops a service from operating, that is a serious problem, and it is irrelevant whether log collectors can deal with such lines or not.

I find it really amusing that the title of the issue was renamed from "bufio.Scanner: token too long" to "Support request for massive log lines", which implies a feature request and not a bug.

As a comparison, please check https://github.com/hashicorp/terraform/issues/20325. This is a similar issue, and the Terraform maintainer did label it as a "bug".

LucasRoesler commented 3 years ago

@everesio I agree, a long log line should not stop the watchdog from working. My main point above is that it seems reasonable to have a buffer and to allow control over its size.

Of course, logging shouldn't cause anything to crash, that goes without saying.

LucasRoesler commented 3 years ago

I just read https://github.com/hashicorp/terraform/issues/20325; it looks like something we could replicate in https://github.com/openfaas/of-watchdog/blob/master/executor/logging.go

everesio commented 3 years ago

Exactly, the scanner.Scan method is the source of the problem. More than a month ago I provided a PR, https://github.com/openfaas/of-watchdog/pull/99, with a fix. You can rework it or fix it in a different way. I am currently discouraged from making any source code contributions to this project.

LucasRoesler commented 3 years ago

I started adapting the Terraform patch, and it is definitely a bit messy because of the internal use of the logger. I have an idea for how to work around it and will post it once I have tested it.

LucasRoesler commented 3 years ago

@everesio give this a try https://github.com/openfaas/of-watchdog/pull/107

Kanshiroron commented 1 year ago

@LucasRoesler @alexellis we faced the same issue internally, causing the function to hang because its stdout is locked (it is piped, but not read). I looked at the PR @LucasRoesler created, and it seems that it should solve the issue. Any chance of getting it merged? Thanks for the help.

alexellis commented 1 year ago

Hi @Kanshiroron

As far as I was aware, we'd already done the work for this in https://github.com/openfaas/of-watchdog/pull/126. Can you confirm you've tried the new settings we have documented?

See log_buffer_size in the README

If you still have issues, please raise your own issue here.

Alex

alexellis commented 1 year ago

/close: resolved

alexellis commented 1 year ago

/lock