Currently have this for the main read loop in internal/ecslog/ecslog.go:
scanner := bufio.NewScanner(in)
for scanner.Scan() {
	line := scanner.Text()
	// ... use line
}
return scanner.Err()
but that fails with a line longer than 64k with
% ./ecslog ../go-ecslog/cmd/ecslog/testdata/crash-long-line.log
ecslog: error: bufio.Scanner: token too long
% echo $?
1
because of bufio.MaxScanTokenSize. That can be raised, I believe, via
Scanner.Buffer, but eventually that hits a reasonable limit, and we would still be trying to read a whole line into memory, however long it is.
Here are 10GB ... 10MB files that are each a single line with no '\n':
python -c '
import sys
token="."*1024
for i in range(10*1024*1024): sys.stdout.write(token)
' >longline.10GB
python -c '
import sys
token="."*1024
for i in range(1024*1024): sys.stdout.write(token)
' >longline.1GB
python -c '
import sys
token="."*1024
for i in range(100*1024): sys.stdout.write(token)
' >longline.100MB
python -c '
import sys
token="."*1024
for i in range(10*1024): sys.stdout.write(token)
' >longline.10MB
Processing those doesn't go so well (watch mem usage, e.g. via htop -F ecslog):
./ecslog longline.1GB >/dev/null
because that bufio.Reader.ReadBytes call (the replacement loop tried first) will keep reading 4kB blocks until it has read the whole line into memory. Using 10GB of memory isn't acceptable.
The solution is to no longer use bufio.Scanner.

first attempt

Next tried bufio.Reader.ReadBytes: that is the loop whose memory blowup on the single-line files is shown above.
next attempt: bufio.Reader.ReadLine

This works well. Fix coming.