@russorat the default buffer size for a `bufio.Scanner` is 64KB. An appropriately sized buffer can be passed to `bufio.Scanner`, which appears to happen in the code here:
However, I think by default the `Batcher` is using the package default defined here, which is 500KB:
So I'm guessing one needs to initialise a `Batcher` with `MaxFlushBytes` set to something appropriate for your data:
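For illustration, a minimal sketch of that initialisation, assuming `MaxFlushBytes` is an exported field on `write.Batcher` (the 256KB value is purely illustrative and should be tuned for your data):

```go
package main

import (
	"fmt"

	"github.com/influxdata/influxdb/v2/write"
)

func main() {
	// Hypothetical sizing: MaxFlushBytes bounds how much line protocol is
	// buffered before the Batcher flushes a batch to the server.
	b := write.Batcher{
		MaxFlushBytes: 256 * 1024, // illustrative; tune for your data
	}
	fmt.Printf("%+v\n", b)
}
```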
Just a note - the solution to this must not be to allocate the user's batch size as a single heap allocation - that's a pathway to crashing with an OOM.
I believe the appropriate fix is to improve the `influx` tool, to send batches to `influxd`. So, based on the following command invocation:
```sh
$ influx query -c tools 'from(bucket: "apps") |> range(start: -1m)' -r \
  | influx write -c default --format csv -b apps
```
I propose the fix is to improve:

```sh
influx write -c default --format csv -b apps
```

such that as it receives data from `STDIN`, it sends appropriately sized batches to `influxd`.
@rbetts this might be a case where the swagger-generated client can read the limit annotations and adjust the batch size accordingly.
@stuartcarnie Yeah - if the client isn't chunking then it needs to. Seems like a place to start investigating.
Also need to improve the server's error message when the batch exceeds the internal limit.
The client is supposed to be batching writes, so the investigation will continue as to why it is generating the above error.
If one of the field values is larger than the limit, then presumably it's not possible to split the write? Maybe that's what's going on.
Issue moved to OSS 2.0 GA
FYI, `Scanner` can take a buffer with a pre-allocated size and a max size, allowing you to pick a reasonably small size and have it grow to a set max size, e.g.:
```go
scanner := bufio.NewScanner(r)
// Start small; Scanner grows the buffer as needed, up to MaxBytes.
scanBuf := make([]byte, 4096)
scanner.Buffer(scanBuf, MaxBytes) // MaxBytes: caller-chosen upper bound on token size
```
Dug in to this a bit more and batching should be working. I believe the output of

```sh
$ influx query -c tools 'from(bucket: "apps") |> range(start: -1m)' -r
```

is producing lines which exceed the scanner's default of 64KB, or is not producing line-delimited text at all. That needs to be confirmed next. If 64KB is a normal length for a single line of CSV from the `influx query` command, we will need to increase the size of the `scanner.Buffer` as @ssoroka and others have stated.
Some facts to support my hypothesis and the source of the `bufio.Scanner: token too long.` error:
`bufio.SplitFunc`:
https://github.com/golang/go/blob/4c4a376736fff47b08ab6053605c3b68d87552b5/src/bufio/scan.go#L279

`Batcher` sets it to `ScanLines`, meaning a token is a single line.

NOTE: this call is redundant; per the documentation, the default split function for `bufio.NewScanner` is `ScanLines`.
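A quick standalone illustration of that redundancy (generic `bufio` usage, not the actual `Batcher` code):

```go
package main

import (
	"bufio"
	"strings"
)

func main() {
	sc := bufio.NewScanner(strings.NewReader("a\nb\n"))
	sc.Split(bufio.ScanLines) // redundant: ScanLines is already the Scanner default
	for sc.Scan() {
		_ = sc.Text() // each token is one line
	}
}
```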
`bufio.Scanner: token too long.` comes from:
https://github.com/golang/go/blob/4c4a376736fff47b08ab6053605c3b68d87552b5/src/bufio/scan.go#L194

It is returned when `buf` exceeds `maxTokenSize` or half the magnitude of `maxInt`, which is very large on a 64-bit system. The condition is likely hitting `maxTokenSize`, which is 64KB by default:
https://github.com/golang/go/blob/4c4a376736fff47b08ab6053605c3b68d87552b5/src/bufio/scan.go#L80
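For reference, a minimal standalone reproduction of that error path, using the exported `bufio.MaxScanTokenSize` constant (64KB):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	// One line longer than the default max token size, with no newline in range.
	long := strings.Repeat("x", bufio.MaxScanTokenSize+1)
	sc := bufio.NewScanner(strings.NewReader(long))
	for sc.Scan() {
		// never reached: the first token already exceeds maxTokenSize
	}
	fmt.Println(sc.Err()) // bufio.Scanner: token too long
}
```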
The `influx write` command is receiving excessively long lines, or text which is not properly line-delimited.

Is there a limit on the length of a line protocol record defined anywhere?
Turns out the panic is due to a bug in our `LineReader`, which accesses an array out of bounds, per the following panic:
```
github.com/influxdata/influxdb/v2/pkg/csv2lp.(*LineReader).Read(0xc000537380, 0xc000345392, 0xc6e, 0xc6e, 0x392, 0x0, 0x0)
	/Users/stuartcarnie/projects/go/influxdata/influxdbv2/pkg/csv2lp/line_reader.go:87 +0x24d
bufio.(*Reader).fill(0xc0005373e0)
	/Users/stuartcarnie/projects/go/google/golang/src/bufio/bufio.go:101 +0x105
bufio.(*Reader).ReadSlice(0xc0005373e0, 0x10000000000000a, 0x7c11558, 0x0, 0x69, 0x6453368, 0xc0001340e0)
	/Users/stuartcarnie/projects/go/google/golang/src/bufio/bufio.go:360 +0x3d
encoding/csv.(*Reader).readLine(0xc0003e6900, 0xc000495000, 0x69, 0xc0001340e0, 0x69, 0x0)
	/Users/stuartcarnie/projects/go/google/golang/src/encoding/csv/reader.go:218 +0x49
encoding/csv.(*Reader).readRecord(0xc0003e6900, 0xc000103800, 0x11, 0x17, 0xc000103800, 0x11, 0x17, 0x0, 0x0)
	/Users/stuartcarnie/projects/go/google/golang/src/encoding/csv/reader.go:266 +0xf2
encoding/csv.(*Reader).Read(0xc0003e6900, 0xc000103800, 0x11, 0x17, 0x0, 0x0)
	/Users/stuartcarnie/projects/go/google/golang/src/encoding/csv/reader.go:187 +0x57
github.com/influxdata/influxdb/v2/pkg/csv2lp.(*CsvToLineReader).Read(0xc00032eb40, 0xc000346761, 0x89f, 0x89f, 0x10, 0x1, 0x1)
	/Users/stuartcarnie/projects/go/influxdata/influxdbv2/pkg/csv2lp/csv2lp.go:105 +0x87
bufio.(*Scanner).Scan(0xc000727f08, 0xc000727e98)
	/Users/stuartcarnie/projects/go/google/golang/src/bufio/scan.go:214 +0xa9
github.com/influxdata/influxdb/v2/write.(*Batcher).read(0xc000118980, 0x52e7ca0, 0xc000498500, 0x52c8840, 0xc00032eb40, 0xc000406240, 0xc000537440)
	/Users/stuartcarnie/projects/go/influxdata/influxdbv2/write/batcher.go:69 +0xf0
created by github.com/influxdata/influxdb/v2/write.(*Batcher).Write
	/Users/stuartcarnie/projects/go/influxdata/influxdbv2/write/batcher.go:44 +0x1c7
```
The error occurs on the following line, because `i == len(p) - 1`:
Looks like the actual bug is here, where `n` is assigned to, but the rest of the code is assuming that `n` is the same as `len(p)`.
I'd be inclined to avoid using `n` and just use `len(p)` when appropriate - it's no more efficient to assign it to a local variable.
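As a generic illustration of the hazard (hypothetical code, not the actual `LineReader` implementation): after `n, err := src.Read(p)`, only `p[:n]` holds valid data, so mixing `n` and `len(p)` as loop bounds, or looking ahead with `p[i+1]` without a guard, is exactly the kind of thing that produces this panic:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countLines demonstrates the bounds discipline discussed above: src.Read
// may fill fewer than len(p) bytes, so indexing must be bounded by n, and
// any look-ahead such as p[i+1] needs its own explicit check.
func countLines(src io.Reader, p []byte) (int, error) {
	lines := 0
	for {
		n, err := src.Read(p)
		for i := 0; i < n; i++ {
			if p[i] == '\n' {
				lines++
			}
			if i+1 < n && p[i] == '\r' && p[i+1] == '\n' {
				// CRLF handling would go here; the i+1 < n guard
				// prevents exactly the kind of out-of-range panic above.
			}
		}
		if err == io.EOF {
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
	}
}

func main() {
	lines, _ := countLines(strings.NewReader("a\nb\r\nc\n"), make([]byte, 8))
	fmt.Println(lines) // 3
}
```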
@rogpeppe agree – that is exactly what I have done. Implemented a test which exhibits the out of bounds issue and have a fix pending.
How is that possible? That file is only 9 days old, but this issue is 11 days old: https://github.com/influxdata/influxdb/commits/master/pkg/csv2lp/line_reader.go
Well then! That means in my testing, I uncovered a new bug…
I did replicate and fix both that panic and the `bufio.Scanner: token too long.` error.
I'm very curious to see this PR :)
I'm seeing the error:

```
Error: Failed to write data: bufio.Scanner: token too long.
```

when I try to write CSV data that is ~164MB to InfluxDB. Based on https://stackoverflow.com/questions/21124327/how-to-read-a-text-file-line-by-line-in-go-when-some-lines-are-long-enough-to-ca, it looks like we might need to evaluate whether we should switch to `bufio.Reader`: https://github.com/influxdata/influxdb/blob/f09ee881fb33ca06474ca499deac0c5cd1bd9f91/write/batcher.go#L67

I can send the file offline.
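For reference, a minimal sketch of the `bufio.Reader` alternative mentioned above: `ReadString` has no fixed token limit, so it can return arbitrarily long lines (memory permitting), unlike `Scanner` with its default 64KB cap:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

func main() {
	r := bufio.NewReader(os.Stdin)
	for {
		// ReadString grows its result as needed; a long line is returned
		// whole rather than failing with "token too long".
		line, err := r.ReadString('\n')
		if len(line) > 0 {
			fmt.Print(line) // process one (possibly very long) line
		}
		if err == io.EOF {
			return
		}
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
	}
}
```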