oplehto closed this issue 3 years ago
@rogpeppe can you take a look at this? Thanks.
@oplehto I haven't found anything that stands out so far when inspecting the code. I've also been fuzzing it for the last hour or so (~120 million execs so far) without finding any crashes, so it seems like you've found a nice edge case!
Could you try to reproduce the issue again after updating to the following branch? I've added a bit of instrumentation to it so that we get a more informative panic on error:
go get github.com/influxdata/line-protocol/v2@v2.2.1-0.20211001085601-62fcfd697adb
Thanks!
Update: please use this version instead:
go get github.com/influxdata/line-protocol/v2@v2.2.1-0.20211001090429-a14672f62d41
@oplehto Just to confirm: you tested this with Roger's branch above and are still getting a panic?
FWIW I've been fuzzing the code continuously on a 16 core machine for over 180 hours now and I haven't found a crash yet. So a reproducer from @oplehto would be super useful!
@rogpeppe I agree that a reproducer would be nice, but the incoming data stream is huge. I suspect the problem originates from occasional truncated data sent by embedded systems, and upgrading those systems is not completely trivial.
I was using #9871, which has the correct parser version. I'll try to modify the code a bit to capture a larger chunk of data.
Hi @oplehto - did you manage to make any progress getting a larger chunk of data?
Are you able to share any raw data with us that would help us to narrow this down?
I haven't seen this panic again in our live data stream since we moved all publishers from HTTP to UDP, and I'm unable to share the raw pcap data due to its proprietary nature. Even without being able to repro this, though, it looks very likely that #51 resolves the issue, so I'm fine with closing it.
I ran into this panic successively when I was piping a large amount of transient data via Telegraf using this PR: https://github.com/influxdata/telegraf/pull/9685
There is probably some malformed data in the stream, but it's difficult to pinpoint because it is a huge amount of transient data. At the very least, some bounds checking and an error that better shows where the syntax problem is would be super useful.