
Issue Tracker for Humio

Bug: collect() drops columns when the rawstring reaches maxlen #90

Closed: Potrik98 closed this issue 5 years ago.

Potrik98 commented 5 years ago

I have the following query:

groupBy(alert.category,
        function=[count(as="total_count"),
                  collect([alert.signature_id, alert.signature], multival=false)])
| drop(@rawstring)

This query yields different results on 1.5.16--build-6312 and 1.5.23--build-6794. The collect function should gather the values of the fields alert.signature_id and alert.signature, so for each alert.category in the result, the number of collected values should equal total_count. On 1.5.23--build-6794, however, collection stops once the length of the rawstring exceeds the limit, while on 1.5.16--build-6312 it does not. Setting maxlen=2000000 instead of the default 2000 fixes the issue and makes both versions return the same results.

In my opinion, all values should be collected regardless of maxlen (as in 1.5.16), or at least when the query ends with drop(@rawstring). Alternatively, an option to disable collecting into a rawstring could be added, which would resolve the issue entirely.
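For reference, the maxlen workaround described above might look like the query below. This is a sketch that assumes maxlen is passed as a named argument to collect() alongside multival; the report does not show exactly where the parameter was set.

```
groupBy(alert.category,
        function=[count(as="total_count"),
                  collect([alert.signature_id, alert.signature], multival=false, maxlen=2000000)])
| drop(@rawstring)
```

Raising maxlen only moves the cap. The limit exists to stop collect() from retaining enough data to crash the system, so a very large value may be risky on big result sets.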

mortengrouleff commented 5 years ago

Keeping that much data was a bug in the old version: it allowed users to crash the system by running collect on large amounts of data.