Join operation between two streams limited to 2000 rows

Hi Everyone,

I am struggling with what really seems to be a bug. I have two streams:

stream A: the energy used by a machine, sampled 2880 times per day (every 30 seconds);
stream B: the energy cost, sampled once per day.

To compute the cost at each of the 2880 timestamps per day (I need it as a time series for other purposes), I do the following:

I create a '_day' column in both stream A and stream B that stores the timestamp truncated to the day, and then group both streams by this new '_day' column;
I then perform a LEFT join of stream A and stream B on the '_day' column to obtain a single stream, from which I can compute the energy cost at every timestamp.
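For reference, here is a minimal Python sketch of the semantics I expect from this approach. It is not Flux or InfluxDB code: it models stream A as (time, energy) tuples and stream B as (time, price) tuples (hypothetical shapes), truncates timestamps to the day assuming UTC, and assumes the cost of a sample is its energy multiplied by that day's price. Because stream B contributes exactly one row per day, a left join should return exactly one row per row of stream A.

```python
from datetime import datetime, timedelta, timezone

def truncate_to_day(t: datetime) -> datetime:
    # Rough equivalent of Flux date.truncate(t: r._time, unit: 1d), assuming UTC.
    return t.replace(hour=0, minute=0, second=0, microsecond=0)

def left_join_cost(stream_a, stream_b):
    """Attach each day's price (stream B) to every energy sample (stream A).

    stream_a: list of (time, energy) tuples, many per day (hypothetical shape)
    stream_b: list of (time, price) tuples, one per day (hypothetical shape)
    Assumption: cost = energy * daily price.
    """
    # Index B by its _day value; one price row per day is assumed.
    price_by_day = {truncate_to_day(t): price for t, price in stream_b}
    joined = []
    for t, energy in stream_a:
        day = truncate_to_day(t)
        price = price_by_day.get(day)  # None if B has no row for this day
        cost = energy * price if price is not None else None
        joined.append({"_time": t, "_day": day, "energy": energy,
                       "price": price, "cost": cost})
    return joined  # a left join keeps every row of A

# Made-up sample data: one day sampled every 30 s (2880 rows), one daily price.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
stream_a = [(start + timedelta(seconds=30 * i), 2.0) for i in range(2880)]
stream_b = [(start, 0.25)]
rows = left_join_cost(stream_a, stream_b)
print(len(rows))        # 2880
print(rows[0]["cost"])  # 0.5
```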

This approach does work, but there is a very subtle problem: if stream A has N > 2000 rows, for some reason the joined table returns only N - 1000 rows.

So for a single day the joined table returns 1880 rows instead of 2880! If I limit stream A to 2000 rows I get 2000 rows back, but if I use 2001 rows I get back only 1001 rows.

In the attached image, _value is the count of rows.

Thanks to anyone who can help.

Steps to reproduce:

Create a stream A with at least 2001 rows per day;
Add a _day column to stream A containing the timestamp truncated to the day: date.truncate(t: r._time, unit: 1d);
Create a stream B with 1 row per day;
Add a _day column to stream B containing the timestamp truncated to the day: date.truncate(t: r._time, unit: 1d);
Perform a left/full join on the _day column.

Expected behavior: the joined table has the same number of rows as stream A.

Actual behavior:

The joined table has the same number of rows as stream A only when stream A has 2000 rows or fewer.
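The expected row counts can be checked with a plain in-memory hash join that follows the reproduction steps. This is a Python sketch, not Flux: make_stream_a, make_stream_b and join_on_day are hypothetical helpers, the values are made up, and UTC day truncation is assumed. For N = 2000, 2001 and 2880 samples in one day, both a left and a full join against a single daily row should yield N rows, whereas InfluxDB returns 1001 and 1880 rows for 2001 and 2880.

```python
from datetime import datetime, timedelta, timezone

def day_of(t):
    # Same idea as Flux date.truncate(t: r._time, unit: 1d), assuming UTC.
    return t.replace(hour=0, minute=0, second=0, microsecond=0)

def make_stream_a(n, day):
    # Hypothetical stand-in for step 1: n samples spread evenly across one day.
    step = timedelta(days=1) / n
    return [{"_time": day + i * step, "_value": 1.0} for i in range(n)]

def make_stream_b(day):
    # Hypothetical stand-in for step 3: one price row per day.
    return [{"_time": day, "_value": 0.25}]

def join_on_day(a_rows, b_rows, how="left"):
    # Steps 2 and 4: add the _day column (copies, inputs are not mutated).
    a = [dict(r, _day=day_of(r["_time"])) for r in a_rows]
    b = [dict(r, _day=day_of(r["_time"])) for r in b_rows]
    b_by_day = {}
    for r in b:
        b_by_day.setdefault(r["_day"], []).append(r)
    out, matched_days = [], set()
    # Step 5: every A row is kept; it is paired with each B row of the same day.
    for left_row in a:
        rights = b_by_day.get(left_row["_day"], [])
        if rights:
            matched_days.add(left_row["_day"])
        for right_row in rights or [None]:
            out.append({"_time": left_row["_time"], "_day": left_row["_day"],
                        "energy": left_row["_value"],
                        "price": right_row["_value"] if right_row else None})
    if how == "full":
        # A full join also keeps B rows whose day has no A rows.
        out.extend({"_time": r["_time"], "_day": r["_day"], "energy": None,
                    "price": r["_value"]}
                   for r in b if r["_day"] not in matched_days)
    return out

day = datetime(2023, 1, 1, tzinfo=timezone.utc)
counts = {}
for n in (2000, 2001, 2880):
    a, b = make_stream_a(n, day), make_stream_b(day)
    for how in ("left", "full"):
        counts[(n, how)] = len(join_on_day(a, b, how))
        print(n, how, counts[(n, how)])  # expected: the count equals n
```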

Environment info:

System info: InfluxDB OSS running in a Docker container
InfluxDB version: 2.6.1

influxdata/influxdb issue #24065