influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Join operation within two streams limited to 2000 rows #24065

Open benderlana opened 1 year ago

benderlana commented 1 year ago

Hi Everyone,

I am struggling with what really seems to be a bug. In my situation I have two streams:

  1. stream A, which holds the energy used by a machine, sampled 2880 times per day;
  2. stream B, which holds the energy cost, sampled once per day.

To compute the cost at each of the 2880 sample times per day (I need it as a time series for other purposes), I do the following:

  1. I create a '_day' column in both stream A and stream B that stores the day timestamp, then group both streams by this new '_day' column;
  2. I then take a LEFT join of stream A and stream B on the '_day' column to obtain a single stream, so that I can compute the energy cost at each sample time.

This approach does work. The subtle problem is that if stream A has N > 2000 rows, the joined table for some reason returns only N-1000 rows.

So for a single day the joined table returns 1880 rows instead of 2880! If I limit stream A to 2000 rows I get 2000 rows back, but if I use 2001 rows I get back 1001 rows.

In this image, _value is the count of rows.

image

Thanks to anyone helping me.

Steps to reproduce:

  1. Create a stream A with at least 2001 rows per day;
  2. Add the _day column to stream A containing the day timestamp -> date.truncate(t: r._time, unit: 1d)
  3. Create a stream B with 1 row per day;
  4. Add the _day column to stream B containing the day timestamp -> date.truncate(t: r._time, unit: 1d)
  5. Take a left/full join on the _day column
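
For anyone who wants to try this without my data, the steps above can be sketched roughly as the Flux script below. It is only an approximation of my setup: the synthetic data from `generate.from` and `array.from`, the dates, the cost value, the `addDay` helper, and the `energy`/`cost` column names are all made up for illustration, and I am assuming the `join` package (`join.left`) is the join in question. I have not confirmed that synthetic data triggers the same row loss.

```flux
import "array"
import "date"
import "generate"
import "join"

// Hypothetical helper: add a _day column holding the day timestamp,
// then group by it (steps 2 and 4).
addDay = (tables=<-) =>
    tables
        |> map(fn: (r) => ({r with _day: date.truncate(t: r._time, unit: 1d)}))
        |> group(columns: ["_day"])

// Stream A: 2880 synthetic samples spread over a single day (step 1).
a =
    generate.from(
        count: 2880,
        fn: (n) => n,
        start: 2023-01-01T00:00:00Z,
        stop: 2023-01-02T00:00:00Z,
    )
        |> addDay()

// Stream B: one synthetic cost value for the same day (step 3).
b =
    array.from(rows: [{_time: 2023-01-01T00:00:00Z, _value: 0.25}])
        |> addDay()

// Step 5: left join on _day, keeping every row of stream A.
joined =
    join.left(
        left: a,
        right: b,
        on: (l, r) => l._day == r._day,
        as: (l, r) => ({_time: l._time, _day: l._day, energy: l._value, cost: r._value}),
    )

// Count the joined rows per day to compare against the 2880 input rows.
joined
    |> count(column: "energy")
    |> yield(name: "row_count")
```

Changing `count` to 2000 or 2001 should make it possible to check whether the 2000-row threshold described above shows up.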

Expected behavior: Get back a table with the same number of rows as stream A.

Actual behavior:

Get back a table with the same number of rows as stream A only if stream A has at most 2000 rows.

Environment info:

Kratheon commented 1 year ago

Yup, I'm kinda seeing similar behavior here. Any suggestions?