teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0

Chart's "by" instruction handles second argument incorrectly #265

Open 51-code opened 2 months ago

51-code commented 2 months ago

Describe the bug

The query:

%dpl
index=crud earliest=-3y
| spath
| chart avg(balance) by _time operation

returns a dataset in which the results are grouped by both the operation and the _time columns, like this:

operation  _time                  avg(balance)
a          2021-10-31 17:34:36.0   123
b          2021-11-08 21:38:03.0  -123

The avg(balance) is calculated for each combination of _time and operation values. This grouping does not make much sense here, because the timestamps are so nearly unique that there are probably no cases where multiple operations share the same timestamp.

Expected behavior

The second argument to the by instruction is actually a t_column_Parameter, which behaves differently from the first argument, a t_row_Parameter. It should pivot the dataset so that each unique value in that column becomes a new column in the dataset, like this:

_time                  a     b
2021-10-31 17:34:36.0  123
2021-11-08 21:38:03.0        -123
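
The expected pivot can be illustrated with a small, self-contained sketch in plain Python. This is not the pth_10 implementation; the function name `chart_avg_pivot` and the use of `None` for empty cells are assumptions made for illustration only. It averages the value per (row, column) pair and then turns each distinct value of the column field into its own output column:

```python
from collections import defaultdict
from statistics import mean


def chart_avg_pivot(rows, row_field, column_field, value_field):
    """Hypothetical helper: avg(value_field) by row_field, pivoted on column_field."""
    # Collect the values for every (row value, column value) pair.
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r[row_field], r[column_field])].append(r[value_field])

    # Each distinct value of column_field becomes an output column.
    columns = sorted({col for _, col in buckets})

    # Build one output row per row_field value; cells without data stay None.
    table = {}
    for (row_key, col), values in buckets.items():
        row = table.setdefault(row_key, {c: None for c in columns})
        row[col] = mean(values)
    return columns, table


# Sample events matching the data shown in this issue.
events = [
    {"_time": "2021-10-31 17:34:36.0", "operation": "a", "balance": 123},
    {"_time": "2021-11-08 21:38:03.0", "operation": "b", "balance": -123},
]

columns, table = chart_avg_pivot(events, "_time", "operation", "balance")
for time_value, row in table.items():
    print(time_value, [row[c] for c in columns])
```

With the two sample events, the result has one row per _time value and one column per operation, which matches the layout of the expected table above.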

How to reproduce

Run the first query in the issue.

Software version

pth_03 version: 5.11.0
pth_06 version: 2.3.0-30-g2cbf0768
pth_10 version: 5.3.0-13-g4a1ff44f

Additional context

The number of columns created for the unique values of t_column_Parameter should be limited to 10.
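
A minimal sketch of such a limit, in plain Python, might look like the following. The issue does not say which values should be kept when there are more than 10, or whether the remainder should be dropped or merged into a catch-all column, so keeping the first 10 in the given order and dropping the rest is purely an assumption here, as is the helper name `limit_pivot_columns`:

```python
def limit_pivot_columns(columns, table, limit=10):
    """Hypothetical helper: keep at most `limit` pivoted columns.

    Assumption: the first `limit` columns in the given order are kept
    and the rest are dropped; the issue does not specify this.
    """
    kept = list(columns)[:limit]
    trimmed = {
        row_key: {c: row.get(c) for c in kept}
        for row_key, row in table.items()
    }
    return kept, trimmed


# Twelve distinct column values, as if t_column_Parameter had 12 unique values.
wide_columns = [f"op{i:02d}" for i in range(12)]
wide_table = {"2021-10-31 17:34:36.0": {c: 1.0 for c in wide_columns}}

kept, trimmed = limit_pivot_columns(wide_columns, wide_table)
print(len(kept))
```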