teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0

Using aggregates after dedup command does not work #232

Closed. eemhu closed this issue 5 months ago

eemhu commented 6 months ago

Describe the bug

%dpl
index=alert_examples earliest="01/01/2021:02:34:19"
| rex4j field=_raw "message=(?<message>\"[a-zA-Z\s]+\"[^0-9])"
| rex4j field=message "(?<message>[^\"]+)"
| dedup message
| stats count

The query fails with a complaint about the output mode:

org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;;

Expected behavior

The query should run without any errors.

How to reproduce

Run the example command above.

Software version

4.17.0

Additional context

https://github.com/teragrep/pth_10/issues/205

eemhu commented 6 months ago

The current PR seems to work until the last batch, when the count goes from 6 to 0. It should be 6 from the first batch to the last.

eemhu commented 6 months ago

It could be that the first batch is OK and the later ones are empty. Is BatchCollect missing? The UI seems to update only at the first batch and at the last one.
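
To make the suspected behaviour concrete, here is a toy model in plain Python. It is not pth_10 or Spark code, and the function names and sample batches are made up for illustration. It assumes that dedup keeps state across batches, so later batches come out empty, and it compares counting each batch on its own with carrying the count forward.

```python
from typing import List, Set

def dedup_batch(batch: List[str], seen: Set[str]) -> List[str]:
    """Return only the messages not seen in this or any earlier batch."""
    fresh = []
    for message in batch:
        if message not in seen:
            seen.add(message)
            fresh.append(message)
    return fresh

def per_batch_counts(batches: List[List[str]]) -> List[int]:
    """Count the deduplicated rows of each batch in isolation."""
    seen: Set[str] = set()
    return [len(dedup_batch(batch, seen)) for batch in batches]

def cumulative_counts(batches: List[List[str]]) -> List[int]:
    """Carry the running count forward from batch to batch."""
    seen: Set[str] = set()
    total = 0
    counts = []
    for batch in batches:
        total += len(dedup_batch(batch, seen))
        counts.append(total)
    return counts

# Hypothetical input: six distinct messages arrive in the first batch,
# the second batch only repeats earlier messages, the last batch is empty.
batches = [
    ["a", "b", "c", "d", "e", "f", "a"],
    ["b", "c"],
    [],
]

print(per_batch_counts(batches))   # count of each batch on its own
print(cumulative_counts(batches))  # count accumulated over all batches
```

With this input the per-batch count is 6 for the first batch and 0 afterwards, while the accumulated count stays at 6 throughout, which is the expected result described above. Whether this is what actually happens inside the translator is only a guess.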

eemhu commented 5 months ago

Internal PR 595 has been merged, closing.