teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0

sort loses half of data #239

Open 51-code opened 5 months ago

51-code commented 5 months ago

Describe the bug

I have a query that results in 10k rows of data: index=abc earliest=-3y@mon latest=-3y@d

But when sorted: index=abc earliest=-3y@mon latest=-3y@d | sort _raw

It results in only 5k rows of data.

Expected behavior

Sort shouldn't lose any data.

How to reproduce

Run the queries above.

Software version

PTH-10: 4.18.0-8-ge7e4190c

Additional context

Probably a problem in BatchCollect?

eemhu commented 5 months ago

BatchCollect limits data to 5k rows by default; this can be changed with the dpl recall size config.

eemhu commented 5 months ago

It should be noted that BatchCollect has a skipLimiting parameter available, but it is not implemented.
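
To illustrate the behavior discussed in this thread, here is a minimal sketch of a collect step that sorts rows and then caps the result. It is not the actual BatchCollect code: `collect_sorted`, `DEFAULT_LIMIT`, and the `skip_limiting` flag are hypothetical names, and the 5000 default is assumed from the "5k rows" figure mentioned above.

```python
from typing import Iterable, List

# Assumed default cap, taken from the "5k rows" figure in this thread.
DEFAULT_LIMIT = 5000


def collect_sorted(rows: Iterable[str], limit: int = DEFAULT_LIMIT,
                   skip_limiting: bool = False) -> List[str]:
    """Hypothetical stand-in for a collect step that sorts and caps rows."""
    ordered = sorted(rows)
    if skip_limiting:
        # Return every row when limiting is explicitly bypassed.
        return ordered
    # Otherwise keep only the first `limit` rows of the sorted result.
    return ordered[:limit]


# 10k input rows, matching the size of the unsorted query result.
rows = [f"event-{i:05d}" for i in range(10_000)]
capped = collect_sorted(rows)
full = collect_sorted(rows, skip_limiting=True)
print(len(capped), len(full))  # prints "5000 10000"
```

With the cap applied, 10k input rows come back as 5k, which matches the symptom in the report. A working bypass flag, or a larger configured limit, would return all rows.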