lf-edge / ekuiper

Lightweight data stream processing engine for IoT edge
https://ekuiper.org
Apache License 2.0

Why do the stdev and var aggregate functions not ignore null values like sum, min, max, and avg? #2590

Open ngjaying opened 8 months ago

ngjaying commented 8 months ago

Discussed in https://github.com/lf-edge/ekuiper/discussions/2589

Originally posted by **EscanorUt** January 26, 2024

Hello, I've noticed that in eKuiper the stdev and var aggregate functions seem to include null values in their calculations, unlike other aggregate functions such as sum, min, max, and avg, which ignore null values. Can anyone shed light on the reason behind this behavior, and on whether there are any workarounds or alternative approaches for handling null values with the stdev and var functions in eKuiper? Thank you.

BNNARAJ commented 5 months ago

Hello @ngjaying sir, I am Prabal Pratap Singh Rathore, a second-year BTech student in Artificial Intelligence and Data Science. I am good at Python and several of its libraries, with experience in ML and DL using Keras, TensorFlow, and PyTorch. I would like to look into this issue and am currently exploring eKuiper, so please assign this Good First Issue to me.

BNNARAJ commented 5 months ago

We have to see how the stdev and var functions are implemented internally. Aggregate functions exclude null values from the calculation, but these functions involve statistical calculations that can be misled by null values. So could you please point me to the directory or source file where they are implemented?

ngjaying commented 4 months ago

@BNNARAJ Sorry for the late response. The functions are in https://github.com/lf-edge/ekuiper/blob/master/internal/binder/function/funcs_agg.go. Actually, you can do a search in the codebase to find it next time.

BNNARAJ commented 4 months ago

Thank you sir

BNNARAJ commented 4 months ago

Hello @ngjaying sir, in the definitions of the stddev and var functions, the input goes through a cast call that rebuilds the float64 slice and drops null values: `float64Slice, err := cast.ToFloat64Slice(arg0, cast.CONVERT_SAMEKIND, cast.IGNORE_NIL)`. So the functions do accept null values as input, but `cast.IGNORE_NIL` removes them from the input array before the calculation. As I am a beginner, please confirm whether my understanding and approach are correct.
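
To illustrate the behavior described in the comment above, here is a minimal, self-contained Go sketch. It does not use eKuiper's `cast` package: `dropNil` and `populationVariance` are hypothetical helpers written for this example, and the choice of population (rather than sample) variance is an assumption made only to keep the arithmetic simple. It shows how removing nil entries before the calculation makes the result depend only on the non-null values.

```go
package main

import (
	"fmt"
	"math"
)

// dropNil is a hypothetical stand-in for the effect attributed to
// cast.IGNORE_NIL in the thread: nil entries are skipped and the
// remaining numeric entries are converted to float64.
func dropNil(values []interface{}) ([]float64, error) {
	out := make([]float64, 0, len(values))
	for _, v := range values {
		switch n := v.(type) {
		case nil:
			continue // ignore null values
		case float64:
			out = append(out, n)
		case int:
			out = append(out, float64(n))
		default:
			return nil, fmt.Errorf("unsupported type %T", v)
		}
	}
	return out, nil
}

// populationVariance returns the population variance of xs.
// The boolean is false when xs is empty and no variance is defined.
func populationVariance(xs []float64) (float64, bool) {
	if len(xs) == 0 {
		return 0, false
	}
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))

	sumSq := 0.0
	for _, x := range xs {
		d := x - mean
		sumSq += d * d
	}
	return sumSq / float64(len(xs)), true
}

func main() {
	// One null value mixed into the input.
	input := []interface{}{1.0, nil, 3.0}

	xs, err := dropNil(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if v, ok := populationVariance(xs); ok {
		// Only 1 and 3 contribute: mean 2, variance 1, stddev 1.
		fmt.Printf("var=%.2f stddev=%.2f\n", v, math.Sqrt(v))
	}
}
```

With the nil entry removed, the calculation sees only `[1, 3]`, so the null value neither counts as zero nor increases the sample size.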

ngjaying commented 4 months ago

Hi @BNNARAJ, maybe it was already fixed. Could you try adding a unit test case to confirm that? If that's the case, we can push the test case as a PR to close this issue. Thanks!

BNNARAJ commented 4 months ago

Hello @ngjaying sir, I saw that there is already a test case for that. I added one more nil input to it, added a comment to make the test case easier to recognize, and opened a PR.

ngjaying commented 4 months ago

@EscanorUt Looks like there is no problem for null values. Do you still encounter that issue? If so, could you please provide a test case?

BlancoMY commented 4 months ago

Hello @ngjaying, this issue was fixed in https://github.com/lf-edge/ekuiper/pull/2748, so you can close it. Thank you.