Closed kimim closed 3 years ago
I agree there are valid issues here.
Hmm, it seems to me that it isn't exactly down. I am not sure what mid should do with strings, but for strings it seems to default to 'down', which seems OK to me. For integer and floating point numbers it just looks like it is not exactly right, as the number halfway between 3 and 6 is 4.5, not 5.
For integer columns I think mid should first convert the column to floating point space and then work in there. This means that replacing missing values on integer columns using the :mid pathway will result in columns changing datatypes, which in this case I think is more correct than ending up with a completely wrong answer.
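To illustrate why converting to floating point first matters, here is a minimal sketch in plain Clojure. The `midpoint` helper is hypothetical and is not part of tech.ml.dataset; it only shows the arithmetic, not how the library computes the value.

```clojure
;; Hypothetical helper for illustration only, not a tech.ml.dataset function.
(defn midpoint
  "Midpoint of two numbers, computed in double space."
  [a b]
  (/ (+ (double a) (double b)) 2.0))

(midpoint 3 6)      ;; => 4.5

;; Staying in integer space drops the fractional part:
(quot (+ 3 6) 2)    ;; => 4
```

The price of the correct answer is that an integer column filled this way ends up holding doubles.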
So I think there are three separate issues that should be addressed:
tech.v3.datatype.functional/log to anything it will result in a floating point representation.
Lots of good stuff here; great issue. I really appreciate this careful consideration.
I added a test that I think covers the intended effects of :mid across a range of datatypes.
It's up in the latest release.
Now I have regression bugs because of this.
Strategy :mid wasn't a "middle value between two valid numbers". It was meant to combine :up and :down and drag values to meet in the middle.
Example:
; 2 > 2 > 2 > 2 3 < 3 < 3 < 3
[1 2 nil nil nil nil nil nil 3]
; ^
; | a mid point
;; result
[1 2 2 2 2 3 3 3 3]
There was no strategy as described here. I would revert :mid to the original one and introduce a new one called :midval or something like that.
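The original :mid behavior from the example above can be sketched in plain Clojure as follows. This is only an illustration under stated assumptions: `fill-nearest` is a hypothetical name, it works on a plain vector rather than a dataset column, and sending ties to the left neighbour is my assumption, since the example has no tie.

```clojure
;; Hypothetical sketch of "drag values to meet in the middle".
;; Not the tech.ml.dataset or tablecloth implementation.
(defn fill-nearest
  "Replace each nil in xs with the nearest non-nil value.
  Ties go to the left neighbour (an assumption)."
  [xs]
  (let [v       (vec xs)
        ;; indices that hold a real value
        present (keep-indexed (fn [i x] (when (some? x) i)) v)]
    (mapv (fn [i]
            (if (some? (v i))
              (v i)
              (let [left  (last (filter #(< % i) present))
                    right (first (filter #(> % i) present))]
                (cond
                  (nil? left)  (when right (v right)) ; leading gap
                  (nil? right) (v left)               ; trailing gap
                  (<= (- i left) (- right i)) (v left)
                  :else (v right)))))
          (range (count v)))))

(fill-nearest [1 2 nil nil nil nil nil nil 3])
;; => [1 2 2 2 2 3 3 3 3]
```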
Another bug is in strategy :lerp. int type results in float.
My failing test cases:
:mid - https://github.com/scicloj/tablecloth/blob/master/test/tablecloth/api/missing_test.clj#L31
:mid - https://github.com/scicloj/tablecloth/blob/master/test/tablecloth/api/missing_test.clj#L33
:lerp - https://github.com/scicloj/tablecloth/blob/master/test/tablecloth/api/missing_test.clj#L62
More examples: https://scicloj.github.io/tablecloth/index.html#replace
@kimim replace-missing was originally in tablecloth and was moved to tech.ml.dataset at some point.
Ah, sorry, I will revert the mid changes and go back to the original behavior.
I changed lerp to work in float space to be mathematically correct, same as if you call dfn/log or something like that. I think this is reasonable; what do you think?
Now I understand the :mid strategy. Sorry for the mistaken issue.
Hmmm... The question is whether preserving the datatype is expected or not. But I don't have a strong opinion on that, and lerp changing the datatype is fine with me.
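A small sketch of why linear interpolation naturally lands in float space, even when both endpoints are integers. `lerp-gap` is a hypothetical helper, and it assumes the missing slots are evenly spaced between the two known values; the library's own :lerp may differ in detail.

```clojure
;; Hypothetical helper for illustration only.
(defn lerp-gap
  "Values for n-missing slots strictly between a and b,
  linearly interpolated in double space (even spacing assumed)."
  [a b n-missing]
  (let [a    (double a)
        step (/ (- (double b) a) (inc n-missing))]
    (mapv #(+ a (* step %)) (range 1 (inc n-missing)))))

(lerp-gap 1 2 1) ;; => [1.5]
(lerp-gap 1 4 2) ;; => [2.0 3.0]
```

The first call shows a value that has no integer representation, which is the argument for letting the column change datatype.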
What about :midval? I think this is also a great strategy to consider.
I agree about 'midval'.
How about :center for :mid, and :mid for :midval?
:mid was my own invention, it's not any naming standard, so it can be changed.
I do not have a strong bias. Both work for me. Thanks.
:mid corresponds to :nearest in left-join-asof. :midpoint or :midval, as was suggested, makes the most sense to me.
If no one strongly objects, I will change :mid to :nearest and add :midval.
Perfect!
I'm not going to strongly object, but I tend to favor RH's notion that one should always try never to change things in a non-backward-compatible way. So in this case, I would add :nearest and note that :mid is an alias. I'm not going to object, because I don't think this will affect me :)
That is a good point. I will keep :mid :-). I also agree that we should not change things in non-backwards-compatible ways.
Done and documented
That's great. Glad to see that tech.ml.dataset is getting better and better every day.
Yeah! :) tablecloth is updated too.
In the test below, when using replace-missing with the :mid strategy, I expected to get the middle value between two valid numbers. But the result shows that the :down strategy is used: