Open behrica opened 9 months ago
I'm not really sure what to do here. If you had chosen values that do not round to 0 and 1, you would have gotten an exception. Perhaps we should use Math/round as opposed to a pure long cast.
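For reference, a minimal sketch in plain Clojure (not the library's actual code) of how the two options behave. The lookup table is the one from the example in this issue; `inverse` is a hypothetical name used only for illustration.

```clojure
;; Lookup table as reported in this issue.
(def lookup-table {:a 0, :b 1})

;; Hypothetical helper value: the table inverted, number -> category.
(def inverse (into {} (map (fn [[k v]] [v k])) lookup-table))

;; A bare long cast truncates toward zero.
(long 0.342)        ;; => 0
(long 0.9)          ;; => 0

;; Math/round rounds to the nearest integer.
(Math/round 0.342)  ;; => 0
(Math/round 0.9)    ;; => 1

;; The same double maps back to different categories.
(get inverse (long 0.9))        ;; => :a
(get inverse (Math/round 0.9))  ;; => :b

;; Values that end up outside the table are simply not found.
(get inverse (long 3.0))        ;; => nil
```

Note that both variants still map 0.342 back to `:a`, so switching to Math/round changes which category a fractional value lands on but does not by itself reject it.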
This looks error-prone to me, but I'm not sure what to fix either. The mapping back below works due to the long cast:
(-> (ds/->dataset {:x [:a :b]})
    (ds/categorical->number [:x])
    :x
    meta
    :categorical-map
    :lookup-table)
;; => {:a 0, :b 1}
| :x |
|----:|
| 0.0 |
| 1.0 |
I would expect the above to produce a lookup map:
{:a 0.0, :b 1.0}
and that all values except 0.0 and 1.0 would fail when mapping back.
The issue there is floating-point comparison.
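A small plain-Clojure sketch of what that means for a lookup table keyed by doubles. The maps here are made up for illustration and are not taken from the library.

```clojure
;; Longs and doubles are never `=` in Clojure, only `==`.
(= 0 0.0)    ;; => false
(== 0 0.0)   ;; => true

;; Map lookup uses `=` semantics, so a long does not find a double key.
(get {0.0 :a, 1.0 :b} 0)    ;; => nil
(get {0.0 :a, 1.0 :b} 0.0)  ;; => :a

;; A computed double may differ from the literal it "should" equal.
(= 0.3 (+ 0.1 0.2))         ;; => false
(get {0.3 :c} (+ 0.1 0.2))  ;; => nil
```

So a double-keyed table would only map back values that are bit-for-bit equal to the stored keys, which is strict but easy to miss by accident.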
This is also related to the new discussion: https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/invert-categorical-map.20-.20regression.20tests
To make the categorical-mapping code less brittle, I think we should check and fail in more situations. One of them is this:
The initial mapping was derived as x -> 1 and y -> 0, but the current code happily maps back 0.342. In my view this should fail, in the same way as other numbers like 3 and 4 fail: "Unable to find src value for numeric value 0.342"
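As a rough sketch of the stricter behaviour proposed here, in plain Clojure and independent of the library: `strict-invert` is a hypothetical helper, and using the keywords `:x` and `:y` for the categories is an assumption. It compares with `==` instead of casting, so 1.0 still matches 1 while 0.342 matches nothing.

```clojure
;; Assumed mapping from the description: x -> 1, y -> 0.
(def lookup-table {:x 1, :y 0})

(defn strict-invert
  "Hypothetical helper: returns the category whose number is numerically
  equal (==) to n, and throws when no entry matches. No cast is applied,
  so fractional values are rejected instead of being truncated."
  [table n]
  (if-let [entry (first (filter (fn [[_ v]] (== v n)) table))]
    (key entry)
    (throw (ex-info (str "Unable to find src value for numeric value " n)
                    {:value n}))))

(strict-invert lookup-table 1.0)  ;; => :x
(strict-invert lookup-table 0)    ;; => :y

;; These both throw ExceptionInfo:
;; (strict-invert lookup-table 0.342)
;; (strict-invert lookup-table 3)
```

The linear scan is only for brevity. Whether exact numeric equality is the right rule, or whether values within some tolerance of a key should be accepted, is a design decision this sketch does not settle.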