scicloj / scicloj.ml.smile

A Smile plugin for scicloj.ml

ml/predict produces extra column #15

Open behrica opened 2 months ago

behrica commented 2 months ago
(ns scicloj.metamorph.type-test
  (:require [tablecloth.api :as tc]
            [scicloj.metamorph.ml :as ml]
            [tech.v3.dataset.modelling :as ds-mod]
            [tech.v3.dataset.categorical :as ds-cat]
            [tech.v3.dataset :as ds]))

(require '[scicloj.ml.smile.classification])
(def trained-model
  (-> (tc/dataset {:x1 [1 2 4 5 6 5 6 7]
                   :x2 [5 6 6 7 8 2 4 6]
                   :y  [:a :b :b :a :a :a :b :b]})
      (ds/categorical->number [:y])
      (ds-mod/set-inference-target :y)
      (ml/train {:model-type :smile.classification/knn})))

(-> (ml/predict (tc/dataset {:x1 [1 2 3]
                             :x2 [5 6 7]})
                trained-model)
    (ds-cat/reverse-map-categorical-xforms))

produces

:_unnamed [3 4]:

|         :a |         :b |   2 | :y |
|-----------:|-----------:|----:|----|
| 0.33333333 | 0.66666667 | 0.0 | :b |
| 0.33333333 | 0.66666667 | 0.0 | :b |
| 0.33333333 | 0.66666667 | 0.0 | :b |

But column "2" should not be there. This looks like a regression to me.

behrica commented 2 months ago

"2" seems to be the number of labels; it changes when more labels are possible.

behrica commented 2 months ago

It's probably an issue in scicloj.ml.smile; to be confirmed.

behrica commented 2 months ago

introduced by this commit: https://github.com/scicloj/scicloj.ml.smile/commit/18950160b58f1f06803f81328a29b8c8acdfe802

behrica commented 2 months ago

possible root cause is discussed here: https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/unexpected.20.22distinct.22.20of.20float.20columns