jbrukh / bayesian

Naive Bayesian Classification for Golang.

Return bayesian.Class Instead of Index? #15

Open donatj opened 8 years ago

donatj commented 8 years ago

LogScores, ProbScores, and SafeProbScores all have a return parameter that is the index of the most likely class. I think if you're ever willing to break the current API, it would be a ton more useful to return the actual bayesian.Class.

As it stands, I have to know the index at which I passed each class into my classifier. That's knowledge I'd rather not need, and it makes a simple lookup awkward.

Returning the class would make simple usage like the example below much easier.

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
    Ugly bayesian.Class = "Ugly"
)

classifier := bayesian.NewClassifier(Good, Bad, Ugly)

// Proposed behavior: the second return value is a bayesian.Class, not an int index.
_, c, _, _ := classifier.SafeProbScores(wht)

if c == Ugly {
    fmt.Println("oh no")
}

jbrukh commented 8 years ago

Fair point, all. Will look into it.

mycalf commented 7 years ago

@donatj

func (c *Classifier) GetClass(intx int) string { return string(c.Classes[intx]) }

donatj commented 7 years ago

@mycalf I don't see that in the API.

func (c *Classifier) GetClass(intx int) Class { return c.Classes[intx] }

Returning the actual Class would be more useful than string, but to me it would still make far more sense for the API to return the class rather than the class index throughout.
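For readers following along, here is a minimal sketch of how such a lookup could behave. It is self-contained on purpose: `Class` and `Classifier` below are simplified stand-ins reduced to the `Classes` field used in the snippets above, and `GetClass` is the hypothetical helper from this thread, not a method the library is known to provide. The bounds check and the `ok` result are assumptions added for illustration.

```go
package main

import "fmt"

// Class and Classifier are simplified stand-ins for the library's types,
// reduced to the single field this sketch needs.
type Class string

type Classifier struct {
	Classes []Class
}

// GetClass is the hypothetical helper discussed in this thread.
// It maps a class index back to its Class and reports ok == false
// for an out-of-range index instead of panicking.
func (c *Classifier) GetClass(inx int) (Class, bool) {
	if inx < 0 || inx >= len(c.Classes) {
		return "", false
	}
	return c.Classes[inx], true
}

const (
	Good Class = "Good"
	Bad  Class = "Bad"
	Ugly Class = "Ugly"
)

func main() {
	classifier := &Classifier{Classes: []Class{Good, Bad, Ugly}}

	// Pretend a scoring call reported index 2 as the most likely class.
	inx := 2

	if class, ok := classifier.GetClass(inx); ok && class == Ugly {
		fmt.Println("oh no")
	}
}
```

The caller compares against the named constant and never needs to remember the order in which classes were passed in, which is the point of the request.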

jbrukh commented 7 years ago

After reviewing:

I recall that the reason I return the scores and an index is that different applications may wish to examine the actual scores across all potential classes. If I return a Class and not an index, then there is no (ready) way to determine what the actual score was for the most likely class. So, some solutions:

  1. Return the index and the class. (Starting to overcrowd the return prototype, IMO.)
  2. Add an auxiliary GetClass function as @mycalf suggests. (Clunky.)
  3. Return a map which maps classes to scores. (More memory intensive.)
  4. Add helper methods like MostLikelyClass() to encapsulate the complexity of looking up the class if the scores are not being considered. (My recommendation.)
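To make options 3 and 4 concrete, here is a rough sketch written as free functions over parallel slices of classes and scores. The names `mostLikelyClass` and `scoresByClass`, their signatures, and the sample scores are all hypothetical; nothing here is taken from the library, and a real helper would presumably be a method on the classifier that runs the scoring itself.

```go
package main

import (
	"errors"
	"fmt"
)

// Class is a stand-in for the library's class type.
type Class string

// mostLikelyClass (hypothetical, option 4) returns the class with the
// highest score together with that score. classes and scores are parallel
// slices; on a tie the earliest class wins.
func mostLikelyClass(classes []Class, scores []float64) (Class, float64, error) {
	if len(classes) == 0 || len(classes) != len(scores) {
		return "", 0, errors.New("classes and scores must be non-empty and the same length")
	}
	best := 0
	for i, s := range scores {
		if s > scores[best] {
			best = i
		}
	}
	return classes[best], scores[best], nil
}

// scoresByClass (hypothetical, option 3) builds a map from each class to
// its score. It allocates a map per call, which is the memory cost noted above.
func scoresByClass(classes []Class, scores []float64) map[Class]float64 {
	m := make(map[Class]float64, len(classes))
	for i, c := range classes {
		if i < len(scores) {
			m[c] = scores[i]
		}
	}
	return m
}

func main() {
	classes := []Class{"Good", "Bad", "Ugly"}
	scores := []float64{0.2, 0.1, 0.7} // made-up scores for illustration

	class, score, err := mostLikelyClass(classes, scores)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(class, score)
	fmt.Println(scoresByClass(classes, scores)["Bad"])
}
```

Returning the score alongside the class keeps the winning score available, which addresses the concern about losing it when only a Class is returned.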

Thoughts?

abh commented 7 years ago

MostLikelyClass() seems like a good compromise.