
scikit-activeml: Python library for active learning on top of scikit-learn
https://scikit-activeml.github.io/scikit-activeml-docs/
BSD 3-Clause "New" or "Revised" License

Implement Classification Mixture Model (CMM) #20

Closed mherde closed 3 years ago

mherde commented 4 years ago

See [1].

[1] Reitmaier, T., & Sick, B. (2013). Let us know your decision: Pool-based active training of a generative classifier with the selection strategy 4DS. Information Sciences, 230, 106-131.

mherde commented 4 years ago

@dakot How should we compute the class frequency estimates? Currently, I see two alternatives:

mherde commented 4 years ago

Related to issue #4.

dakot commented 4 years ago

see Tuan@IJCNN2020

dakot commented 4 years ago

code in pies

mherde commented 4 years ago

@tpham93 @dakot I've just inspected the following equations in the article of IJCNN2020:

[Images: Eqs. (21), (22), and (23), rendered as screenshots in the original comment.]

My first implementation of the CMM is in accordance with these equations. However, I wondered whether using the responsibilities in Eq. (22) is the best choice. These responsibilities are normalized and only capture the proportions among the similarities/distances to the mixture components. As a result, the model cannot distinguish between two instances that have the same responsibilities but different distances to the mixture components (see picture). As an alternative to the responsibilities, I suggest using the exponential of the Mahalanobis distance to the mixture components in Eq. (22). What do you think?

[Image: example of two instances with identical responsibilities but different distances to the mixture components]
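To make the concern concrete, here is a small numeric sketch (toy data and helper names are hypothetical, not scikit-activeml code): two points that are symmetric with respect to two Gaussian components get identical responsibilities, while the unnormalized densities (exponential of the negative half squared Mahalanobis distance) still reflect how far each point is from the components.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Two hypothetical mixture components with identity covariance,
# so the Mahalanobis distance equals the Euclidean distance.
mus = np.array([[-2.0, 0.0], [2.0, 0.0]])
cov_inv = np.eye(2)

x1 = np.array([0.0, 0.0])   # close to both components
x2 = np.array([0.0, 10.0])  # far from both components

def resp_and_density(x):
    # Squared Mahalanobis distance to each component.
    d2 = np.array([mahalanobis(x, mu, cov_inv) ** 2 for mu in mus])
    dens = np.exp(-0.5 * d2)   # unnormalized Gaussian density
    resp = dens / dens.sum()   # normalized responsibilities
    return resp, dens

r1, dens1 = resp_and_density(x1)
r2, dens2 = resp_and_density(x2)

# Both points get responsibilities (0.5, 0.5) by symmetry...
print(r1, r2)
# ...but their unnormalized densities differ by many orders of magnitude.
print(dens1, dens2)
```

This is exactly the ambiguity in the picture: the normalized responsibilities erase the absolute distance information, while the unnormalized exponential terms keep it.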

tpham93 commented 4 years ago

Using the Mahalanobis distance should be possible as well. Will there be a problem with PAL when sampling near a component increases n much more than sampling an instance far away from that component? That is, the region around x_2 may require far more samples to reach the same n as x_1, given the two components mu_1 and mu_2.
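The effect described above can be sketched numerically. Assuming a PAL-style kernel frequency estimate of the form n(x) = sum_i exp(-0.5 * ||x - x_i||^2) over the labeled instances (a hedged simplification; the helper and data here are hypothetical), a candidate near a densely sampled component accumulates a much larger n than one far from all components:

```python
import numpy as np

# Hypothetical labeled instances: two near the first component, one near the second.
labeled = np.array([[-2.0, 0.0], [-2.1, 0.2], [2.0, 0.0]])

def kernel_frequency(x, X):
    # Unnormalized Gaussian kernel sum over labeled instances,
    # a simplified stand-in for a PAL-style frequency estimate n.
    d2 = np.sum((X - x) ** 2, axis=1)
    return np.exp(-0.5 * d2).sum()

x_near = np.array([-2.0, 0.1])  # near a densely sampled component
x_far = np.array([0.0, 10.0])   # far from every labeled instance

n_near = kernel_frequency(x_near, labeled)
n_far = kernel_frequency(x_far, labeled)
print(n_near, n_far)  # n_near is much larger than n_far
```

Under this assumption, a region like the one around x_2 would indeed need many more labels to reach the same n, which is the trade-off of keeping the distances unnormalized.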

mherde commented 4 years ago

@mherde Add a parameter to switch between the two versions.