
scikit-activeml: Python library for active learning on top of scikit-learn
https://scikit-activeml.github.io/scikit-activeml-docs/
BSD 3-Clause "New" or "Revised" License

Implement Classification Mixture Model (CMM) #20

Closed mherde closed 3 years ago

mherde commented 4 years ago

See [1].

[1] Reitmaier, T., & Sick, B. (2013). Let us know your decision: Pool-based active training of a generative classifier with the selection strategy 4DS. Information Sciences, 230, 106-131.

mherde commented 4 years ago

@dakot How should we compute the class frequency estimates? Currently, I see two alternatives:

mherde commented 4 years ago

Related to issue #4.

dakot commented 4 years ago

see Tuan@IJCNN2020

dakot commented 4 years ago

code in pies

mherde commented 4 years ago

@tpham93 @dakot I've just inspected the following equations in the article of IJCNN2020:

[Images: Eqs. (21), (22), and (23), rendered as screenshots in the original comment.]

My first implementation of the CMM is in accordance with these equations. However, I wondered whether using the responsibilities in Eq. (22) is the best choice. These responsibilities are normalized and only capture the proportions among the similarities/distances to the mixture components. As a result, the model cannot distinguish between two instances that have the same responsibilities but different distances to the mixture components (see picture). As an alternative to the responsibilities, I suggest using the exponential of the Mahalanobis distance to the mixture components in Eq. (22). What do you think?

[Image: example of two instances with identical responsibilities but different distances to the mixture components]
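To make the concern concrete, here is a small numeric sketch (toy data and helper names are hypothetical, not scikit-activeml code): two points that are symmetric with respect to two Gaussian components get identical responsibilities, while the unnormalized densities (exponential of the negative half squared Mahalanobis distance) still reflect how far each point is from the components.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Two hypothetical mixture components with identity covariance,
# so the Mahalanobis distance equals the Euclidean distance.
mus = np.array([[-2.0, 0.0], [2.0, 0.0]])
cov_inv = np.eye(2)

x1 = np.array([0.0, 0.0])   # close to both components
x2 = np.array([0.0, 10.0])  # far from both components

def resp_and_density(x):
    # Squared Mahalanobis distance to each component.
    d2 = np.array([mahalanobis(x, mu, cov_inv) ** 2 for mu in mus])
    dens = np.exp(-0.5 * d2)   # unnormalized Gaussian density
    resp = dens / dens.sum()   # normalized responsibilities
    return resp, dens

r1, dens1 = resp_and_density(x1)
r2, dens2 = resp_and_density(x2)

# Both points get responsibilities (0.5, 0.5) by symmetry...
print(r1, r2)
# ...but their unnormalized densities differ by many orders of magnitude.
print(dens1, dens2)
```

This is exactly the ambiguity in the picture: the normalized responsibilities erase the absolute distance information, while the unnormalized exponential terms keep it.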

tpham93 commented 4 years ago

Using the Mahalanobis distance should be possible as well. Will there be a problem with PAL when sampling near a component increases n much more than sampling an instance far away from that component? That is, the region around x_2 may require far more samples to reach the same n as x_1, given the two components mu_1 and mu_2.
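The effect described above can be sketched numerically. Assuming a PAL-style kernel frequency estimate of the form n(x) = sum_i exp(-0.5 * ||x - x_i||^2) over the labeled instances (a hedged simplification; the helper and data here are hypothetical), a candidate near a densely sampled component accumulates a much larger n than one far from all components:

```python
import numpy as np

# Hypothetical labeled instances: two near the first component, one near the second.
labeled = np.array([[-2.0, 0.0], [-2.1, 0.2], [2.0, 0.0]])

def kernel_frequency(x, X):
    # Unnormalized Gaussian kernel sum over labeled instances,
    # a simplified stand-in for a PAL-style frequency estimate n.
    d2 = np.sum((X - x) ** 2, axis=1)
    return np.exp(-0.5 * d2).sum()

x_near = np.array([-2.0, 0.1])  # near a densely sampled component
x_far = np.array([0.0, 10.0])   # far from every labeled instance

n_near = kernel_frequency(x_near, labeled)
n_far = kernel_frequency(x_far, labeled)
print(n_near, n_far)  # n_near is much larger than n_far
```

Under this assumption, a region like the one around x_2 would indeed need many more labels to reach the same n, which is the trade-off of keeping the distances unnormalized.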

mherde commented 4 years ago

@mherde Add a parameter to switch between the two versions.