scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

The results of two A Posteriori are not the same #214

Closed: jayahm closed this issue 4 years ago

jayahm commented 4 years ago

Hi,

I was playing around with the code by repeating the same method twice.

For example:

aposter1 = APosteriori(pool_classifiers, random_state=rng)
aposter1.fit(X_dsel, y_dsel)
print('Classification accuracy of APosteriori1: ', aposter1.score(X_test, y_test))

aposter2 = APosteriori(pool_classifiers, random_state=rng)
aposter2.fit(X_dsel, y_dsel)
print('Classification accuracy of APosteriori2: ', aposter2.score(X_test, y_test))

However, the results are not the same, and I'm wondering why:

Classification accuracy of APosteriori1:  0.8245614035087719
Classification accuracy of APosteriori2:  0.8333333333333334

The random state is the same too. But the results of the same method on the same dataset are not the same?

Menelau commented 4 years ago

Hello,

As far as I know, the results are always the same, and the library's unit tests check for that based on scikit-learn's check_estimator utility.

There may be a problem in the way you are setting the random state. How are you defining the rng variable? Is it an instance of numpy.random.RandomState? That would change the results, because the internal state of the rng differs between the two calls: the first call starts from the seed you initialized, but it consumes values from the generator, so the second call to APosteriori starts from a different internal state.
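For illustration, a minimal sketch (independent of DESlib, using only numpy) of how a shared RandomState advances as it is consumed:

import numpy as np

rng = np.random.RandomState(42)
print(rng.randint(0, 10, size=3))  # drawing values advances the generator's internal state
print(rng.randint(0, 10, size=3))  # a different result: the state has moved on

# Two independently seeded generators, by contrast, stay in lockstep:
rng1 = np.random.RandomState(42)
rng2 = np.random.RandomState(42)
print(np.array_equal(rng1.randint(0, 10, size=3),
                     rng2.randint(0, 10, size=3)))  # True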

jayahm commented 4 years ago

Yes, I just made a simple modification to your "example_heterogeneous.ipynb" file by repeating APosteriori twice. You can see the outcome below.

So, do you mean that if I assign it to two different variables, the results may not be the same?

https://www.dropbox.com/s/ieu34ss1hcpudhx/example_heterogeneous-ap.ipynb?dl=0

Menelau commented 4 years ago

Yes, if more than one base classifier has the same competence level, the selected one is picked randomly among them, so the value of the random state passed down to the method influences the result. The same happens with the A Priori technique.
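A minimal sketch of this kind of tie-breaking (the competence values below are made up, and this is not DESlib's actual selection code):

import numpy as np

# Hypothetical competence estimates for five base classifiers; two tie at the maximum.
competences = np.array([0.60, 0.85, 0.85, 0.40, 0.70])

rng = np.random.RandomState(42)
tied = np.flatnonzero(competences == competences.max())  # indices of the tied classifiers
selected = rng.choice(tied)  # the tie is broken using the random state
print(selected)  # which tied classifier wins depends on the rng's current state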

In the experiment you conducted, the random state is set once at the beginning and shared by all methods. That guarantees the script as a whole will always produce the same result. However, the state's value differs at each method call inside the script, because every technique that uses the random_state variable alters its internal state. So the state passed down to aposter1 has a different value than the one passed to aposter2. If you want two calls to the same algorithm to give exactly the same result, you need to pass the same value, like:

aposter1 = APosteriori(pool_classifiers, random_state=42)
aposter2 = APosteriori(pool_classifiers, random_state=42)

or define two independent random state variables:

import numpy as np

rng1 = np.random.RandomState(42)
rng2 = np.random.RandomState(42)
aposter1 = APosteriori(pool_classifiers, random_state=rng1)
aposter2 = APosteriori(pool_classifiers, random_state=rng2)

That will always produce the same result.
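If DESlib follows the usual scikit-learn convention (it is a scikit-learn-contrib project), the difference between the two fixes comes down to how sklearn.utils.check_random_state treats its input: an int seed produces a fresh generator on every call, while an existing RandomState is returned as-is and is therefore shared. A small sketch:

import numpy as np
from sklearn.utils import check_random_state

# An int seed yields a fresh generator on every call, so two estimators
# seeded with the same int start from identical internal states.
rs1 = check_random_state(42)
rs2 = check_random_state(42)
print(rs1.randint(100) == rs2.randint(100))  # True

# An existing RandomState is returned unchanged, so estimators sharing
# one rng also share (and advance) a single internal state.
rng = np.random.RandomState(42)
print(check_random_state(rng) is rng)  # True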

jayahm commented 4 years ago

I see. I implemented your suggestion above and the results are the same. Thank you.