Closed — jayahm closed this issue 3 years ago
I forgot to mention:
In the first experiment, I trained each DS method separately:

method1.fit(...)
method2.fit(...)

But in the second experiment, I used a loop:

list_ds = [method1, method2]
for method in list_ds:
    method.fit(...)
Update:
I found that actually, these DS methods have different results:
Hello,
Can you send me the code you used to run the methods? I need to check the way you are defining and using the random_state.
Hi
You can find my codes below: https://www.dropbox.com/s/rpyfu5ei22d7btp/For%20DESLIB.zip?dl=0
Apologies if the code is a bit long. You can go to the bottom to see "all results" and compare the accuracy between "with-loop" and "without-loop".
I tried to simplify the code, but the simplified version didn't reproduce the problem, so I had to share a slightly longer script.
In "without-loop" I coded everything manually, while in "with-loop" I tried to simplify the code from "without-loop", and I ended up finding that the results are slightly different for some methods.
Hi,
I did some further checking.
Besides A Priori, A Posteriori and MCB, DES Clustering also produced inconsistent results in some cases.
For the other methods, the results are consistent.
The random_state is also the same.
Ok, thanks for sharing the code. I will check it and propose a solution either tomorrow or Wednesday.
Sure. Thank you for your help. I appreciate that.
Hi @Menelau
Is there any update on this?
Hello,
The problem is the way you are setting the random state for the methods. It is exactly the same problem in the previous issue you opened a long time ago: https://github.com/scikit-learn-contrib/DESlib/issues/214
In your experiment, the random state is set once at the beginning and shared by all methods. That guarantees the script as a whole always produces the same result. However, each technique that uses the random_state variable alters its internal state, and since the same object is shared among all techniques, the state passed down to each method depends on which methods ran before it. If you want different calls to a DS method to produce exactly the same result, you need to pass an integer (the same value) as random_state rather than a RandomState object, whose state changes every time a random operation is performed.
If you want two calls of MCB, A posteriori, A priori, and DESClustering to have exactly the same results you will need to always pass an integer value as the random state.
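The shared-state effect described above can be reproduced without DESlib at all. Below is a minimal sketch: `fit_draws` is a hypothetical stand-in for a DS method's `fit()`, which simply draws a few numbers from whatever `random_state` it receives.

```python
import numpy as np

# Hypothetical stand-in for a DS method's fit(): it consumes draws
# from the random_state it is given.
def fit_draws(random_state):
    rng = (np.random.RandomState(random_state)
           if isinstance(random_state, (int, np.integer))
           else random_state)
    return rng.randint(0, 10**6, size=4)

# Shared RandomState object: the first call advances the object's
# internal state, so the second call sees a different state.
shared = np.random.RandomState(42)
first = fit_draws(shared)
second = fit_draws(shared)
print(np.array_equal(first, second))  # False

# Integer seed: every call starts from the same state.
print(np.array_equal(fit_draws(42), fit_draws(42)))  # True
```

The same thing happens inside the script: whichever method runs first consumes part of the shared stream, so the methods that follow receive a different effective seed depending on call order.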
Hi,
Do you mean, instead of:

rng = np.random.RandomState(42)

I should do:

rng = 42
Yes, so in this case, you guarantee that the random seed will be 42 for each technique, and there will be no side-effects caused by the same random state variable being shared between multiple techniques.
I also recommend updating your code to the new master branch version, since we made some changes recently.
You can read more about different ways of setting the random state on the scikit-learn documentation: https://scikit-learn.org/stable/glossary.html#term-random_state
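To illustrate the difference: scikit-learn-style estimators (DESlib follows the same convention) resolve `random_state` through `sklearn.utils.check_random_state`. An integer is converted to a fresh `RandomState` on every call, while a `RandomState` object is returned as-is, so its state is shared and mutated across estimators. A small sketch:

```python
import numpy as np
from sklearn.utils import check_random_state

seed = 42

# An int is turned into a fresh RandomState each time -> reproducible.
a = check_random_state(seed).randint(0, 10**6, size=4)
b = check_random_state(seed).randint(0, 10**6, size=4)
print(np.array_equal(a, b))  # True

# A RandomState object is returned as-is, so its internal state
# carries over from one draw (or one estimator) to the next.
rng = np.random.RandomState(seed)
c = check_random_state(rng).randint(0, 10**6, size=4)
d = check_random_state(rng).randint(0, 10**6, size=4)
print(np.array_equal(c, d))  # False: the state advanced after the first draw
```

So setting `random_state=42` on every DS method gives each one an identical starting state, which is what you want for comparing "with-loop" and "without-loop" runs.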
I see. Thank you very much for your great help. I'll update the library soon too.
Hi,
I faced some issues with my experiments.
Even though I have set the random_state, the results don't seem to be the same (for several methods).
The first and second experiments were run in two different Jupyter notebooks.
Is this normal, or could there be a mistake somewhere?
Also, the results of the single base classifiers are fine; only the DS methods differ.