Closed — jayahm closed this issue 3 years ago
I forgot to mention:
In the first experiment, I trained each DS method separately:

method1.fit(...)
method2.fit(...)

But in the second experiment, I used a loop:

list_ds = [method1, method2]
for method in list_ds:
    method.fit(...)
Update:
I found that actually, these DS methods have different results:
Hello,
Can you send me the code you used to run the methods? I need to check the way you are defining and using the random_state.
Hi
You can find my codes below: https://www.dropbox.com/s/rpyfu5ei22d7btp/For%20DESLIB.zip?dl=0
Apologies if the code is a bit long. You can go to the bottom to see "all results" and compare the accuracy between "with-loop" and "without-loop".
I tried to simplify the code, but the simplified version didn't reproduce the problem, so I had to share a slightly longer script.
In "without-loop" I coded everything manually, while in "with-loop" I tried to simplify the code from "without-loop", and I ended up finding that the results are slightly different for some methods.
Hi,
I did some further checking.
Besides A Priori, A Posteriori and MCB, DES Clustering also produced inconsistent results in some cases.
For the other methods, the results are consistent.
The random_state is also the same.
Ok, thanks for sharing the code. I will check it and propose a solution either tomorrow or Wednesday.
Sure. Thank you for your help. I appreciate that.
Hi @Menelau
Is there any update on this?
Hello,
The problem is the way you are setting the random state for the methods. It is exactly the same problem in the previous issue you opened a long time ago: https://github.com/scikit-learn-contrib/DESlib/issues/214
In your experiment, the random state is set once at the beginning and shared by all methods. That guarantees the script as a whole always produces the same result. However, each technique that uses the random_state variable alters its internal state, and since the same object is shared among all techniques, the state passed down to each method depends on which methods ran before it. If you want different calls to a DS method to produce exactly the same result, you need to pass an integer (the same value) as random_state rather than a RandomState object, whose state changes every time a random operation is performed.
If you want two calls of MCB, A posteriori, A priori, and DESClustering to have exactly the same results you will need to always pass an integer value as the random state.
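The shared-state effect described above can be reproduced without DESlib at all. Below is a minimal sketch: `fit_draws` is a hypothetical stand-in for a DS method's `fit()`, which simply draws a few numbers from whatever `random_state` it receives.

```python
import numpy as np

# Hypothetical stand-in for a DS method's fit(): it consumes draws
# from the random_state it is given.
def fit_draws(random_state):
    rng = (np.random.RandomState(random_state)
           if isinstance(random_state, (int, np.integer))
           else random_state)
    return rng.randint(0, 10**6, size=4)

# Shared RandomState object: the first call advances the object's
# internal state, so the second call sees a different state.
shared = np.random.RandomState(42)
first = fit_draws(shared)
second = fit_draws(shared)
print(np.array_equal(first, second))  # False

# Integer seed: every call starts from the same state.
print(np.array_equal(fit_draws(42), fit_draws(42)))  # True
```

The same thing happens inside the script: whichever method runs first consumes part of the shared stream, so the methods that follow receive a different effective seed depending on call order.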
Hi,
Do you mean, instead of:

rng = np.random.RandomState(42)

I should do:

rng = 42
Yes, so in this case, you guarantee that the random seed will be 42 for each technique, and there will be no side-effects caused by the same random state variable being shared between multiple techniques.
I also recommend updating your code to the new master branch version, since we made some changes recently.
You can read more about different ways of setting the random state on the scikit-learn documentation: https://scikit-learn.org/stable/glossary.html#term-random_state
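To illustrate the difference: scikit-learn-style estimators (DESlib follows the same convention) resolve `random_state` through `sklearn.utils.check_random_state`. An integer is converted to a fresh `RandomState` on every call, while a `RandomState` object is returned as-is, so its state is shared and mutated across estimators. A small sketch:

```python
import numpy as np
from sklearn.utils import check_random_state

seed = 42

# An int is turned into a fresh RandomState each time -> reproducible.
a = check_random_state(seed).randint(0, 10**6, size=4)
b = check_random_state(seed).randint(0, 10**6, size=4)
print(np.array_equal(a, b))  # True

# A RandomState object is returned as-is, so its internal state
# carries over from one draw (or one estimator) to the next.
rng = np.random.RandomState(seed)
c = check_random_state(rng).randint(0, 10**6, size=4)
d = check_random_state(rng).randint(0, 10**6, size=4)
print(np.array_equal(c, d))  # False: the state advanced after the first draw
```

So setting `random_state=42` on every DS method gives each one an identical starting state, which is what you want for comparing "with-loop" and "without-loop" runs.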
I see. Thank you very much for your great help. I'll update the library soon too.
Hi,
I faced some issues with my experiments.
Even though I have set the random_state, the results don't seem to be the same (for several methods).
The first and second experiments were run in two different Jupyter notebooks.
Is this normal, or could there be a mistake somewhere?
Also, the results of the single base classifiers are fine; only the DS methods differ.