Notebook 5_clustering description

scatseisnet / scatseisnet

https://scatseisnet.readthedocs.io

GNU General Public License v3.0


Notebook 5_clustering description #25

Closed Lchuang closed 1 year ago

Lchuang commented 1 year ago

In the tutorial notebook 5_clustering, under "Get cluster coordinates in the feature space", it says:

"We here see that cluster 4 is mostly constrained by the feature 6."

However, based on the plot, cluster 4 seems to be mostly constrained by features 1 or 3, and feature 4 seems to mostly constrain cluster 6. Was the order in the text reversed? Thank you. :)

[Image: matrix]

leonard-seydoux commented 1 year ago

Thanks for raising this bug! When I run the same notebook, I get the result shown below, which supports our interpretation. K-means starts from a random initialization, which is why we specified random_state = 4 in the fourth cell of the notebook when creating the k-means instance from scikit-learn. I am not sure whether the problem comes from there.

@ReneSteinmann, do you mind running the notebook as a check? I created a branch for working on this bug.

[Image: output svg]
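For readers following along, here is a minimal sketch of what seeding k-means looks like in scikit-learn. The data is synthetic and the cluster count, `n_init` value, and the helper name `cluster` are assumptions made for illustration. Only `random_state = 4` comes from the notebook discussion above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the reduced features (assumption, not the tutorial data):
# three well-separated blobs in two dimensions.
rng = np.random.default_rng(0)
features = np.vstack(
    [rng.normal(loc=center, scale=0.3, size=(100, 2)) for center in (-3.0, 0.0, 3.0)]
)


def cluster(x, seed):
    """Fit k-means with a fixed seed (hypothetical helper for this sketch)."""
    model = KMeans(n_clusters=3, n_init=10, random_state=seed)
    return model.fit(x)


# Two independent fits with the same seed give the same labels and centroids.
first = cluster(features, seed=4)
second = cluster(features, seed=4)
labels_match = np.array_equal(first.labels_, second.labels_)
centroids_match = np.allclose(first.cluster_centers_, second.cluster_centers_)
print(labels_match, centroids_match)
```

Fixing the seed makes the cluster numbering repeatable for a given input, but it cannot compensate for an input that itself changes between runs.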

Lchuang commented 1 year ago

After running the series of notebooks a few times, I think it may actually be due to random_state not being set for FastICA in notebook 3_reduction.ipynb:

    # Extract independent features
    model = FastICA(n_components=10, whiten="unit-variance")

Once I set the random_state to a fixed value, the results became consistent. Thank you for the hint of the random_state for k-means! :D
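A minimal sketch of the fix described above, assuming synthetic data in place of the scattering coefficients. The seed value 42, the reduced `n_components=3`, and the helper name `extract_features` are illustrative choices, not values taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-in for the scattering coefficients (assumption):
# three non-Gaussian sources mixed linearly.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(500, 3))
mixing = rng.normal(size=(3, 3))
data = sources @ mixing.T


def extract_features(x, seed):
    """Run FastICA with a fixed seed (hypothetical helper for this sketch)."""
    model = FastICA(n_components=3, whiten="unit-variance", random_state=seed)
    return model.fit_transform(x)


# With the same random_state, repeated runs return identical features.
features_a = extract_features(data, seed=42)
features_b = extract_features(data, seed=42)
same = np.allclose(features_a, features_b)
print(same)
```

Without `random_state`, FastICA may return the independent components in a different order or with flipped signs from run to run, which would change which feature index appears to drive a given cluster downstream.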

leonard-seydoux commented 1 year ago

Thanks for pointing out this bug! We had not noticed that random_state was missing from the dimensionality reduction notebook. This should indeed work better now. Note that I also changed how the contribution of each feature to each cluster is displayed: it now shows the absolute value of the centroid, which better conveys the "absolute" contribution.
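To illustrate the change in display, here is a small sketch that takes the absolute value of the centroids before ranking features. The centroid values are made up for the example, and the layout (one row per cluster, one column per feature) is an assumption.

```python
import numpy as np

# Hypothetical centroids: one row per cluster, one column per feature.
centroids = np.array(
    [
        [0.1, -2.0, 0.3],
        [1.5, 0.2, -0.4],
    ]
)

# Absolute contribution of each feature to each cluster.
contribution = np.abs(centroids)

# Feature with the largest absolute contribution, per cluster.
dominant_feature = contribution.argmax(axis=1)

# For comparison: ranking on signed values overlooks large negative coordinates.
dominant_signed = centroids.argmax(axis=1)

print(dominant_feature.tolist(), dominant_signed.tolist())
```

In the first row the coordinate with the largest magnitude is negative, so the signed ranking picks a different feature than the absolute one. That is the kind of case the absolute-value display is meant to expose.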