jplag / JPlag

State-of-the-Art Software Plagiarism & Collusion Detection
https://jplag.github.io/JPlag/
GNU General Public License v3.0

Clustering sometimes crashes JPlag #1247

Open uuqjz opened 1 year ago

uuqjz commented 1 year ago

The default clustering method of JPlag is SpectralClustering, which uses org.apache.commons.math3.linear.EigenDecomposition for the eigenvalue decomposition.

EigenDecomposition has a hard-coded limit on the number of iterations within which the algorithm must converge; otherwise it throws a MaxCountExceededException, which JPlag does not handle. The limit is 30, and because it is a private field without a setter, it cannot be changed: private byte maxIter = 30;

As a result, JPlag sometimes crashes when the eigenvalue decomposition does not converge.

tsaglam commented 1 year ago

Right now, this only occurs when using match merging, right?

uuqjz commented 1 year ago

Yes, I have only noticed it when using match merging.

SimDing commented 9 months ago

When I implemented the spectral clustering, I assumed that the eigendecomposition would always succeed for real symmetric matrices (spectral theorem), and the Laplacian matrix in SpectralClustering.java is symmetric by construction. Therefore, I did not catch that exception. I cannot reproduce this. Maybe numerical errors cause it? We could try setting very small entries of the Laplacian to zero, i.e., any entry $x$ with $|x| < \varepsilon \cdot \|L\|_F / n^2$, where $n$ is the size of $L$ and $\varepsilon$ is a small tolerance.
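A minimal sketch of the proposed thresholding, assuming the Laplacian is available as a dense, square double[][] and that the tolerance eps is supplied by the caller (the value 1e-10 below is an arbitrary illustration). The class LaplacianCleanup and its methods are hypothetical and not part of JPlag; the sketch only shows how the cutoff $\varepsilon \cdot \|L\|_F / n^2$ could be computed and applied before handing the matrix to the eigensolver.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of JPlag): zeroes out negligible Laplacian entries. */
public final class LaplacianCleanup {

    private LaplacianCleanup() {}

    /** Frobenius norm ||L||_F = sqrt(sum of all squared entries). */
    public static double frobeniusNorm(double[][] matrix) {
        double sum = 0.0;
        for (double[] row : matrix) {
            for (double value : row) {
                sum += value * value;
            }
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns a copy of the n x n matrix in which every entry x with
     * |x| < eps * ||L||_F / n^2 is replaced by 0. The input is not modified.
     */
    public static double[][] zeroSmallEntries(double[][] laplacian, double eps) {
        int n = laplacian.length;
        double threshold = eps * frobeniusNorm(laplacian) / ((double) n * n);
        double[][] result = new double[n][];
        for (int i = 0; i < n; i++) {
            result[i] = Arrays.copyOf(laplacian[i], n);
            for (int j = 0; j < n; j++) {
                if (Math.abs(result[i][j]) < threshold) {
                    result[i][j] = 0.0;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Small symmetric example with two tiny off-diagonal entries (1e-20).
        double[][] laplacian = {
            {1.0, -1.0, 1e-20},
            {-1.0, 2.0, -1.0},
            {1e-20, -1.0, 1.0}
        };
        double[][] cleaned = zeroSmallEntries(laplacian, 1e-10);
        // prints [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
        System.out.println(Arrays.deepToString(cleaned));
    }
}
```

Since the cutoff depends only on $|x|$, a symmetric input stays symmetric after thresholding, so the result can still go to a symmetric eigensolver. Whether this removes the non-convergence reported above is not established in this thread.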