What steps will reproduce the problem?
1. Create 2-dimensional data drawn from 2-dimensional multivariate Gaussian
distributions with different means and variance = 1, e.g. 21 different
distributions with 1000 draws each, for a total of 21,000 points. (I have tried
many different variations, and none has any positive effect on the reported
issue.)
2. Train sofia-kmeans with any batch size (tested 500 to 5000 in steps of 500)
and with any number of clusters k (tested 64, 128, 256) using mini_batch_kmeans
with a fixed random seed.
command line: sofia-kmeans --k 64 --dimensionality 3 --random_seed 124
--init_type random --opt_type mini_batch_kmeans --mini_batch_size 500
--iterations 10 --objective_after_init --objective_after_training
--training_file traindatafile.svmlight --model_out modelfile.sofia
3. Calculate the training error
command line: sofia-kmeans --model_in modelfile.sofia --test_file
traindatafile.svmlight --objective_on_test --cluster_assignments_out
trainingassignments.sofia
4. Run steps 2 and 3 in a loop as a function of the number of iterations. I ran
[1 10 100e3 500e3 1000e3].
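The steps above can be sketched as a small Python script. This is only an illustrative sketch, not the script used for the report: the helper names, the range of the cluster means, the placeholder label `0`, and the choice to number features from 1 (consistent with `--dimensionality 3` for 2-D data) are assumptions. The sofia-kmeans flags are copied from the command lines in steps 2 and 3.

```python
import random
import shlex


def make_gaussian_points(n_clusters=21, draws_per_cluster=1000, seed=0):
    """Draw 2-D points from unit-variance Gaussians with different means."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_clusters):
        # Assumption: means are spread uniformly over [-20, 20] x [-20, 20].
        mean_x = rng.uniform(-20.0, 20.0)
        mean_y = rng.uniform(-20.0, 20.0)
        for _ in range(draws_per_cluster):
            points.append((rng.gauss(mean_x, 1.0), rng.gauss(mean_y, 1.0)))
    rng.shuffle(points)
    return points


def to_svmlight_line(point, label=0):
    """Format one point as '<label> 1:<x> 2:<y>' (label is a placeholder)."""
    features = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(point, start=1))
    return f"{label} {features}"


def write_svmlight(points, path):
    """Write all points to an SVM-light style text file, one point per line."""
    with open(path, "w") as handle:
        for point in points:
            handle.write(to_svmlight_line(point) + "\n")


def train_command(iterations, k=64, mini_batch_size=500):
    """Build the training command from step 2 for a given iteration count."""
    return [
        "sofia-kmeans", "--k", str(k), "--dimensionality", "3",
        "--random_seed", "124", "--init_type", "random",
        "--opt_type", "mini_batch_kmeans",
        "--mini_batch_size", str(mini_batch_size),
        "--iterations", str(iterations),
        "--objective_after_init", "--objective_after_training",
        "--training_file", "traindatafile.svmlight",
        "--model_out", "modelfile.sofia",
    ]


def eval_command():
    """Build the evaluation command from step 3."""
    return [
        "sofia-kmeans", "--model_in", "modelfile.sofia",
        "--test_file", "traindatafile.svmlight", "--objective_on_test",
        "--cluster_assignments_out", "trainingassignments.sofia",
    ]


# Iteration counts from step 4; print the commands instead of running them.
for iterations in (1, 10, 100_000, 500_000, 1_000_000):
    print(shlex.join(train_command(iterations)))
    print(shlex.join(eval_command()))
```

To actually run the sweep, call `write_svmlight(make_gaussian_points(), "traindatafile.svmlight")` once and pass each command list to `subprocess.run`.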
What is the expected output? What do you see instead?
I expect the training error to fall as the number of iterations increases.
Since the seed is fixed, the random initialization is the same in every run.
The error does fall up to 100e3 iterations, but then it starts to diverge, i.e.
the training error starts increasing dramatically and becomes even larger than
the error of the random initialization. This is very puzzling to me.
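As an independent cross-check of the reported numbers, the k-means objective can be recomputed outside sofia-kmeans. The sketch below assumes the objective is the squared Euclidean distance from each point to its nearest center. Whether sofia-kmeans reports the sum or the mean is not stated here, so both are returned. The function name is hypothetical, and loading the centers from modelfile.sofia is left out because the model file format is not described in this report.

```python
def kmeans_objective(points, centers):
    """Return (sum, mean) of squared distances to the nearest center."""
    total = 0.0
    for point in points:
        # Squared Euclidean distance from this point to every center;
        # keep only the smallest one.
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, center))
            for center in centers
        )
    return total, total / len(points)


# Tiny worked example: two points sit at distance 1 from the first center,
# one point sits exactly on the second center.
example_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
example_centers = [(1.0, 0.0), (10.0, 0.0)]
print(kmeans_objective(example_points, example_centers))
```

If this value also grows with the iteration count for the trained centers, the divergence is in the centers themselves rather than in how the objective is reported.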
What version of the product are you using? On what operating system?
svn checkout http://sofia-ml.googlecode.com/svn/trunk/sofia-ml sofia-ml-read-only
performed on 10 Mar 2015
OS: Ubuntu 14.04
Please provide any additional information below.
Attached are the commands and output from sofia-kmeans (sofia_kmeans.txt). All
model, assignment, and data files needed to reproduce these findings are also
provided (tmp.zip).
Original issue reported on code.google.com by hr.j...@hotmail.com on 11 Mar 2015 at 12:36