yogeshkarpate / sofia-ml

Automatically exported from code.google.com/p/sofia-ml

Training Data Format and Class Label for kmeans #6

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I have converted my training data into the sparse data format you mentioned and ran:
./sofia-kmeans --k 1000 --init_type random --opt_type batch_kmeans --iterations 1000 --objective_after_init --training_file demo/SMLFAutoTrain1s512val.txt --model_out demo/CSMLFAutoTrain1s512val.txt
However, I am getting the following error:
Reading data from: demo/SMLFAutoTrain1s512val.txt
Error reading file demo/SMLFAutoTrain1s512val.txt
When I opened your demo.train, I saw a square box at the end of every vector. 
How can I change my data to match your format, given that the square box at 
the end may not be the only difference? I also tried to load your demo.train 
file in MATLAB, and that did not work either.
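
A minimal sketch of one way to produce such a file from dense vectors, written in Python. It assumes the sparse format is the SVM-light style `label index:value ...` with feature indices starting at 1; the thread itself does not spell the format out. The function names `to_sparse_line` and `write_sparse_file` are hypothetical helpers, not part of sofia-ml. Every line is terminated with '\n', which the follow-up comment in this thread reports as the fix for the read error.

```python
def to_sparse_line(label, vector):
    """Format one dense vector as 'label index:value ...' plus a trailing newline."""
    # Assumption: feature indices start at 1; zero-valued features are omitted.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([str(label)] + pairs) + "\n"


def write_sparse_file(path, labels, vectors):
    """Write one sparse line per (label, vector) pair."""
    # newline="\n" keeps plain LF line endings even when run on Windows.
    with open(path, "w", newline="\n") as f:
        for label, vector in zip(labels, vectors):
            f.write(to_sparse_line(label, vector))
```

For example, `to_sparse_line(1, [0.5, 0.0, 2.0])` yields the line `1 1:0.5 3:2` followed by a newline character.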

For the k-means example:
> ./sofia-kmeans --k 5 --init_type random --opt_type mini_batch_kmeans --mini_batch_size 100 --iterations 500 --objective_after_init --objective_after_training --training_file demo/demo.train --model_out demo/clusters.txt
The above command will return the five centroid locations, right?
In that case, since it only produces the 5 cluster center locations, the class 
labels in the training data (demo.train) can be assigned any value, right? 
For example, I chose all 1 from among the values 1, 0, -1.

I look forward to your clarification. 

Thank you,

Fred

Original issue reported on code.google.com by fredro.h...@gmail.com on 23 Sep 2011 at 3:56

GoogleCodeExporter commented 9 years ago
I have solved the training data problem by ending every line of my training 
data (SMLFAutoTrain1s512val.txt) with '\n'. However, in the output file 
(CSMLFAutoTrain1s512val.txt), every line has a lot of zeros after the 78 
dimensions of each vector. How can I run the kmeans program without getting so 
many zeros on every line? Also, what is the first field on every line of my 
output data, since it is always zero? I assume it is the class label. Please 
correct me if I am wrong.
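
As a possible workaround for the trailing zeros, the output file could be post-processed. This is only a sketch in Python, and it assumes that each line of the output file is a whitespace-separated list of numbers, which this thread does not confirm. The helper `trim_trailing_zeros` is hypothetical and not part of sofia-ml. It does not change how sofia-kmeans writes its output, and it leaves the first field untouched.

```python
def trim_trailing_zeros(line):
    """Drop zero-valued fields from the end of one whitespace-separated line."""
    values = line.split()
    # Remove fields from the right until a non-zero value is found.
    while values and float(values[-1]) == 0.0:
        values.pop()
    return " ".join(values)
```

Applied to each line of the output file, this keeps everything up to the last non-zero value, so a vector whose own last dimensions are zero would also be shortened.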

Original comment by fredro.h...@gmail.com on 23 Sep 2011 at 4:48