Issues with dimensionality off-by-one

What steps will reproduce the problem?
1. Create this training file:

======= train.txt  =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic \
   --lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt \
   --model_out debug-model.txt

3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

The model should output 201 terms, the first being the bias term. Instead it
outputs 200 and clips off the last weight (the one for feature 200). When I set
dimensionality to 201, I get what I would expect:

0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0.263645  

This was compiled from source a couple of weeks ago. The program should probably
crash if dimensionality is set to 200 and the sparse vector representation
contains a "200:x" term, unless the no-bias flag is set.
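
For anyone hitting the same thing, here is a minimal sketch of the kind of check described above. It is not sofia-ml code: the function names are made up for illustration, and it assumes, as the output above suggests, that index 0 of the weight vector holds the bias, so feature id k needs a vector of at least k + 1 entries.

```python
def max_feature_id(lines):
    """Return the largest feature id in sparse 'label id:value ...' lines."""
    largest = 0
    for line in lines:
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is the label
            feature_id, _, _value = token.partition(":")
            largest = max(largest, int(feature_id))
    return largest


def check_dimensionality(lines, dimensionality):
    """Raise ValueError if a feature id does not fit in the weight vector.

    Assumption: index 0 is the bias term, so valid feature ids are
    1 .. dimensionality - 1.
    """
    largest = max_feature_id(lines)
    if largest >= dimensionality:
        raise ValueError(
            f"feature id {largest} needs dimensionality >= {largest + 1}, "
            f"got {dimensionality}"
        )


# The training data from this report.
train = [
    "1 1:1 2:.1 3:.1 200:1",
    "1 1:1.2 2:.01 3:.01 200:1",
    "1 1:3 2:.2 3:.41 200:1",
    "-1 3:4 200:1",
    "-1 2:3 200:1",
    "-1 1:.1 2:3 3:2 200:1",
]

required = max_feature_id(train) + 1  # +1 for the bias slot at index 0
print(required)  # 201
```

With this data, check_dimensionality(train, 200) raises a ValueError, while check_dimensionality(train, 201) passes silently.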

Original issue reported on code.google.com by justi...@gmail.com on 26 Feb 2013 at 3:24