meetthakkar88 / accord

Automatically exported from code.google.com/p/accord

SMO breaks using SVM with Bootstrapping #8

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use an SVM learning algorithm with SMO and Bootstrapping.
2. At some point Bootstrapping can select all negative or all positive samples. 
This breaks the code in 
Accord.NET\Sources\Accord.MachineLearning\VectorMachines\Learning\SequentialMinimalOptimization.cs 
at line 479.
3. It tries to find an upper and lower bound for the + and - samples, but if 
there are no + samples, an IndexOutOfRangeException is thrown here. 
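
To show how easily step 2 can happen, here is a minimal Python sketch. It is not Accord.NET code, and the function names are hypothetical. It assumes a plain bootstrap that draws as many samples as the original set, with replacement, and binary labels of +1 and -1.

```python
import random


def single_class_probability(n_samples: int, n_positive: int) -> float:
    """Chance that one bootstrap resample contains only one class.

    Assumes n_samples independent draws with replacement from a set
    holding n_positive positive and (n_samples - n_positive) negative labels.
    """
    p_pos = n_positive / n_samples
    p_neg = 1.0 - p_pos
    # Either every draw is positive, or every draw is negative.
    return p_pos ** n_samples + p_neg ** n_samples


def bootstrap_indices(n_samples: int, rng: random.Random) -> list:
    """Draw n_samples indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def count_single_class_resamples(labels, n_resamples: int, seed: int = 0) -> int:
    """Count how many resamples end up with fewer than two distinct labels."""
    rng = random.Random(seed)
    degenerate = 0
    for _ in range(n_resamples):
        drawn = {labels[i] for i in bootstrap_indices(len(labels), rng)}
        if len(drawn) < 2:
            degenerate += 1
    return degenerate
```

With 10 samples of which only one is positive, `single_class_probability(10, 1)` is roughly 0.35, so on such a small, imbalanced set about one resample in three has no positive samples at all.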

What is the expected output? What do you see instead?
I'm not too sure what the algorithm is supposed to do in this case. At least 
catch the Exception?

What version of the product are you using? On what operating system?
Accord.NET v2.7.0 on Windows Vista

Please provide any additional information below.

Original issue reported on code.google.com by koen...@gmail.com on 23 Jul 2012 at 3:13

GoogleCodeExporter commented 9 years ago
I forgot to mention that I have a MulticlassSupportVectorMachine instead of a 
normal one. 

Original comment by koen...@gmail.com on 24 Jul 2012 at 8:57

GoogleCodeExporter commented 9 years ago
Thanks for the bug report. However, I believe the proper way of handling this 
situation would be to catch the exception inside the fitting function 
definition, and then return 0 to the Bootstrapping algorithm. Do you think this 
solution would suffice?

Original comment by cesarso...@gmail.com on 24 Jul 2012 at 3:05

GoogleCodeExporter commented 9 years ago
I was thinking about what the proper way to handle this is, and I'm actually not 
quite sure. If you return 0, it means that one cycle produced 0% accuracy, 
whereas one could argue that you obtain 100%, since classification is trivial 
(everything is of the same class, so you cannot really make any mistakes). 

What do you think?

Original comment by koen...@gmail.com on 24 Jul 2012 at 3:23

GoogleCodeExporter commented 9 years ago
Indeed, returning 100% accuracy seems better.
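
A minimal sketch of that idea, in Python rather than against the Accord.NET API. The names `guarded_fit`, `fit_and_score` and `degenerate_score` are hypothetical. It checks the labels up front instead of catching the exception, which is an assumption about how the guard could be written, and it leaves the 0% versus 100% choice to the caller.

```python
from typing import Callable, Sequence


def guarded_fit(
    train_labels: Sequence[int],
    fit_and_score: Callable[[], float],
    degenerate_score: float = 1.0,
) -> float:
    """Run the real fitting step only when both classes are present.

    fit_and_score stands in for the user's fitting function: it is expected
    to train the classifier and return its accuracy in [0, 1].
    """
    if len(set(train_labels)) < 2:
        # Single-class resample: skip training and report the agreed value
        # (1.0 for "classification is trivial", 0.0 for "treat as failure").
        return degenerate_score
    return fit_and_score()
```

Checking the labels before training avoids relying on the exception type thrown by the learning algorithm, at the cost of one extra pass over the labels.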

Another possible solution would be to add a delegate function that checks whether 
a sample is degenerate before running the learning algorithm. If the sample is 
degenerate (for example, all labels are positive or all are negative), the 
bootstrapping algorithm could try generating another sample. However, this may 
add some bias to the algorithm, since samples wouldn't be completely random 
anymore. It could also lead to problems if it turns out to be difficult to 
generate a valid sample randomly.
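
The resampling alternative could look roughly like the Python sketch below. Again, this is not Accord.NET code: `draw_two_class_resample` and `max_tries` are hypothetical names, and the retry cap is an assumption added to cover the case where a valid sample is hard to generate.

```python
import random
from typing import List, Optional, Sequence


def draw_two_class_resample(
    labels: Sequence[int],
    rng: random.Random,
    max_tries: int = 100,
) -> Optional[List[int]]:
    """Redraw bootstrap indices until the resample holds at least two classes.

    Returns None after max_tries failed attempts, so the caller can fall back
    to another strategy instead of looping forever.
    """
    n = len(labels)
    for _ in range(max_tries):
        indices = [rng.randrange(n) for _ in range(n)]
        # The degeneracy check: a usable resample needs two distinct labels.
        if len({labels[i] for i in indices}) >= 2:
            return indices
    return None
```

Rejected draws are discarded, so the accepted resamples are no longer plain bootstrap samples; this is the source of the bias mentioned above.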

Perhaps it would be better to report 100% for the time being. I will 
investigate if there is a better approach. I am also open to suggestions.

Original comment by cesarso...@gmail.com on 24 Jul 2012 at 4:07

GoogleCodeExporter commented 9 years ago
The code has been updated; the issue will be fixed in the next release.

Original comment by cesarso...@gmail.com on 2 Nov 2012 at 3:06

GoogleCodeExporter commented 9 years ago
Fixed in Accord.NET 2.8.

Original comment by cesarso...@gmail.com on 6 Nov 2012 at 5:23