omyjha / accord

Automatically exported from code.google.com/p/accord
0 stars 0 forks source link

Invalid KDTree unit tests? #84

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This may be a misinterpretation from my side, but I am having some issues with 
the following `KDTreeTest` unit tests in the *MachineLearning* test library:

* 
[FromDataTest2](https://github.com/accord-net/framework/blob/development/Sources
/Accord.Tests/Accord.Tests.MachineLearning/KDTreeTest.cs#L144)
* 
[NearestTest2](https://github.com/accord-net/framework/blob/development/Sources/
Accord.Tests/Accord.Tests.MachineLearning/KDTreeTest.cs#L263)

If I understand correctly, the points are sorted using one array index at a 
time. To identify the tree root, in `FromDataTest2` you would sort the three 
points {2,3}, {2,4} and {4,3} using index 0, i.e. the first value of the 
respective point array. 

But in this specific case, the first value of the first two point arrays is the 
same. How will the points be ordered then?

I notice that you are using `Array.Sort` to [sort the 
points](https://github.com/accord-net/framework/blob/development/Sources/Accord.
MachineLearning/Structures/KDTree%601.cs#L778). `Array.Sort` uses an *unstable* 
sorting mechanism, so the relative order of items with the same key is not 
preserved, i.e. it is arbitrary.

According to the `FromDataTest2` assertions, the tree root is expected to be 
{2,3}. Do I understand correctly, that the tree root in this case could just as 
well have been set to {2,4}? Or am I missing something in the tree ordering 
theory?

If my observation is correct, and depending on the importance of sorting 
behavior, I would recommend that either:

(a) `Array.Sort` is replaced with a stable sorting algorithm; for example 
`Enumerable.OrderBy` provides stable sorting, or
(b) the points are re-defined to avoid arbitrary results through unstable 
sorting, for example the second point in `FromDataTest2` could be set to {1,4}.

Regards,
Anders @ Cureos

Original issue reported on code.google.com by cureos...@gmail.com on 30 Dec 2013 at 9:37

GoogleCodeExporter commented 9 years ago
Hi Anders,

You are right about the sorting order. I believe this would be particularly 
noticeable if you run the tests on different versions of the .NET runtime, such 
as 4 and 4.5. I think it would be best to write better tests, as it is not 
strictly needed to impose any particular order for the tree. Thanks!

Regards,
Cesar

Original comment by cesarso...@gmail.com on 30 Dec 2013 at 6:22

GoogleCodeExporter commented 9 years ago
Many thanks Cesar, glad I wasn't too far off in my thinking :-)

This is just a hunch, but I am also experiencing issues with the unit test 
"GaussianMixtureModelConstructorTest" in the "GaussianMixtureModelTest" class. 
In the test assertions within the "for" loop, I get an error for sample 0 where 
"gmm.Gaussians.Nearest" computes to 1 where 0 is expected, 
https://github.com/accord-net/framework/blob/development/Sources/Accord.Tests/Ac
cord.Tests.MachineLearning/Clustering/GaussianMixtureModelTest.cs#L94 .

I get the error when I am using a stable sorting algorithm (in my PCL 
libraries) instead of "Array.Sort". This difference is currently the only clue 
I have to explain the test failure. Can you tell whether I am correct in my 
guess, or if I need to dig further to find the true error cause?

Thanks in advance,
Anders

Original comment by cureos...@gmail.com on 30 Dec 2013 at 10:18

GoogleCodeExporter commented 9 years ago
Hi Anders,

This is exactly what I am fixing right now! In a few minutes I will push unit 
test changes to the github repository. Thanks!

Regards,
Cesar

Original comment by cesarso...@gmail.com on 30 Dec 2013 at 10:29

GoogleCodeExporter commented 9 years ago
Hi Anders,

I updated some of the unit tests last week, please let me know if it helps. 
There are still some tests which need to be fixed, though!

Regards,
Cesar

Original comment by cesarso...@gmail.com on 3 Jan 2014 at 12:05

GoogleCodeExporter commented 9 years ago
Hi Cesar,

yes, I found the updates that you pushed the other day, and I have merged 
everything into the PCL repository. The updates worked practically out of the 
box for me, which of course is very satisfying. 

One difference between the .NET and the PCL source code is that I have to 
provide some of the array sorting methods through a different static class, 
since not all *Array.Sort* overloads are included in PCL. But the modifications 
I need to make are still minimal. I have included a class *Arrays* in the 
portable Core project, which contains sorting methods with the same signatures 
as the *System.Array* class.

The previously failing unit test in *GaussianMixtureModelTest* now succeeds. 
However, as far as I can tell the unit tests in *KDTreeTest* have not been 
updated? I still get test failures in these unit tests, but I am not overly 
concerned about this  thanks to your explanations above. When you have a chance 
to update these tests, I intend to merge the updates and I am fairly confident 
that the tests will then succeed :-)

Many thanks for all your work with this!

Best regards,
Anders

Original comment by cureos...@gmail.com on 3 Jan 2014 at 1:39