npinto / asgd

Averaged Stochastic Gradient Descent Classifiers

Ready to test! #2

Closed argoncloud closed 12 years ago

argoncloud commented 13 years ago

I fixed all the (apparent) issues with C and BLAS, and now all BLAS functions are called with the right matrix/vector dimensions. I gdb'ed them and they all appear to write to the proper memory locations. Also, I cleaned up the C file a little bit. Notice that I created a new branch, caa_blas, for the development of the BLAS-based algorithm.

Now I should start testing that I implemented the logic of the algorithm as a whole correctly. I think a good approach would be to first have a unit test for each of the three functions, and then some integration test for the algorithm. That would be good especially if you plan to push the software as a public library, and to guarantee the correctness of future edits. What do you think?

For the specifics of testing, I can't exactly replicate the Python test, as I doubt that the NumPy Mersenne Twister yields the same sequences as the LCG in glibc when seeded with the same seeds (42 and 43). So, if I want to start by testing everything in C, I should find some other way to do it. Any suggestions about this? I am hoping to make a lot of progress once the testing is in place.

Finally, let me know when you have a chance to pass me the real testing matrices and the reference to the Python wrapping tools that you suggest, so I can also start looking at these. I know you are busy lately, so no worries.

Let me know...

npinto commented 13 years ago

> I fixed all the (apparent) issues with C and BLAS, and now all BLAS functions are called with the right matrix/vector dimensions. I gdb'ed them and they all appear to write to the proper memory locations. Also, I cleaned up the C file a little bit. Notice that I created a new branch, caa_blas, for the development of the BLAS-based algorithm.

Awesome!

> Now I should start testing that I implemented the logic of the algorithm as a whole correctly. I think a good approach would be to first have a unit test for each of the three functions, and then some integration test for the algorithm. That would be good especially if you plan to push the software as a public library, and to guarantee the correctness of future edits. What do you think?

Sounds good.

> For the specifics of testing, I can't exactly replicate the Python test, as I doubt that the NumPy Mersenne Twister yields the same sequences as the LCG in glibc when seeded with the same seeds (42 and 43). So, if I want to start by testing everything in C, I should find some other way to do it. Any suggestions about this? I am hoping to make a lot of progress once the testing is in place.

I can provide you with pickle, npy or MATLAB files containing X (n_examples, n_features), y_gt "ground truth" labels (n_examples) and y_gv "given" labels (n_examples) for the regression tests. Then you can make sure that your internal functions are working in your own framework (C-based if you'd like, even though testing in Python is so much easier, even for C libraries).
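A regression check along these lines could be sketched as follows. This is only an illustration, assuming the labels have already been loaded into plain sequences (for .npy files, `numpy.load` would be one way to do that); `y_pred` stands for whatever labels the C code outputs, and `expected_error` is a hypothetical reference value stored alongside the data. None of these names come from the asgd code base.

```python
import math


def error_rate(y_pred, y_true):
    """Fraction of examples whose predicted label differs from the reference label."""
    if len(y_pred) != len(y_true):
        raise ValueError("label vectors must have the same length")
    if len(y_true) == 0:
        raise ValueError("label vectors must not be empty")
    return sum(p != t for p, t in zip(y_pred, y_true)) / len(y_true)


def check_regression(y_pred, y_gt, expected_error, tol=1e-12):
    """Fail if the error against y_gt drifts away from the stored reference value.

    expected_error is a hypothetical number recorded from a trusted run.
    """
    err = error_rate(y_pred, y_gt)
    assert math.isclose(err, expected_error, abs_tol=tol), (err, expected_error)
    return err
```

Comparing against a stored error value rather than against exact weights keeps the check independent of the language that produced the predictions.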

> Finally, let me know when you have a chance to pass me the real testing matrices and the reference to the Python wrapping tools that you suggest, so I can also start looking at these. I know you are busy lately, so no worries.

BTW, you can also dump your random numbers in a format that both languages can understand.
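One way to do that, sketched with the Python standard library only: generate the numbers once, write them as raw little-endian doubles with a length header, and read the same file from both sides. The file layout, the file name and the function names here are assumptions made for illustration, not an existing format in the repository. With NumPy arrays, `ndarray.tofile` would write a comparable raw dump (without the header).

```python
import random
import struct


def dump_doubles(path, values):
    """Write an int64 count followed by that many float64 values, little-endian."""
    with open(path, "wb") as f:
        f.write(struct.pack("<q", len(values)))
        f.write(struct.pack("<%dd" % len(values), *values))


def load_doubles(path):
    """Read back a file written by dump_doubles."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<q", f.read(8))
        return list(struct.unpack("<%dd" % n, f.read(8 * n)))


def make_samples(seed, n):
    """Draw n standard normal samples from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical file name; the seed matches the one mentioned in the thread.
    dump_doubles("random_seed42.bin", make_samples(42, 1000))
```

On the C side, the same file could be read with one `fread` into an `int64_t` for the count and a second `fread` into a `double` array, assuming a little-endian machine.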

npinto commented 13 years ago

Quick question: did you ensure with valgrind that there are no leaks?

argoncloud commented 13 years ago

I did initially, but then I made some more modifications, so I'll have to do it again. I was also planning to use Intel Inspector to double-check it.

On 10/17/2011 02:22 PM, Nicolas Pinto wrote:

> Quick question: did you ensure with valgrind that there are no leaks?

npinto commented 13 years ago

Scripting tools such as valgrind into your test suite might help you streamline this.
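A minimal sketch of what that could look like from a Python test runner, assuming the C tests are built into a standalone executable (the name `./test_asgd` below is hypothetical). It relies on two real valgrind options: `--leak-check=full`, which reports each leak and counts definite leaks as errors, and `--error-exitcode=1`, which makes valgrind exit with status 1 when it found errors.

```python
import shutil
import subprocess


def valgrind_command(binary, *args):
    """Build the valgrind command line for a test binary."""
    return ["valgrind", "--leak-check=full", "--error-exitcode=1", binary, *args]


def run_leak_check(binary, *args):
    """Run binary under valgrind.

    Returns None if valgrind is not installed, True if the exit status is 0
    (no valgrind errors and the program itself returned 0), False otherwise.
    """
    if shutil.which("valgrind") is None:
        return None
    result = subprocess.run(
        valgrind_command(binary, *args), capture_output=True, text=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    # "./test_asgd" is a placeholder for the compiled C test executable.
    print(run_leak_check("./test_asgd"))
```

Returning None when valgrind is missing lets the suite skip the check instead of failing on machines without it.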

argoncloud commented 12 years ago

I updated all the unit tests (which pass) except for fit. I tested fit manually and it definitely seems to be working, and I have started studying the wrapping so I can write an integration test for fit from Python.