microsoft / microsoft-r-open

Microsoft R Open Source

Potential issue when computing SVD on Mac OS #8

Closed matteodefelice closed 5 years ago

matteodefelice commented 7 years ago

I was doing some principal component analysis on my MacBook running Microsoft R 3.3.0 when I got some strange results. Double-checking with a colleague, I realised that the output of the svd() function was different from what I get with vanilla R.

To reproduce the result, please load this file (~78 MB): https://dl.dropboxusercontent.com/u/1249990/Cx.rda

With Microsoft R 3.3.0 (x86_64-apple-darwin14.5.0) I get:

> sv <- svd(Cx)
> print(sv$d[1:10])

 [1] 122.73664 104.45759  90.52001  87.21890  81.28256  74.33418  73.29427  66.26472  63.51379
[10]  55.20763

With vanilla R instead (both R 3.3 and R 3.3.1, on two different Linux machines):

> sv <- svd(Cx)
> print(sv$d[1:10])

 [1] 122.73664  34.67177  18.50610  14.04483   8.35690   6.80784   6.14566
 [8]   3.91788   3.76016   2.66381

This does not happen with all data: if I create a random matrix and apply svd() to it, I get the same results on both. So it looks like some sort of numerical instability, doesn't it?
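One way to judge which of the two outputs is plausible, without trusting either LAPACK backend, is an invariant of the SVD: the sum of the squared singular values equals the squared Frobenius norm of the matrix. The sketch below assumes a hypothetical helper `frobenius_check()` (not part of base R or MRO) and uses a random matrix as a stand-in for `Cx`:

```r
# Hypothetical helper (not part of base R or MRO): compares the sum of squared
# singular values with the squared Frobenius norm of X. For an exact SVD the
# two are equal, so a large relative gap points to a faulty decomposition.
frobenius_check <- function(X, d) {
  frob2 <- sum(X^2)         # squared Frobenius norm, element-wise (no BLAS/LAPACK call)
  sv2 <- sum(d^2)           # sum of squared singular values
  abs(sv2 - frob2) / frob2  # relative discrepancy; should be close to 0
}

# Usage with a random stand-in matrix (Cx from this issue is not reproduced here)
set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)
sv <- svd(X)
frobenius_check(X, sv$d)
```

Because every squared singular value is non-negative, the check also works when only the leading values are available: `sum(sv$d[1:10]^2)` can never exceed `sum(Cx^2)`, so if it does, that output cannot be a correct SVD.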

UPDATE: I computed the SVD of the same matrix (Cx) on the same machine (MacBook) with the same version of R using the svd package, and this time I get the "right" numbers. So the problem seems to lie in the SVD implementation used by Microsoft R Open.

matteodefelice commented 7 years ago

Yes, I've just installed MRO 3.3.1, same numbers...

On Mon, Oct 17, 2016 at 10:18 PM Aaron Grider <notifications@github.com> wrote:

Hi @matteodefelice. You said you were running MRO 3.3.0; are you able to confirm whether this issue still exists in MRO 3.3.1?


aarongrider commented 7 years ago

Yes, I can confirm that I get the same numbers. Looking into it.

aarongrider commented 7 years ago

I've narrowed this issue down to Apple's Accelerate framework, which MRO uses on macOS. Vanilla R ships a different BLAS library, which seems to return different results. On Windows and Linux, MRO uses the Intel MKL math libraries, which also seem to return different results.
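Since the results depend on which linear algebra library R is linked against, it helps to report that alongside any svd() output. A minimal sketch using base R: `La_version()` reports the LAPACK version, and the `BLAS`/`LAPACK` components of `sessionInfo()` give the library paths. Note that these two components were only added in R 3.4.0, so they are not available in the MRO 3.3.x builds discussed here.

```r
# LAPACK version in use (svd() calls LAPACK under the hood)
lapack_ver <- La_version()
print(lapack_ver)

# Since R 3.4.0, sessionInfo() also records the BLAS and LAPACK library paths
si <- sessionInfo()
print(si$BLAS)
print(si$LAPACK)
```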

We are investigating the accuracy of the svd() results produced with Accelerate, but for now we have no plans to move away from this library. Initial research indicates that svd() returning different results with different libraries may be expected; however, if we can confirm that the results from the Accelerate framework are inaccurate, there would be a more pressing need to find an alternative.
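Differences in the sign or ordering of singular vectors are expected across libraries, but the singular values themselves are unique, so an accuracy check can test the defining properties directly. Below is a sketch with a hypothetical helper `svd_residuals()` (not part of R or MRO) that measures the relative reconstruction error of X = U diag(d) t(V) and the orthonormality of U and V. It uses `%*%` and `crossprod()`, which are themselves BLAS-backed, so on a suspect library it is best paired with the element-wise Frobenius-norm check.

```r
# Hypothetical helper: measures how well an svd() result satisfies its
# defining properties, X = U diag(d) t(V) with orthonormal columns in U and V.
svd_residuals <- function(X, sv) {
  # sv$d * t(sv$v) scales row i of t(V) by d[i], i.e. diag(d) %*% t(V)
  recon <- sv$u %*% (sv$d * t(sv$v))
  c(
    reconstruction  = norm(X - recon, "F") / norm(X, "F"),
    u_orthogonality = max(abs(crossprod(sv$u) - diag(ncol(sv$u)))),
    v_orthogonality = max(abs(crossprod(sv$v) - diag(ncol(sv$v))))
  )
}

# Usage with a random stand-in matrix; all three values should be near
# machine precision for a correct decomposition.
set.seed(42)
X <- matrix(rnorm(300 * 40), nrow = 300)
sv <- svd(X)
svd_residuals(X, sv)
```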

matteodefelice commented 7 years ago

Any news on this?

jeroenterheerdt commented 5 years ago

Closing this issue because of lack of activity.