scalanlp / breeze

Breeze is a numerical processing library for Scala.
www.scalanlp.org
Apache License 2.0
3.44k stars 691 forks source link

Some mysterious precision loss going on #823

Closed adampauls closed 2 years ago

adampauls commented 2 years ago

After upgrading to Breeze 2.0, we have found some strange issues where we need to switch from Float to Double to get the right accuracy where we really shouldn't. Here is a somewhat minimized example:

object Test {
  def main(args: Array[String]): Unit = {
    import breeze.linalg._
    import breeze.stats.distributions.RandBasis
    val outputDim = 2
    val inputDim = 4
    val projection =
      convert(DenseMatrix.rand(outputDim, inputDim, RandBasis.withSeed(1).gaussian) / math.sqrt(outputDim), Float)
    val data = DenseMatrix.eye[Float](inputDim)
    val transformed = projection * data
    val corrMatrixFloat = transformed.t * transformed
    val transformedDouble = convert(transformed, Double)
    val corrMatrixDouble = convert(transformedDouble.t * transformedDouble, Float)
    println((corrMatrixDouble - corrMatrixFloat).toString(1000, 1000))
/* prints
-2.9802322E-8  0.0       -7.4505806E-9  0.0  
0.0            1.195261  1.020851       0.0  
-7.4505806E-9  0.0       0.0            0.0  
0.0            0.0       0.0            0.0 
*/
  }
}

As far as I can tell, entry (2, 2) in corrMatrixFloat is just entirely wrong in a way that should not happen because of precision issues.

I'm on a 2019 Macbook Pro.

adampauls commented 2 years ago

Hmm, I suppose this could be caused by cancellation in issues in more complex matrix multiplication algorithms that I've never really understood. Is this just expected behavior of the underlying native libraries?

adampauls commented 2 years ago

I should add that this is a regression: it does not happen on 1.2, but happens on 1.3 and 2.0.

dlwh commented 2 years ago

I can't repro on linux, and I don't have a macos dev environment these days...

This is what I get:

-2.9802322E-8  0.0           -2.2351742E-8  1.4901161E-8  
0.0            1.1920929E-7  0.0            0.0           
-2.2351742E-8  0.0           0.0            1.1920929E-7  
1.4901161E-8   0.0           1.1920929E-7   0.0           

Given that this is in 1.3 and 2.0, I suspect this is an issue with the new netlib library we're using. IIRC basically the only change from 1.2 to 1.3 was the switch to the new netlib library. (Tagging @luhenry so it's on his radar.)

Can you see if you can repro on Linux (with CI or whatever)? If you can't repro on Linux we can proceed to minimizing it to a sequence of BLAS calls...

adampauls commented 2 years ago

A few new bits of information: • This bug is probably relevant https://github.com/luhenry/netlib/issues/6. • The native libraries are not loading:

Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$NativeBLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$NativeBLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.blas.ForeignLinkerBLAS
Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$BLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.NativeBLAS
Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$JavaBLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS

• A colleague was able to reproduce on windows, though for him the VectorBLAS warning did not appear. I tried making sure that VectorBLAS was loading and it also did not fix the issue.

So I think this points to a bug in the Java implementations of some of the BLAS routines? I'll post an issue in netlib.

adampauls commented 2 years ago

This is definitely a bug in netlib: https://github.com/luhenry/netlib/issues/7

luhenry commented 2 years ago

This should be fixed in https://github.com/luhenry/netlib/releases/tag/v2.2.1. I've just pushed the release through Sonatype, so it should show up in Maven repositories in the next hour or so at https://repo1.maven.org/maven2/dev/ludovic/netlib/blas/2.2.1/

dlwh commented 2 years ago

thanks @luhenry! i'll push an RC tomorrow then. @adampauls do you have a preference for which version i do first? (which might be the only one I do tomorrow), say nothing and I'll do Breeze 2.0

On Sun, Nov 14, 2021 at 11:08 PM Ludovic Henry @.***> wrote:

This should be fixed in https://github.com/luhenry/netlib/releases/tag/v2.2.1. I've just pushed the release through Sonatype, so it should show up in Maven repositories in the next hour or so.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scalanlp/breeze/issues/823#issuecomment-968596093, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAACLIO3IVS4CTKJOH74EH3UMCWWXANCNFSM5HTEEOGA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

dlwh commented 2 years ago

2.0.1-RC1 is being pushed to maven now, lemme know if it fixes it

delenius commented 2 years ago

I'm still seeing the first 3 of the warnings above on 2.0.1-RC1 (not sure about the precision loss). I don't see the VectorBLAS warning. I am on Mac OS.

dlwh commented 2 years ago

The warning just means you're not getting natives, so degraded performance, but it shouldn't mean anything for the precision thing. Not sure what's going on though