phenology / hsr-phenological-modelling

High spatial resolution (HSR) phenological modelling - general repository.
Apache License 2.0

Matrix multiplication using block-matrices consumes too much memory #13

Closed: romulogoncalves closed this issue 6 years ago

romulogoncalves commented 6 years ago

When multiplying block matrices, one of them is converted to a dense matrix. The result is a large dense matrix which, when passed to SVD, leads to:

Message: Job aborted due to stage failure: Task 1 in stage 79.0 failed 4 times, most recent failure: Lost task 1.3 in stage 79.0 (TID 1319, 145.100.58.131, executor 5): java.lang.OutOfMemoryError: Java heap space
  at org.apache.spark.mllib.linalg.DenseMatrix$.zeros(Matrices.scala:461)
  at org.apache.spark.mllib.linalg.Matrix$class.multiply(Matrices.scala:105)
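For reference, a minimal sketch of the failing pattern, shown in PySpark (the use of Python, the entry RDDs, and the 1024x1024 block size are illustrative assumptions, not the project's actual code; the real matrices are far larger):

```python
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

spark = SparkSession.builder.appName("block-multiply-sketch").getOrCreate()
sc = spark.sparkContext

# Tiny illustrative sparse entries; the real job builds much larger matrices.
entries_a = sc.parallelize([MatrixEntry(0, 0, 1.0), MatrixEntry(1, 2, 3.0)])
entries_b = sc.parallelize([MatrixEntry(0, 1, 2.0), MatrixEntry(2, 0, 4.0)])

block_a = CoordinateMatrix(entries_a).toBlockMatrix(1024, 1024).cache()
block_b = CoordinateMatrix(entries_b).toBlockMatrix(1024, 1024).cache()

# BlockMatrix.multiply densifies blocks internally (each per-block multiply
# allocates a DenseMatrix result), which is where the OutOfMemoryError in the
# stack trace above is thrown on large inputs.
product = block_a.multiply(block_b)

# The dense product is then handed to SVD via an IndexedRowMatrix.
svd = product.toIndexedRowMatrix().computeSVD(2, computeU=True)
print(svd.s)
```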

romulogoncalves commented 6 years ago

We may need to look into something similar to this: https://www.balabit.com/blog/scalable-sparse-matrix-multiplication-in-apache-spark/ (see the sketch below).
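A minimal sketch of a join-based sparse multiplication along those lines, assuming the matrices are available as CoordinateMatrix entry RDDs (names and signature are hypothetical). It keeps every intermediate sparse instead of densifying blocks:

```python
from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

def sparse_multiply(a, b):
    """Multiply two CoordinateMatrix instances without building dense blocks."""
    # Key A entries by column index k and B entries by row index k, so that a
    # join on k produces every partial product A(i, k) * B(k, j).
    left = a.entries.map(lambda e: (e.j, (e.i, e.value)))
    right = b.entries.map(lambda e: (e.i, (e.j, e.value)))

    def partial_product(joined):
        _, ((i, a_val), (j, b_val)) = joined
        return (i, j), a_val * b_val

    products = (
        left.join(right)
            .map(partial_product)
            .reduceByKey(lambda x, y: x + y)  # sum partial products per (i, j)
            .map(lambda kv: MatrixEntry(kv[0][0], kv[0][1], kv[1]))
    )
    return CoordinateMatrix(products, a.numRows(), b.numCols())
```

The trade-off is an extra shuffle for the join and reduceByKey, but peak memory per executor stays proportional to the number of non-zero entries rather than to the dense block size.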

romulogoncalves commented 6 years ago

For SVD, the matrix is too large for our current platform. We are investigating randomized SVD, which will run outside Spark.
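For illustration, a minimal NumPy sketch of the randomized SVD prototype (range sampling, power iterations, then an exact SVD of the small projected matrix, following Halko et al.). This is not the project's implementation; the parameter choices are assumptions, and a ready-made version exists as sklearn.utils.extmath.randomized_svd:

```python
import numpy as np

def randomized_svd(a, k, n_oversamples=10, n_iter=2):
    """Approximate rank-k SVD of a dense array `a` via random projection."""
    rng = np.random.default_rng(0)
    # Sample the range of A with a random Gaussian test matrix.
    omega = rng.standard_normal((a.shape[1], k + n_oversamples))
    y = a @ omega
    # A few power iterations sharpen the basis when singular values decay slowly.
    for _ in range(n_iter):
        y = a @ (a.T @ y)
    q, _ = np.linalg.qr(y)  # orthonormal basis for the sampled range of A
    # Project A onto the small basis and take the exact SVD of the small matrix.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small
    return u[:, :k], s[:k], vt[:k, :]
```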