quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

Nodes/CPUs vs data size experiments #56

Closed: magsol closed this issue 8 years ago

magsol commented 8 years ago

We need plots showing the performance of our code as a function of the quantity of resources (nodes / CPUs) we throw at the problem.

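In theory, we should see a linear speed-up: as we add more nodes/CPUs, the code should run proportionally faster on the same dataset (e.g., doubling the CPUs should roughly halve the runtime). Make sure the number of partitions of the RDD increases accordingly; otherwise the extra cores will sit idle.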
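A minimal sketch of how the plotted numbers could be computed, assuming we already have wall-clock runtimes for the same dataset at several core counts. The helpers `speedup_table` and `suggested_partitions` are hypothetical, not part of dr1dl-pyspark. Speedup is measured against the smallest configuration tested, and efficiency of 1.0 means perfectly linear scaling. The `tasks_per_core=3` default is an assumption loosely based on Spark's tuning advice of a few tasks per CPU core.

```python
from typing import Dict, List, Tuple


def speedup_table(timings: Dict[int, float]) -> List[Tuple[int, float, float, float]]:
    """Return (cores, runtime, speedup, efficiency) rows sorted by core count.

    Speedup is relative to the smallest core count p0 in `timings`:
        S(p) = T(p0) / T(p)
        E(p) = S(p) / (p / p0)
    Linear speed-up corresponds to E(p) == 1.0 for every p.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        runtime = timings[cores]
        speedup = base_time / runtime
        # Ideal speedup is the ratio of core counts; efficiency compares to it.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, runtime, speedup, efficiency))
    return rows


def suggested_partitions(total_cores: int, tasks_per_core: int = 3) -> int:
    """Hypothetical rule of thumb: scale RDD partitions with total cores,
    so that adding CPUs actually adds parallel tasks."""
    return total_cores * tasks_per_core


if __name__ == "__main__":
    # Illustrative timings only (seconds), not measured results.
    example = {4: 400.0, 8: 200.0, 16: 125.0}
    for cores, runtime, s, e in speedup_table(example):
        print(f"{cores:>3} cores  {runtime:7.1f}s  speedup={s:.2f}  efficiency={e:.2f}")
    print("partitions for 16 cores:", suggested_partitions(16))
```

In the actual PySpark job, the partition count could be applied with `rdd.repartition(n)` before the expensive stages; plotting speedup (or efficiency) against cores then shows directly how far the measured curve falls from the linear ideal.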