quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

Data size vs speed up experiments #57

Closed magsol closed 8 years ago

magsol commented 8 years ago

Using the same cluster resources, we need a graph showing how our algorithm's runtime changes as the size of the underlying dataset increases. In theory, runtime should grow linearly with data size.
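One way to check the linear-scaling expectation is to time the job at several data sizes and fit a straight line to the measurements. The sketch below is only an illustration: `time_runs`, `linear_fit`, and `toy_workload` are hypothetical names, and the toy workload stands in for the real PySpark job, which would be swapped in as the `workload` callable.

```python
import statistics
import time


def time_runs(workload, sizes, repeats=3):
    """Return the median wall-clock seconds of workload(n) for each n in sizes."""
    medians = []
    for n in sizes:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload(n)
            samples.append(time.perf_counter() - start)
        medians.append(statistics.median(samples))
    return medians


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # If every y is identical there is no variance to explain; report a perfect fit.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical stand-in for the real job: work proportional to n.
    def toy_workload(n):
        return sum(range(n))

    sizes = [100_000, 200_000, 400_000, 800_000]
    times = time_runs(toy_workload, sizes)
    slope, intercept, r2 = linear_fit(sizes, times)
    print(f"seconds per element: {slope:.3e}, overhead: {intercept:.3e}, R^2: {r2:.3f}")
```

An R^2 close to 1 would be consistent with linear scaling, while a clearly curved residual pattern would suggest otherwise. The intercept gives a rough estimate of fixed overhead such as job startup.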

We could also run a sub-experiment that varies the number of RDD partitions and measures performance while holding the cluster resources and the data size constant.
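The partition sub-experiment could be sketched roughly as follows. The names `sweep_partitions`, `make_rdd_job`, and `per_record` are hypothetical; `per_record` is a placeholder for whatever per-element work the real algorithm does. The only PySpark calls assumed are `RDD.repartition`, `RDD.map`, and `RDD.count`.

```python
import time


def sweep_partitions(run_job, partition_counts, repeats=3):
    """Time run_job(n) for each partition count n; keep the fastest of `repeats` runs."""
    results = {}
    for n in partition_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_job(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


def make_rdd_job(rdd, per_record):
    """Build a run_job callable from an existing RDD (assumed cached by the caller).

    repartition(n) reshuffles the data into n partitions, and count() is an
    action, so it forces the whole pipeline to execute.
    """
    def run_job(num_partitions):
        return rdd.repartition(num_partitions).map(per_record).count()
    return run_job
```

With a live `SparkContext`, usage might look like `sweep_partitions(make_rdd_job(rdd, per_record), [8, 16, 32, 64])`. Note that each measurement here includes the shuffle cost of `repartition` itself; creating the RDD with the target partition count up front would isolate the compute time instead.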