The PySpark API has improved considerably in the last several months: native PySpark now offers several distributed data structures and methods.
For generating random vectors / matrices:
Distributed data structures and primitives:
However, the thunder-project also has very mature Python-based distributed linear algebra structures and methods built on top of Spark that we can use.