quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

Dimensional consistency #53

Closed magsol closed 8 years ago

magsol commented 8 years ago

Our effort to move the analysis from column-based to row-based is incomplete. Consequently, the code is in an inconsistent state and crashes unpredictably, depending on whether P > T or T > P. For now, we are assuming the data are row-based; therefore, the matrix S should be P x T, and the corresponding u and v vectors should have lengths that match the dimensions of S and multiply S on the correct side.
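To make the shape requirement concrete, here is a minimal NumPy sketch, not the project's actual code. It assumes that u multiplies S from the left (so it has length P) and v multiplies S from the right (so it has length T); which vector goes on which side is an assumption for illustration, and the helper name `check_rank1_shapes` is hypothetical.

```python
import numpy as np


def check_rank1_shapes(S, u, v):
    """Raise ValueError unless u and v fit a row-based P x T matrix S.

    Assumption (not confirmed by the issue): u has length P and is applied
    on the left of S, v has length T and is applied on the right.
    """
    P, T = S.shape
    if u.shape != (P,):
        raise ValueError(f"u has shape {u.shape}, expected ({P},)")
    if v.shape != (T,):
        raise ValueError(f"v has shape {v.shape}, expected ({T},)")
    return P, T


# Small example with P != T so that a swapped orientation cannot go unnoticed.
P, T = 3, 5
S = np.arange(P * T, dtype=float).reshape(P, T)  # row-based: P rows, T columns
u = np.ones(P)
v = np.ones(T)

check_rank1_shapes(S, u, v)
left = u @ S    # length-P vector times (P x T) matrix gives a length-T vector
right = S @ v   # (P x T) matrix times length-T vector gives a length-P vector
```

An explicit check like this fails immediately with a clear message whenever P and T differ and the vectors are swapped. When P == T, a swapped pair would pass any shape check, so the orientation still has to be right by construction.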

As per #52, we'll be refactoring this a bit to allow either row-based or column-based data, but this issue is critical in that it can cause the program to crash.

magsol commented 8 years ago

This appears to be fixed, as long as the data on the filesystem are in row-major format. Closing for now.