Open magsol opened 8 years ago
@magsol, as far as I know, the `u` vector is used to reconstruct the dictionary, so a sparse `u` would lead to a sparse dictionary, and I don't think the dictionary can be sparse. Is there any way to shrink the size of our `u` vector? Here its size is P, the number of observations (columns), and the underlying problem is that the number of observations is extremely large. Is there a programming solution for partitioning the `u` vector, for example dividing `u` into 10 parts and then merging them? I know it seems theoretically impossible, but I wanted to know for myself.
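To make the "divide into parts and merge" idea concrete, here is a minimal sketch in plain Python, not the project's code. It assumes the step in question is a vector-matrix product of `u` with a matrix `S` whose rows line up with the entries of `u`, and that both are non-empty; the function name `chunked_vector_matrix` and the list-of-lists layout are made up for illustration. It only shows that this one product can be computed chunk by chunk and summed; it says nothing about whether the rest of the update can be split the same way.

```python
def chunked_vector_matrix(u, S, n_chunks):
    """Compute u^T S by splitting u (and the matching rows of S) into chunks.

    u: list of floats, length n (assumed non-empty)
    S: list of n rows, each a list of m numbers (hypothetical dense layout)
    n_chunks: how many pieces to split u into
    """
    n = len(u)
    size = -(-n // n_chunks)  # ceiling division: entries per chunk
    n_cols = len(S[0])
    total = [0.0] * n_cols
    for start in range(0, n, size):
        u_part = u[start:start + size]
        S_part = S[start:start + size]
        # Partial product for this chunk only.
        partial = [
            sum(u_i * row[j] for u_i, row in zip(u_part, S_part))
            for j in range(n_cols)
        ]
        # "Merging" the chunks is just adding the partial results.
        total = [t + p for t, p in zip(total, partial)]
    return total
```

Each chunk needs only its own slice of `u` and the matching rows of `S`, so the chunks could in principle be processed one at a time or on different workers.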
@quinngroup/bigneuron
I ran `4tasks03.txt` last night, and it executed for an hour before crashing due to `OSError: [Errno 28] No space left on device`. The logs are still up; you can view them under "Jobs" in the BlueData web UI.

I looked through the logs and found a few figures that are informative. First, the memory usage:
Swap memory (purple) is, far and away, the biggest problem. This means our intermediate results are becoming so large as the job progresses that they completely saturate the available swap space on the hard disks and cause the job to crash. This is problematic for many reasons.
The next figures show the specific jobs that were executing and which led to the crash.
Note the enormous discrepancy in runtimes between the `matrix_vector` and `vector_matrix` operations; the latter is 260x slower than the former, and its `flatMap` operation is likely the cause of the swap space saturation.

Keep in mind: this is the largest dataset, so we expect it to be challenging. On the other hand, our framework still needs to work; it should scale gracefully.
I have a few ideas on how to mitigate these issues (change the orientation of `S`, optimize the multiplication operations, and others), but I also have a real worry: the amount of swap space we're using shouldn't continually increase. That suggests `u` is becoming less sparse over time, resulting in more multiplication operations in `vector_matrix`. Is that possible? My understanding was that it should be going in the opposite direction: more sparse.

Please advise.
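To illustrate why a denser `u` would mean more work, here is a rough sketch in plain Python rather than Spark. It is not the actual `vector_matrix` implementation; the dict-of-dicts layout for `S` and the name `sparse_vector_matrix` are assumptions made for the example. It only shows that when zeros in `u` are skipped, the number of multiplications (and of intermediate values produced) grows with the number of nonzero entries in `u`.

```python
def sparse_vector_matrix(u, S_rows):
    """Multiply a sparse vector by a sparse matrix, counting the work done.

    u: dict {row_index: value}, holding only the nonzero entries
    S_rows: dict {row_index: {col_index: value}} (hypothetical layout)
    Returns (result, multiplications), where result is {col_index: value}.
    """
    result = {}
    multiplications = 0
    for i, u_i in u.items():
        # Only rows matching a nonzero entry of u contribute anything.
        for j, s_ij in S_rows.get(i, {}).items():
            result[j] = result.get(j, 0.0) + u_i * s_ij
            multiplications += 1
    return result, multiplications


S_rows = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}, 2: {0: 4.0, 2: 5.0}}

# One nonzero entry in u touches one row of S.
_, work_sparse = sparse_vector_matrix({0: 2.0}, S_rows)

# Three nonzero entries in u touch all three rows.
_, work_dense = sparse_vector_matrix({0: 2.0, 1: 1.0, 2: 1.0}, S_rows)
```

If that picture holds for the real job, logging the nonzero count of `u` at each iteration would be a cheap way to check whether it is actually getting denser.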