Closed: magsol closed this issue 8 years ago
Perfect! Please use -T 176 and -P 2687340 for 4tasks03.txt. 4tasks01.txt was the largest dataset that worked in local mode with 16 GB of RAM (-T 176 -P 895780).
What about r and e? I believe we generally set m to 100.
On Thu, Mar 10, 2016 at 4:27 PM, milad181 (notifications@github.com) wrote:

> Perfect! Please use -T 176 and -P 2687340 for 4tasks03.txt. 4tasks01.txt (http://bd.hafni.cs.uga.edu/test/4tasks01.txt) was the largest dataset that worked in local mode with 16 GB of RAM (-T 176 -P 895780).
We generally used -r 0.07 -m 5 -e 0.01 to obtain results faster.
@quinngroup/bigneuron
I seem to have a reliable BlueData image working. It's currently crunching the `4tasks03.txt` dataset, and so far it's working. I also implemented a few optimizations (broadcasting the random seed at the start of each iteration, and representing `v` with a `SparseVector` object) to see how they work. They're not fully tested yet, so the job may crash at some point.
In the meantime, feel free to use the image and stress test it against either the cluster I've spun up or your own custom cluster. Let me know if there are any problems.
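The `SparseVector` idea above can be illustrated with a minimal sketch. This uses plain NumPy rather than PySpark's actual `pyspark.mllib.linalg.SparseVector`, and the helper function names here are hypothetical; the point is just that storing only the nonzero (index, value) pairs of a mostly-zero vector is far cheaper than keeping the full dense array:

```python
import numpy as np

# Illustrative sketch only: the real implementation would use
# pyspark.mllib.linalg.SparseVector. Here we keep only the nonzero
# (index, value) pairs of a mostly-zero vector v.

def to_sparse(v, tol=0.0):
    """Return (indices, values) for entries of v with |value| > tol."""
    idx = np.flatnonzero(np.abs(v) > tol)
    return idx, v[idx]

def to_dense(idx, vals, size):
    """Reconstruct the dense vector from its sparse form."""
    v = np.zeros(size)
    v[idx] = vals
    return v

v = np.zeros(1_000_000)
v[[3, 17, 999_999]] = [0.5, -1.2, 2.0]

idx, vals = to_sparse(v)
# The sparse form stores 3 entries instead of 1,000,000.
```

The same trade-off is what the broadcast of the random seed targets: shipping one small value to the workers each iteration instead of a large redundant object.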
@magsol Dear Dr. Quinn, would you please set up some credentials for me to work with your cluster? Thanks
@quinngroup/bigneuron
I'm trying to replicate the errors you're getting from #62 on my on-site BlueData cluster. However, I don't seem to have the datasets you're using. I do have access to `4tasks03.txt`, but I don't know what the `P` and `T` dimensions are, nor what the other parameters (e.g., the sparsity level `r`) should be. Please provide that information, as well as a smaller testing dataset (and the associated parameters), and I'll continue testing. Also, I'm happy to provide anyone with credentials to access the BlueData cluster.