Closed: magsol closed this issue 8 years ago
Perfect! Please use -T 176 and -P 2687340 for 4tasks03.txt. 4tasks01.txt was the largest dataset that worked in local mode with 16 GB of RAM (-T 176 -P 895780).
What about r and e? I believe we generally set m to 100.
On Thu, Mar 10, 2016 at 4:27 PM, milad181 (notifications@github.com) wrote:

> Perfect! Please use -T 176 and -P 2687340 for 4tasks03.txt. 4tasks01.txt (http://bd.hafni.cs.uga.edu/test/4tasks01.txt) was the largest dataset that worked in local mode with 16 GB of RAM (-T 176 -P 895780).
We generally used -r 0.07 -m 5 -e 0.01 to obtain results faster.
@quinngroup/bigneuron
I seem to have a reliable BlueData image working. It's currently crunching the `4tasks03.txt` dataset, and so far it's working. I also implemented a few optimizations (broadcasting the random seed at the start of each iteration, and representing `v` with a `SparseVector` object) to see how they work. They're not fully tested yet, so the job may crash at some point.
In the meantime, feel free to use the image and stress test it against either the cluster I've spun up or your own custom cluster. Let me know if there are any problems.
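The `SparseVector` idea above can be illustrated with a minimal sketch. This uses plain NumPy rather than PySpark's actual `pyspark.mllib.linalg.SparseVector`, and the helper function names here are hypothetical; the point is just that storing only the nonzero (index, value) pairs of a mostly-zero vector is far cheaper than keeping the full dense array:

```python
import numpy as np

# Illustrative sketch only: the real implementation would use
# pyspark.mllib.linalg.SparseVector. Here we keep only the nonzero
# (index, value) pairs of a mostly-zero vector v.

def to_sparse(v, tol=0.0):
    """Return (indices, values) for entries of v with |value| > tol."""
    idx = np.flatnonzero(np.abs(v) > tol)
    return idx, v[idx]

def to_dense(idx, vals, size):
    """Reconstruct the dense vector from its sparse form."""
    v = np.zeros(size)
    v[idx] = vals
    return v

v = np.zeros(1_000_000)
v[[3, 17, 999_999]] = [0.5, -1.2, 2.0]

idx, vals = to_sparse(v)
# The sparse form stores 3 entries instead of 1,000,000.
```

The same trade-off is what the broadcast of the random seed targets: shipping one small value to the workers each iteration instead of a large redundant object.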
@magsol Dear Dr. Quinn, would you please set up some credentials for me to work with your cluster? Thanks
@quinngroup/bigneuron
I'm trying to replicate the errors you're getting from #62 on my on-site BlueData cluster. However, I don't seem to have the datasets you're using. I do have access to `4tasks03.txt`, but I don't know what the `P` and `T` dimensions are, nor what the other parameters (e.g., the sparsity level `r`) should be. Please provide that information, as well as a smaller testing dataset (and the associated parameters), and I'll continue testing. Also, I'm happy to provide anyone with credentials to access the BlueData cluster.