millecker / hadoop-1.0.3-gpu

Apache Hadoop and GPU from Shirahata K. et al. (Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters)
http://hadoop.illecker.at

How to run the program using CPU and GPU? #1

Mingcong opened this issue 10 years ago

Mingcong commented 10 years ago

Hi Millecker,

I am interested in your project, but I am not sure how to run the program.

Below is my run script:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -output output -cpubin bin/cpu-kmeans -gpubin bin/gpu-kmeans -input input/ik_small

The script runs successfully, but I suspect the program runs only on the CPU (not on the GPU). Is there any way to verify whether the program is executed on the CPU and the GPU?

Thank you very much. Looking forward to your reply!

Best regards, Mingcong

koichishirahata commented 10 years ago

Dear Mingcong,

I am Koichi Shirahata, one of the authors of the original paper on Hadoop with GPUs.

You can check the running state (including whether each map task runs on the CPU or the GPU) via our customized JobTracker web interface (by default http://localhost:50030/). Please see the attached image, which shows blue and green bars: blue bars are tasks running on the CPU, and green bars are tasks running on the GPU.

You can also find the running state in $HADOOP_HOME/logs/hadoop-{username}-jobtracker-{hostname}.log, where "trackerRunning{CPU,GPU}Maps" gives the number of tasks running on the CPU/GPU. The log also contains other information such as "available{CPU,GPU}MapSlots", "finished{CPU,GPU}Maps", "{CPU,GPU}maptaskmeantime", and "accelarationfactor" (the speedup gained by running on the GPU).
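A minimal sketch of how such a speedup figure could be computed from finished map-task durations, assuming accelarationfactor is the mean CPU map-task time divided by the mean GPU map-task time (our reading of the {CPU,GPU}maptaskmeantime entries; the exact definition is not given in this thread). The class name and the durations are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: estimates a GPU speedup from map-task durations (seconds). */
public class AccelerationFactor {

    /** Arithmetic mean of the given task durations. */
    static double mean(List<Double> durations) {
        if (durations.isEmpty()) {
            throw new IllegalArgumentException("need at least one finished task");
        }
        double sum = 0.0;
        for (double d : durations) {
            sum += d;
        }
        return sum / durations.size();
    }

    /**
     * Assumed definition: mean CPU map-task time divided by mean GPU map-task time.
     * A value near 1 means the GPU brings no benefit; 4 means GPU tasks finish 4x faster.
     */
    static double factor(List<Double> cpuTimes, List<Double> gpuTimes) {
        return mean(cpuTimes) / mean(gpuTimes);
    }

    public static void main(String[] args) {
        // Hypothetical durations, not measurements from this thread.
        List<Double> cpu = List.of(40.0, 44.0, 36.0);
        List<Double> gpu = List.of(10.0, 11.0, 9.0);
        System.out.printf("acceleration factor = %.2f%n", factor(cpu, gpu));
    }
}
```

With these made-up numbers the mean CPU time is 40 s and the mean GPU time is 10 s, so the program prints a factor of 4.00.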

I hope this reply helps you.

Regards, Koichi

[Attached image: hadoop-hybrid]

Koichi Shirahata
koichi-s@matsulab.is.titech.ac.jp
Ph.D. Student (Satoshi Matsuoka Lab.)
Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology

Mingcong commented 10 years ago

Dear Koichi,

Thank you for your help. Now I can see that the program runs on both the CPU and the GPU. Can I use hadoop pipes instead of hadoop accels to run the program? Also, I could not run the program using only the GPU (at least one map task still runs on the CPU). Could you tell me how to run the program using only the GPU?

Best regards, Mingcong


koichishirahata commented 10 years ago

Dear Mingcong,

You can use hadoop pipes instead of hadoop accels, since both call the same command inside the ${HADOOP_HOME}/bin/hadoop shell script. Please regard pipes as the command for standard Hadoop Pipes (i.e., without the GPU) and accels as the command for hybrid execution with two binaries, even though the two currently execute the same command.

You can run all tasks on the GPU by specifying the GPU binary for both -cpubin and -gpubin.

Best regards, Koichi


Mingcong commented 10 years ago

Dear Koichi,

Thank you for your reply.

When I use hadoop pipes to run your program, the web interface looks like the attached image, which shows that the job runs on both the GPU and the CPU. Moreover, SubmitterToAccels.java has been removed in millecker's version. Could you tell me the relationship between pipes and accels?

I also ran kmeans2D-gpu.sh (which sets the GPU binary for both -cpubin and -gpubin), and it displays the same image. I think the CPU still runs tasks.

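The parameters in mapred-site.xml are as follows:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>5</value>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.map.cpu.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.map.gpu.tasks.maximum</name>
  <value>2</value>
</property>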

If I want to run the program using only the GPU, should I set mapred.tasktracker.map.cpu.tasks.maximum to 0?

Best regards, Mingcong


millecker commented 10 years ago

Hi Mingcong,

I ported the work of Koichi et al. [1] from Hadoop 0.20.1 to Hadoop 1.0.3 nearly a year ago. If you observe any problems, please use Koichi's original version, which you can find here [2].

> What is more, the SubmitterToAccels.java has been removed in millecker's version. Could you tell me the relationship between pipes and accels?

Normally, pipes [3] would execute only a default Hadoop Pipes job (without GPU support). However, I think I merged the accels (SubmitterToAccels.java) behavior into the default pipes Submitter [4].

> I also run kmeans2D-gpu.sh (setting GPU binary on both -cpubin and -gpubin), then it displays the same image. I think the CPU is still run the task.

If you set a CUDA GPU binary for both -cpubin and -gpubin, CPU execution is not possible, because the Hadoop job configuration will not contain any path to a CPU binary.
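A minimal sketch of the point above, assuming that each map slot simply launches the binary registered for its slot type. The configuration keys and the class name are hypothetical placeholders; the actual property names used by Submitter.java [4] are not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: choosing which Pipes executable a map slot launches. */
public class BinarySelection {

    enum SlotType { CPU, GPU }

    // Placeholder keys; the real property names in Submitter.java may differ.
    static final String CPU_KEY = "hypothetical.cpu.binary";
    static final String GPU_KEY = "hypothetical.gpu.binary";

    /** Mimics the -cpubin and -gpubin values being copied into the job configuration. */
    static Map<String, String> configure(String cpubin, String gpubin) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CPU_KEY, cpubin);
        conf.put(GPU_KEY, gpubin);
        return conf;
    }

    /** A slot launches whatever binary is registered for its type. */
    static String binaryFor(Map<String, String> conf, SlotType slot) {
        return conf.get(slot == SlotType.CPU ? CPU_KEY : GPU_KEY);
    }

    public static void main(String[] args) {
        // GPU binary passed for both options: a CPU slot still runs the GPU code.
        Map<String, String> conf = configure("bin/gpu-kmeans", "bin/gpu-kmeans");
        for (SlotType slot : SlotType.values()) {
            System.out.println(slot + " slot runs " + binaryFor(conf, slot));
        }
    }
}
```

Under this assumption, a task scheduled into a "CPU" slot (a blue bar in the web UI) still executes bin/gpu-kmeans, because no CPU binary path exists anywhere in the configuration.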

Kind regards,

Martin

[1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5708524
[2] https://github.com/koichi626/hadoop-gpu
[3] https://github.com/millecker/hadoop-1.0.3-gpu/blob/master/hadoop-1.0.3/bin/hadoop#L284-286
[4] https://github.com/millecker/hadoop-1.0.3-gpu/blob/master/hadoop-1.0.3/src/mapred/org/apache/hadoop/mapred/pipes/Submitter.java#L523-529

Mingcong commented 10 years ago

Dear Koichi and Martin,

Thank you for your patient explanation.

If I want to use only the GPU, I set a CUDA GPU binary for both -cpubin and -gpubin and use the following parameters in mapred-site.xml:

<property>
  <name>mapred.map.tasks</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.map.cpu.tasks.maximum</name>
  <value>0</value>
</property>
<property>
  <name>mapred.tasktracker.map.gpu.tasks.maximum</name>
  <value>4</value>
</property>

When I run the script, it displays:

hduser@master:/usr/local/hadoop-gpu-master/hadoop-gpu-0.20.1$ ./kmeans2D.sh input/ik2_sample
rmr: cannot remove output: No such file or directory.
14/01/10 12:46:50 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/01/10 12:46:50 INFO mapred.FileInputFormat: Total input paths to process : 1
14/01/10 12:46:50 INFO mapred.JobClient: Running job: job_201401101245_0001
14/01/10 12:46:51 INFO mapred.JobClient: map 0% reduce 0%

The map tasks never start. Could you help me solve this problem? By the way, it works if I don't set mapred.tasktracker.map.cpu.tasks.maximum to 0.

Also, the accelarationfactor is nearly 1 when I use ik2_sample as the input file for k-means. How can I improve the acceleration? Would a bigger input file help?

Best regards, Mingcong


koichishirahata commented 10 years ago

Hi Mingcong,

Accels does not support mapred.tasktracker.map.cpu.tasks.maximum = 0, because in the current implementation of our scheduler the first task must run as a CPU task. However, you can run the GPU binary as a CPU task by specifying the GPU binary for -cpubin. In that case, you can regard the CPU task as a GPU task.
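The stall reported above (the job stuck at map 0%) follows from this constraint. The sketch below is a deliberately simplified, hypothetical model, not the scheduler from the paper or this repository: it only encodes the stated rule that the first map task must go to a CPU slot, and afterwards naively prefers a free GPU slot (the real scheduler decides using measured CPU/GPU task times).

```java
/** Hypothetical, simplified model of "the first map task must run on a CPU slot". */
public class FirstTaskOnCpu {

    /** Returns "CPU", "GPU", or null when no task can be assigned right now. */
    static String assign(int startedTasks, int freeCpuSlots, int freeGpuSlots) {
        if (startedTasks == 0) {
            // The first task needs a CPU slot; GPU slots are not considered yet.
            return freeCpuSlots > 0 ? "CPU" : null;
        }
        // Later tasks: prefer a free GPU slot (a simplification of the real policy).
        if (freeGpuSlots > 0) {
            return "GPU";
        }
        return freeCpuSlots > 0 ? "CPU" : null;
    }

    public static void main(String[] args) {
        // cpu.tasks.maximum = 0, gpu.tasks.maximum = 4: nothing can ever start.
        System.out.println(assign(0, 0, 4));
        // With one CPU slot the first task starts there ...
        System.out.println(assign(0, 1, 4));
        // ... and subsequent tasks can use the GPU slots.
        System.out.println(assign(1, 1, 4));
    }
}
```

In this model, zero CPU slots means the first assignment never succeeds, so no later (GPU) assignment is reached either, which matches the observed hang; passing the GPU binary as -cpubin sidesteps the rule without leaving the GPU.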

Regards, Koichi


Mingcong commented 10 years ago

Dear Koichi,

OK. Although mapred.tasktracker.map.cpu.tasks.maximum is set to 1, all tasks run on the GPU when I set a CUDA GPU binary for both -cpubin and -gpubin. However, the accelarationfactor is nearly 1 when I use ik2_sample as the input file for k-means. How can I improve the acceleration? Would a bigger input file help?

Best regards, Mingcong


koichishirahata commented 10 years ago

Hi Mingcong,

You should use a larger input (e.g., 100 MB) to benefit from GPU acceleration. The heavier each map task is, the more it is accelerated, and map tasks become heavier with larger input.
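One way to see why larger input helps is a simple cost model, assumed here purely for illustration: each map task pays a fixed overhead that is the same on CPU and GPU (task startup, data movement setup), plus a per-megabyte compute cost that the GPU reduces. The constants below are hypothetical, not measurements from this thread.

```java
/** Hypothetical cost model: why small inputs give an acceleration factor near 1. */
public class InputSizeModel {

    // Illustrative constants, not measurements.
    static final double OVERHEAD_S = 10.0;    // fixed per-task cost, same on CPU and GPU
    static final double CPU_S_PER_MB = 0.5;   // compute cost per MB on the CPU
    static final double GPU_S_PER_MB = 0.05;  // compute cost per MB on the GPU

    static double cpuTaskTime(double inputMb) {
        return OVERHEAD_S + inputMb * CPU_S_PER_MB;
    }

    static double gpuTaskTime(double inputMb) {
        return OVERHEAD_S + inputMb * GPU_S_PER_MB;
    }

    /** CPU/GPU time ratio; approaches CPU_S_PER_MB / GPU_S_PER_MB (= 10) for huge inputs. */
    static double accelerationFactor(double inputMb) {
        return cpuTaskTime(inputMb) / gpuTaskTime(inputMb);
    }

    public static void main(String[] args) {
        for (double mb : new double[] {1, 100, 1000}) {
            System.out.printf("%6.0f MB -> factor %.2f%n", mb, accelerationFactor(mb));
        }
    }
}
```

Under these assumed numbers a 1 MB task gains almost nothing (about 1.04x) because the fixed overhead dominates, a 100 MB task reaches 4x, and a 1000 MB task reaches 8.5x.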

You can download a k-means input generator from the following URL: https://github.com/koichi626/hadoop-gpu/tree/master/data/kmeans/input2D

Regards, Koichi


Mingcong commented 10 years ago

Hi Koichi and Martin,

Thanks for your help. As you know, there are several ways to invoke CUDA code from the Hadoop framework, such as Hadoop Streaming, Hadoop Pipes, the Java Native Interface (JNI), JCuda, and so on. In terms of performance, which solution is best?

Best regards, Mingcong

koichishirahata commented 10 years ago

Hi Mingcong,

Although I have not compared the performance of these solutions, I do not think they differ much in performance. I believe they all perform I/O through the child JVM rather than directly from C++/CUDA. A solution that performs I/O directly from/to C++/CUDA would improve performance.

Regards, Koichi


Indrashish commented 10 years ago

Hi Koichi/Martin,

This is Indrashish.

I have a small question. For your paper, the code shared here uses the matrix multiplication and k-means algorithms. Are there any other algorithms or benchmarks that you used for your project, or that you know of? Could you please help with this?

Regards, Indra

millecker commented 10 years ago

Hi Indrashish,

I only ported the work of Koichi et al. [1] from Hadoop 0.20.1 to Hadoop 1.0.3; I am not an author of the paper. Please contact Koichi [2] with questions about the paper. I don't think any other algorithms or benchmarks are available.

Martin

[1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5708524
[2] https://github.com/koichi626/hadoop-gpu

Indrashish commented 10 years ago

Hello Martin,

Thanks a lot for your response.

Hello Koichi,

Do you have any ideas regarding this, or do you know of any other benchmarks that run Hadoop on GPUs?

Regards, Indra

koichishirahata commented 10 years ago

Hi Indra,

Regarding the paper, I used only the matrix multiplication and k-means algorithms, as you mentioned. After publishing the paper, I also implemented large-scale graph processing applications such as PageRank, Random Walk with Restart, and Connected Components on top of our new MapReduce implementation on GPUs [1]. These applications are based on the GIM-V (Generalized Iterated Matrix-Vector multiplication) algorithm, an iterative algorithm consisting of two consecutive MapReduce stages.
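For readers unfamiliar with GIM-V, here is a minimal sequential sketch of PageRank expressed through GIM-V's three operations (combine2, combineAll, assign), following the formulation in the PEGASUS work by U Kang et al. It runs in memory on a tiny graph and does not reproduce the two-stage MapReduce or GPU implementation discussed here; the damping factor c = 0.85, the example graphs, and the choice to pass an already-summed value to combineAll are assumptions of this sketch.

```java
import java.util.Arrays;

/** Sequential sketch of PageRank as a GIM-V iteration (v' = M x_G v). */
public class GimvPageRank {

    static final double C = 0.85; // damping factor (assumed)

    // combine2: partial contribution of source vertex j to target vertex i.
    static double combine2(double mij, double vj) {
        return C * mij * vj;
    }

    // combineAll: merge the (already summed) partial contributions for vertex i.
    static double combineAll(int n, double sumOfPartials) {
        return (1.0 - C) / n + sumOfPartials;
    }

    // assign: for PageRank the new value simply replaces the old one.
    static double assign(double oldVi, double newVi) {
        return newVi;
    }

    /** edges[k] = {src, dst}; the matrix entry m[i][j] is 1/outdeg(j) for each edge j -> i. */
    static double[] pageRank(int n, int[][] edges, int iterations) {
        int[] outdeg = new int[n];
        for (int[] e : edges) outdeg[e[0]]++;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            // Stage 1 (MapReduce: join matrix and vector, emit combine2 results).
            double[] partialSums = new double[n];
            for (int[] e : edges) {
                int j = e[0], i = e[1];
                partialSums[i] += combine2(1.0 / outdeg[j], v[j]);
            }
            // Stage 2 (MapReduce: group by i, then apply combineAll and assign).
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                next[i] = assign(v[i], combineAll(n, partialSums[i]));
            }
            v = next;
        }
        return v;
    }

    public static void main(String[] args) {
        // 3-vertex cycle 0 -> 1 -> 2 -> 0: every vertex keeps rank 1/3.
        int[][] cycle = {{0, 1}, {1, 2}, {2, 0}};
        System.out.println(Arrays.toString(pageRank(3, cycle, 20)));
    }
}
```

The two commented stages mirror the two consecutive MapReduce stages mentioned above; Random Walk with Restart and Connected Components change only the three operations, not the iteration structure.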

For more details, please read our paper [1]. You can also download the source code we used as the baseline CPU implementation in Java, which was published by U Kang et al. [2].

Regards, Koichi

[1] Koichi Shirahata et al. "A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-Scale Heterogeneous Supercomputers", in proceedings of Cluster, Cloud and Grid Computing (CCGrid) 2013, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6546103

[2] U Kang et al. http://www.cs.cmu.edu/~pegasus/

Koichi Shirahata koichi-s@matsulab.is.titech.ac.jp Ph.D. Student (Satoshi Matsuoka Lab.) Dept. of Mathematical and Computing Sciences Tokyo Institute of Technology

Indrashish commented 10 years ago

Hello Koichi,

Thanks a lot for the details. I went through the link that you provided for the Pegasus paper; however, I am unable to find the source code for the benchmarks that you mentioned. Could you please share the code of the benchmarks you implemented (such as PageRank, Random Walk with Restart, and Connected Components), or a link from which I can download the source code for these algorithms?

Thanks again for your help regarding this.

Regards, Indrashish

koichishirahata commented 10 years ago

Hi Indra,

Thank you for your interest in our paper. The source code of our large-scale graph processing applications has not been uploaded yet. I will prepare it for uploading. I would be grateful if you could be patient.

Regards, Koichi

koichishirahata commented 10 years ago

Hi Indra,

I am pleased to inform you that the source code of our high-performance large-scale graph processing applications has been uploaded to GitHub (https://github.com/koichi626/GraphGPU). Currently, the source code supports only single-GPU execution. I am also going to upload a multi-GPU version soon. If you face any trouble or have any questions, please let me know.

Regards, Koichi

walkertraylor commented 10 years ago

Hi koichi626,

I am interested in your research and was wondering whether you have uploaded the multi-GPU version somewhere.

Thank you, Walker Traylor