mkolod / spark_gpu_demo

Spark GPU demo

This project is based on Spark? #1

Open databig opened 9 years ago

databig commented 9 years ago

FYI @mkolod

mkolod commented 9 years ago

Hi, yes it is. However, it's a very early test: it doesn't yet have tests, continuous integration, or instructions on how to set things up for different OSes (you need to provide native CUDA libraries), etc. Besides, most GPU computing isn't performant unless one very carefully observes data locality. This was just a test; I'll develop it if/when I have time.

databig commented 9 years ago

@mkolod Thanks for your reply. Which GPU library will be your GPU backend? Maybe I can join this project. Here is my stock Spark pom.xml. Where should I change it?

<?xml version="1.0" encoding="UTF-8"?>

4.0.0 org.apache apache 14 org.apache.spark spark-parent 1.1.0 pom Spark Project Parent POM http://spark.apache.org/ Apache 2.0 License http://www.apache.org/licenses/LICENSE-2.0.html repo scm:git:git@github.com:apache/spark.git scm:git:https://git-wip-us.apache.org/repos/asf/spark.git scm:git:git@github.com:apache/spark.git v1.1.0-rc4 matei Matei Zaharia matei.zaharia@gmail.com http://www.cs.berkeley.edu/~matei Apache Software Foundation http://spark.apache.org JIRA https://issues.apache.org/jira/browse/SPARK ``` 3.0.4 Dev Mailing List dev@spark.apache.org dev-subscribe@spark.apache.org dev-unsubscribe@spark.apache.org User Mailing List user@spark.apache.org user-subscribe@spark.apache.org user-unsubscribe@spark.apache.org Commits Mailing List commits@spark.apache.org commits-subscribe@spark.apache.org commits-unsubscribe@spark.apache.org core bagel graphx mllib tools streaming sql/catalyst sql/core sql/hive repl assembly external/twitter external/kafka external/flume external/flume-sink external/zeromq external/mqtt examples UTF-8 UTF-8 1.6 spark 2.10.4 2.10 2.0.1 0.18.1 shaded-protobuf org.spark-project.akka 2.2.3-shaded-protobuf 1.7.5 1.2.17 1.0.4 2.4.1 ${hadoop.version} 0.94.6 1.4.0 3.4.5 0.12.0 1.4.3 1.2.3 8.1.14.v20131031 0.3.6 3.0.0 1.7.6 0.7.1 1.8.3 1.1.0 64m 512m central Maven Repository https://repo1.maven.org/maven2 true false apache-repo Apache Repository https://repository.apache.org/content/repositories/releases true false jboss-repo JBoss Repository https://repository.jboss.org/nexus/content/repositories/releases true false mqtt-repo MQTT Repository https://repo.eclipse.org/content/repositories/paho-releases true false cloudera-repo Cloudera Repository https://repository.cloudera.com/artifactory/cloudera-repos true false mapr-repo MapR Repository http://repository.mapr.com/maven true false spring-releases Spring Release Repository https://repo.spring.io/libs-release true false central https://repo1.maven.org/maven2 true false org.eclipse.jetty 
jetty-util ${jetty.version} org.eclipse.jetty jetty-security ${jetty.version} org.eclipse.jetty jetty-plus ${jetty.version} org.eclipse.jetty jetty-server ${jetty.version} com.google.guava guava 14.0.1 org.apache.commons commons-lang3 3.3.2 commons-codec commons-codec 1.5 org.apache.commons commons-math3 3.3 test com.google.code.findbugs jsr305 1.3.9 org.slf4j slf4j-api ${slf4j.version} org.slf4j slf4j-log4j12 ${slf4j.version} org.slf4j jul-to-slf4j ${slf4j.version} org.slf4j jcl-over-slf4j ${slf4j.version} log4j log4j ${log4j.version} com.ning compress-lzf 1.0.0 org.xerial.snappy snappy-java 1.0.5.3 net.jpountz.lz4 lz4 1.2.0 com.clearspring.analytics stream 2.7.0 it.unimi.dsi fastutil com.google.protobuf protobuf-java ${protobuf.version} com.twitter chill_${scala.binary.version} ${chill.version} org.ow2.asm asm org.ow2.asm asm-commons com.twitter chill-java ${chill.version} org.ow2.asm asm org.ow2.asm asm-commons ${akka.group} akka-actor_${scala.binary.version} ${akka.version} ${akka.group} akka-remote_${scala.binary.version} ${akka.version} ${akka.group} akka-slf4j_${scala.binary.version} ${akka.version} ${akka.group} akka-testkit_${scala.binary.version} ${akka.version} colt colt 1.2.0 org.apache.mesos mesos ${mesos.version} ${mesos.classifier} com.google.protobuf protobuf-java commons-net commons-net 2.2 io.netty netty-all 4.0.23.Final org.apache.derby derby 10.4.2.0 com.codahale.metrics metrics-core ${codahale.metrics.version} com.codahale.metrics metrics-jvm ${codahale.metrics.version} com.codahale.metrics metrics-json ${codahale.metrics.version} com.codahale.metrics metrics-ganglia ${codahale.metrics.version} com.codahale.metrics metrics-graphite ${codahale.metrics.version} org.scala-lang scala-compiler ${scala.version} org.scala-lang scala-reflect ${scala.version} org.scala-lang jline ${scala.version} org.scala-lang scala-library ${scala.version} org.scala-lang scala-actors ${scala.version} org.scala-lang scalap ${scala.version} org.scalatest 
scalatest_${scala.binary.version} 2.1.5 test org.easymock easymockclassextension 3.1 test asm asm 3.3.1 test org.mockito mockito-all 1.9.0 test org.scalacheck scalacheck_${scala.binary.version} 1.11.3 test junit junit 4.10 test com.novocode junit-interface 0.10 test org.apache.curator curator-recipes 2.4.0 org.jboss.netty netty org.apache.hadoop hadoop-client ${hadoop.version} asm asm org.ow2.asm asm org.jboss.netty netty commons-logging commons-logging org.mortbay.jetty servlet-api-2.5 javax.servlet servlet-api junit junit org.apache.avro avro ${avro.version} org.apache.avro avro-ipc ${avro.version} io.netty netty org.mortbay.jetty jetty org.mortbay.jetty jetty-util org.mortbay.jetty servlet-api org.apache.velocity velocity org.apache.avro avro-mapred ${avro.version} io.netty netty org.mortbay.jetty jetty org.mortbay.jetty jetty-util org.mortbay.jetty servlet-api org.apache.velocity velocity net.java.dev.jets3t jets3t ${jets3t.version} commons-logging commons-logging org.apache.hadoop hadoop-yarn-api ${yarn.version} javax.servlet servlet-api asm asm org.ow2.asm asm org.jboss.netty netty commons-logging commons-logging org.apache.hadoop hadoop-yarn-common ${yarn.version} asm asm org.ow2.asm asm org.jboss.netty netty javax.servlet servlet-api commons-logging commons-logging org.apache.hadoop hadoop-yarn-server-web-proxy ${yarn.version} asm asm org.ow2.asm asm org.jboss.netty netty javax.servlet servlet-api commons-logging commons-logging org.apache.hadoop hadoop-yarn-client ${yarn.version} asm asm org.ow2.asm asm org.jboss.netty netty javax.servlet servlet-api commons-logging commons-logging org.codehaus.jackson jackson-mapper-asl 1.8.8 org.apache.maven.plugins maven-enforcer-plugin 1.3.1 enforce-versions enforce 3.0.4 ${java.version} org.codehaus.mojo build-helper-maven-plugin 1.8 net.alchim31.maven scala-maven-plugin 3.2.0 scala-compile-first process-resources compile scala-test-compile-first process-test-resources testCompile attach-scaladocs verify doc-jar 
${scala.version} incremental true -unchecked -deprecation -feature -language:postfixOps -Xms1024m -Xmx1024m -XX:PermSize=${PermGen} -XX:MaxPermSize=${MaxPermGen} -source ${java.version} -target ${java.version} org.scalamacros paradise_${scala.version} ${scala.macros.version} org.apache.maven.plugins maven-compiler-plugin 3.1 ${java.version} ${java.version} UTF-8 1024m true org.apache.maven.plugins maven-surefire-plugin 2.17 true org.scalatest scalatest-maven-plugin 1.0-RC2 ${project.build.directory}/surefire-reports . ${project.build.directory}/SparkTestSuite.txt -Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=512m true ${session.executionRootDirectory} 1 test test org.apache.maven.plugins maven-jar-plugin 2.4 org.apache.maven.plugins maven-antrun-plugin 1.7 org.apache.maven.plugins maven-shade-plugin 2.2 org.apache.maven.plugins maven-source-plugin 2.2.1 true create-source-jar jar-no-fork org.apache.maven.plugins maven-clean-plugin 2.5 work checkpoint org.apache.maven.plugins maven-enforcer-plugin org.codehaus.mojo build-helper-maven-plugin add-scala-sources generate-sources add-source src/main/scala add-scala-test-sources generate-test-sources add-test-source src/test/scala net.alchim31.maven scala-maven-plugin org.apache.maven.plugins maven-source-plugin org.scalastyle scalastyle-maven-plugin 0.4.0 false true false false ${basedir}/src/main/scala ${basedir}/src/test/scala scalastyle-config.xml scalastyle-output.xml UTF-8 package check spark-ganglia-lgpl extras/spark-ganglia-lgpl kinesis-asl extras/kinesis-asl java8-tests org.apache.maven.plugins maven-jar-plugin test-jar extras/java8-tests hadoop-0.23 org.apache.avro avro 0.23.10 hadoop-2.2 2.2.0 2.5.0 hadoop-2.3 2.3.0 2.5.0 0.9.0 hadoop-2.4 2.4.0 2.5.0 0.9.0 yarn-alpha yarn yarn yarn mapr3 false 1.0.3-mapr-3.0.3 2.3.0-mapr-4.0.0-FCS 0.94.17-mapr-1405 3.4.5-mapr-1406 mapr4 false 2.3.0-mapr-4.0.0-FCS 2.3.0-mapr-4.0.0-FCS 0.94.17-mapr-1405-4.0.0-FCS 3.4.5-mapr-1406 org.apache.curator curator-recipes 
2.4.0 org.apache.zookeeper zookeeper org.apache.zookeeper zookeeper 3.4.5-mapr-1406 hadoop-provided false org.apache.hadoop hadoop-client provided org.apache.hadoop hadoop-yarn-api provided org.apache.hadoop hadoop-yarn-common provided org.apache.hadoop hadoop-yarn-server-web-proxy provided org.apache.hadoop hadoop-yarn-client provided org.apache.avro avro provided org.apache.avro avro-ipc provided org.apache.zookeeper zookeeper ${zookeeper.version} provided hive false sql/hive-thriftserver ```
mkolod commented 9 years ago

Hi,

Actually, I'm still trying to figure it out. This was just a CUDA test (NVIDIA GeForce GT 750M), but I was actually thinking about heterogeneous computing in the long run (OpenCL). I'm just experimenting :) Feel free to fork and then do a pull request - I don't know myself where the project is going yet :)

Thanks!

Marek

On Wed, Mar 11, 2015 at 11:41 AM, databig notifications@github.com wrote:

@mkolod https://github.com/mkolod Tks to your reply. Which GPU Lib will be your GPU backend. Maybe I can enroll in this project. Here is my native Spark POM.XML. Where to change?



mkolod commented 9 years ago

@databig I think a good first step would be to figure out how to resolve the native library issue. I tried setting java.library.path, to no avail. The only thing that worked was putting the dylib files (on OS X; these would be .so files on Linux or .dll files on Windows) at the root. Neither IntelliJ nor SBT was happy with any other setup, even though System.loadLibrary() should pick up the libraries from the java.library.path JVM system property, or from the DYLD_LIBRARY_PATH environment variable on OS X (LD_LIBRARY_PATH on Linux), etc. I don't mind adding Maven, actually. Most people use Maven, so feel free to do a PR and I can add it.
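
For anyone debugging the same problem, here is a minimal Java sketch for checking what the JVM actually sees before it calls System.loadLibrary(). It is not code from this repository. The class name `NativeLibProbe` and the library name `cudart` are placeholders chosen for illustration, so substitute whatever native library the demo really loads. It uses only standard JDK calls (`System.mapLibraryName`, `System.getProperty`, `System.loadLibrary`).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of spark_gpu_demo.
public class NativeLibProbe {

    // Platform-specific file name the JVM will search for
    // (the extension differs between OS X, Linux and Windows).
    static String fileNameFor(String libName) {
        return System.mapLibraryName(libName);
    }

    // One entry per directory on the given library path, marked found or missing.
    static List<String> candidates(String libName, String libraryPath) {
        List<String> out = new ArrayList<>();
        String fileName = fileNameFor(libName);
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty path segments
            }
            File f = new File(dir, fileName);
            out.add(f.getPath() + (f.isFile() ? " [found]" : " [missing]"));
        }
        return out;
    }

    // Attempts the load and reports failure instead of crashing.
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load " + libName + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "cudart" is an assumed example name; pass the real one as an argument.
        String lib = args.length > 0 ? args[0] : "cudart";
        String path = System.getProperty("java.library.path", "");
        System.out.println("Looking for: " + fileNameFor(lib));
        System.out.println("java.library.path = " + path);
        for (String c : candidates(lib, path)) {
            System.out.println("  " + c);
        }
        System.out.println("Loaded: " + tryLoad(lib));
    }
}
```

Two things worth checking if the path printed here is not the one you set: java.library.path is read when the JVM starts, so changing it with System.setProperty() at runtime generally has no effect, and SBT only applies `javaOptions` to forked JVMs, so `-Djava.library.path=...` may be silently ignored unless the run is forked. Whether either of these is the cause here is a guess, not something confirmed in this thread.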