Closed leizhanggit closed 5 years ago
Can you please provide more specific information about the Databricks setup where you got the error message? What runtime, what type of worker, what exactly is in your init script, and what is the path to your init script in the cluster advanced settings?
GpuDataset has a very limited API right now, mostly just read and then use with XGBoost. If you need to do other ETL processing you will have to use normal Spark for that. We are working on getting more ETL operators implemented, but that will be through a different plugin and using Spark 3.0. You can find GpuDataset in the XGBoost code: https://github.com/rapidsai/xgboost/blob/rapids-spark/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/rapids/GpuDataset.scala
Note, I tested this out this morning to make sure Databricks didn't make any changes, and it did work for me. Here is the configuration I was using:
Cluster Mode: Standard
Runtime: 5.4 ML (includes Apache Spark 2.4.3, GPU, Scala 2.11)
Python Version: 3
Autoscale: disabled
Worker Type: p3.2xlarge, 1 worker
Driver Type: p3.2xlarge
Advanced Options: default except for:
add ssh keys
Init Scripts: dbfs:/databricks/scripts/init.sh
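If it helps to compare settings exactly, the configuration above can also be expressed as a Databricks clusters API payload. This is a hedged sketch, not from the original instructions: the field names follow the Databricks Clusters API 2.0, the `spark_version` string and cluster name are my guesses, so verify them against your workspace (e.g. via `databricks clusters spark-versions`) before using it.

```shell
# Write the cluster spec matching the settings listed above; you could
# then create the cluster with the Databricks CLI:
#   databricks clusters create --json-file cluster.json
cat > cluster.json <<'EOF'
{
  "cluster_name": "xgboost-gpu-demo",
  "spark_version": "5.4.x-gpu-ml-scala2.11",
  "node_type_id": "p3.2xlarge",
  "driver_node_type_id": "p3.2xlarge",
  "num_workers": 1,
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/databricks/scripts/init.sh" } }
  ]
}
EOF
```

Setting a fixed `num_workers` (instead of an `autoscale` block) corresponds to disabling autoscale in the UI.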
Downloaded the 3 jars and put them into dbfs as the instructions specified. Keep track of the paths dbfs assigns, as you need them for the init.sh script.
My init.sh contents are below. You will have to update this script to match the file names you uploaded. MAKE SURE THIS IS CORRECT, as a mismatch here could cause your error. To be clear, in the script below update the file names
/7194b940_46ba_41ef_8a41_e8e71130484b-xgboost4j_0_90_1_Beta-6dea8.jar, /6587ff10_a44f_4590_8828_473d6d72e350-cudf_0_8-aaf1d.jar, and /d541fac4_424f_459c_86d5_166d1047c664-xgboost4j_spark_0_90_1_Beta-bcdc8.jar
to match what you uploaded.
```shell
sudo cp /dbfs/FileStore/jars/7194b940_46ba_41ef_8a41_e8e71130484b-xgboost4j_0_90_1_Beta-6dea8.jar /databricks/jars/spark--maven-trees--ml--xgboost--ml.dmlc--xgboost4j--ml.dmlcxgboost4j0.81.jar
sudo cp /dbfs/FileStore/jars/6587ff10_a44f_4590_8828_473d6d72e350-cudf_0_8-aaf1d.jar /databricks/jars/
sudo cp /dbfs/FileStore/jars/d541fac4_424f_459c_86d5_166d1047c664-xgboost4j_spark_0_90_1_Beta-bcdc8.jar /databricks/jars/spark--maven-trees--ml--xgboost--ml.dmlc--xgboost4j-spark--ml.dmlcxgboost4j-spark0.81.jar
```
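Since a mistyped jar name was the cause of the error in this thread, one option (my own sketch, not part of the original instructions) is to add a small guard at the top of init.sh so a wrong path fails fast with a clear message instead of silently leaving the cluster without the libraries:

```shell
#!/bin/bash
# Hedged sketch: fail loudly if a jar referenced below does not exist.
# The path in the example call is the one from this thread; substitute
# the file names dbfs assigned to your own uploads.
require_jar() {
  if [ ! -f "$1" ]; then
    echo "init.sh: missing jar: $1" >&2
    exit 1
  fi
}
# Example usage (uncomment and adjust before each sudo cp line):
# require_jar /dbfs/FileStore/jars/7194b940_46ba_41ef_8a41_e8e71130484b-xgboost4j_0_90_1_Beta-6dea8.jar
```

With this in place, a bad file name shows up in the cluster's init script logs immediately rather than as an `UnsatisfiedLinkError` at runtime.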
Used the dbfs CLI to upload it: `dbfs cp init.sh dbfs:/databricks/scripts/init.sh`
Start the cluster, import the notebook, and run.
Thank you so much! The problem was in my init.sh script. Now the code works!
Great, glad to hear, I will close the issue then.
Hi all,
I am trying the demo on Databricks: https://github.com/rapidsai/spark-examples/blob/master/docs/databricks.md I followed every step, and then an error popped up when Spark read the CSV and trained the model.
The error message is:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 10.139.64.7, executor 0): java.lang.UnsatisfiedLinkError: ai.rapids.cudf.Table.gdfReadCSV([Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;Ljava/lang/String;JJIBBB[Ljava/lang/String;[Ljava/lang/String;[Ljava/lang/String;)[J
```
I am also trying to find the APIs of GpuDataset, but I cannot find any. I cannot run the following code, as no `count` function exists. BTW, what is the relation between GpuDataset and cuDF?