twitter / hadoop-lzo

Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
GNU General Public License v3.0

lzo with gradle #130

Open Arnold1 opened 7 years ago

Arnold1 commented 7 years ago

Hi,

I use Apache Spark and Scala, and I build my app with Gradle. For testing I need the LZO codec installed.

I still run into the following error. Any idea? The error is not directly related to this repo, but I would still like to know how to use the LZO codec within my test.

i followed these steps: https://gist.github.com/zedar/c43cbc7ff7f98abee885

Here is how I edited my build.gradle:

repositories {
  mavenLocal()
  mavenCentral()
  maven { url "https://repository.cloudera.com/artifactory/cloudera-repos/" }

  maven {
    url "http://maven.twttr.com/"
  }
}

dependencies {
  compile "org.scala-lang:scala-library:$versions.scala_full"
  compile "org.scala-lang:scala-compiler:$versions.scala_full"
  compile "ch.qos.logback:logback-classic:$versions.logback"
  compile "ch.qos.logback:logback-core:$versions.logback"
  compile "com.typesafe.scala-logging:scala-logging_$versions.scala:$versions.scala_logging"
  compile "com.github.scopt:scopt_$versions.scala:$versions.scopt"
  compile "org.apache.spark:spark-core_$versions.scala:$versions.spark"
  compile "org.apache.spark:spark-sql_$versions.scala:$versions.spark"
  compile "org.apache.spark:spark-streaming_$versions.scala:$versions.spark"
  compile "org.apache.spark:spark-hive_$versions.scala:$versions.spark"
  compile "org.slf4j:log4j-over-slf4j:$versions.log4j_over_slf4j"
  compile "com.typesafe:config:$versions.typesafe_config"
  testCompile "com.holdenkarau:spark-testing-base_$versions.scala:${versions.spark}_$versions.spark_testing_base"
  testCompile "org.mockito:mockito-core:$versions.mockito"
  scoverage "org.scoverage:scalac-scoverage-plugin_$versions.scala:$versions.scoverage", "org.scoverage:scalac-scoverage-runtime_$versions.scala:$versions.scoverage"
  testRuntime "org.pegdown:pegdown:$versions.pegdown"
  testCompile "org.scalatest:scalatest_$versions.scala:$versions.scalatest"
  testCompile group: 'com.hadoop.gplcompression', name: 'hadoop-lzo', version: '0.4.17'
  testCompile group: 'org.apache.zookeeper', name: 'zookeeper', version: '3.4.10'
}
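For reference, pulling in the hadoop-lzo jar alone is usually not enough: the codec classes also have to be registered with the Hadoop configuration that Spark uses, and the native gplcompression/lzo2 libraries must be resolvable via java.library.path. A minimal sketch of how that registration could look in a local test (the codec class names come from this repo; the SparkSession setup and file path are assumptions about the test code):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: register the LZO/LZOP codecs with the Hadoop configuration
// used by spark.read. Note that .lzo files written by the lzop tool need
// LzopCodec, not LzoCodec. Assumes the native gplcompression/lzo2
// libraries are on java.library.path.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("lzo-test")
  .config("spark.hadoop.io.compression.codecs",
    "com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec")
  .getOrCreate()

// Hypothetical input path, just for illustration.
val df = spark.read.csv("data/input.txt.lzo")
```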

It seems spark.read.csv still cannot read my .txt.lzo file. Here is the error:

- Should load from raw data *** FAILED *** (739 milliseconds)
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0 (TID 3, localhost, executor driver): java.lang.NumberFormatException: For input string: ":�:h}P~j09��}10827#��)80.02,�w0:45:5"�n+<�79<790127827391"
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
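The garbled input string in the exception looks like raw compressed bytes being parsed as text, which suggests spark.read is not decompressing the file at all, i.e. no codec is resolving for the .lzo extension. One way to check this directly is Hadoop's CompressionCodecFactory, which matches codecs by file extension (a sketch; assumes hadoop-lzo and its native libraries are on the classpath, and the file name is hypothetical):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.CompressionCodecFactory

val conf = new Configuration()
conf.set("io.compression.codecs",
  "com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec")

// getCodec matches on the file extension; a null result means no codec
// is registered for .lzo, so the file would be read as plain text.
val codec = new CompressionCodecFactory(conf).getCodec(new Path("myfile.txt.lzo"))
println(if (codec == null) "no codec for .lzo" else codec.getClass.getName)
```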