locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0
246 stars 45 forks source link

Attempting to shade scala-logging dependency #293

Closed echeipesh closed 5 years ago

echeipesh commented 5 years ago

This issue was encountered while we were attempting to shade the com.typesafe.scala-logging classes in an effort to get RasterFrames running in DataBricks environment. DataBricks itself brings ins com.typesafe.scala-logging version 2.1.2 on the class path which is not binary compatible with version 3.9.0 used by RasterFrames and will result in InstantiationException

Reproduce

The failure to shade can be reproduced locally and shows ups as a failed import.

The attempt is done against current develop by trying to hook into the mechanism in RFAssemblyPlugin

diff --git a/project/RFAssemblyPlugin.scala b/project/RFAssemblyPlugin.scala
index 8f69776f..223257a2 100644
--- a/project/RFAssemblyPlugin.scala
+++ b/project/RFAssemblyPlugin.scala
@@ -53,7 +53,8 @@ object RFAssemblyPlugin extends AutoPlugin {
         "com.amazonaws",
         "org.apache.avro",
         "org.apache.http",
-        "com.google.guava"
+        "com.google.guava",
+        "com.typesafe.scalalogging"
       )
       shadePrefixes.map(p ⇒ ShadeRule.rename(s"$p.**" -> s"rf.shaded.$p.@1").inAll)
     },

Then creating assembly using:

sbt  'project pyrasterframes' assembly

Inspecting the assembly jar we see the shading took effect:

~/p/r/p/t/scala-2.11 ❯❯❯ unzip -l pyrasterframes-assembly-0.8.0-SNAPSHOT.jar | grep scalalogging                             develop ✚ ✱
        0  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/
      590  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/CanLog$class.class
      888  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/CanLog.class
     1002  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LazyLogging$class.class
      589  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LazyLogging.class
     3875  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/Logger$.class
    15727  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/Logger.class
     5104  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anon$1.class
     2766  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$1.class
     1366  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$2.class
     1604  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$3.class
     2196  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$4.class
     2203  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$debugMessageArgs$1.class
     2251  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$debugMessageArgsMarker$1.class
     2202  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$errorMessageArgs$1.class
     2250  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$errorMessageArgsMarker$1.class
     6090  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$formatArgs$1.class
     2199  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$infoMessageArgs$1.class
     2248  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$infoMessageArgsMarker$1.class
     2203  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$traceMessageArgs$1.class
     2252  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$traceMessageArgsMarker$1.class
     2199  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$warnMessageArgs$1.class
     2247  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$warnMessageArgsMarker$1.class
    31565  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$.class
     9902  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro.class
    13906  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicit.class
     2292  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$debugMessageArgs$1.class
     2340  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$debugMessageArgsMarker$1.class
     2291  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$errorMessageArgs$1.class
     2339  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$errorMessageArgsMarker$1.class
     2289  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$infoMessageArgs$1.class
     2337  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$infoMessageArgsMarker$1.class
     2292  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$traceMessageArgs$1.class
     2340  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$traceMessageArgsMarker$1.class
     2288  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$warnMessageArgs$1.class
     2337  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$warnMessageArgsMarker$1.class
    46510  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$.class
    11097  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro.class
     1030  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/StrictLogging$class.class
      689  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/StrictLogging.class
      449  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/package$.class
      667  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/package.class

Trying to use the assembly goes south:

~/p/r/p/t/scala-2.11 ❯❯❯ spark-shell --jars /Users/eugene/proj/rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.0-SNAPSHOT.jar
19/08/17 13:12:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://10.1.240.31:4040
Spark context available as 'sc' (master = local[*], app id = local-1566061963577).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.locationtech.rasterframes._
<console>:23: error: Symbol 'type <none>.scalalogging.LazyLogging' is missing from the classpath.
This symbol is required by 'package org.locationtech.rasterframes.package'.
Make sure that type LazyLogging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'package.class' was compiled against an incompatible version of <none>.scalalogging.
       import org.locationtech.rasterframes._
              ^

Trying to see what is going on in package org.locationtech.rasterframes.package everything seems hunky-dory:

~/p/r/p/t/s/j/o/l/rasterframes ❯❯❯ javap package\$                                                                                                                      develop ✚ ✱
Warning: Binary file package$ contains org.locationtech.rasterframes.package$
Compiled from "rasterframes.scala"
public final class org.locationtech.rasterframes.package$ implements org.locationtech.rasterframes.StandardColumns,org.locationtech.rasterframes.RasterFunctions,org.locationtech.rasterframes.util.ZeroSevenCompatibilityKit$RasterFunctions,org.locationtech.rasterframes.extensions.Implicits,org.locationtech.rasterframes.jts.Implicits,org.locationtech.rasterframes.encoders.StandardEncoders,org.locationtech.geomesa.spark.jts.DataFrameFunctions$Library,rf.shaded.com.typesafe.scalalogging.LazyLogging {
  public static final org.locationtech.rasterframes.package$ MODULE$;
  public static {};
  public rf.shaded.com.typesafe.scalalogging.Logger logger();
  public org.apache.spark.sql.TypedColumn<java.lang.Object, org.locationtech.jts.geom.Geometry> st_translate(org.apache.spark.sql.Column, org.apache.spark.sql.Column, org.apache.spark.sql.Column);

Note that it implements the shaded version rf.shaded.com.typesafe.scalalogging.LazyLogging

metasim commented 5 years ago

@echeipesh Note: I've not tried to run against Spark 2.4.x. So maybe there's something wrong there.

I wonder if some of this has to do with Scala logging using macros? Maybe an argument to get rid of it, and just use Java APIs?

If you're willing to try something, find any places where a package.scala class inherits from LazyLogger, and remove it, replacing call sites with something in the companion class or whatever... just to see if hiding it a bit more help.

metasim commented 5 years ago

@echeipesh Also, what version of Java is that running under?

pomadchin commented 5 years ago

I would say that figuring out how to shade this dependency would be enough. I don't have arguments to get rid of scala logging: It is more performant: inlines calls and makes calls to the underlying logger only in case the warning level allows it (sounds a bit naive but even these two things will reduce logging overhead).

echeipesh commented 5 years ago

@metasim This is all on Java 1.8 Based on your hunch removing the inheritance of LazyLogger from rasterframes package did the trick:

diff --git a/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala b/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala
index 1517e8f0..2b908c7f 100644
--- a/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala
+++ b/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala
@@ -21,7 +21,7 @@

 package org.locationtech
 import com.typesafe.config.ConfigFactory
-import com.typesafe.scalalogging.LazyLogging
+import com.typesafe.scalalogging.Logger
 import geotrellis.raster.isData
 import geotrellis.raster.{Tile, TileFeature}
 import geotrellis.spark.{ContextRDD, Metadata, SpaceTimeKey, SpatialKey, TileLayerMetadata}
@@ -33,6 +33,7 @@ import org.locationtech.rasterframes.encoders.StandardEncoders
 import org.locationtech.rasterframes.extensions.Implicits
 import org.locationtech.rasterframes.model.TileDimensions
 import org.locationtech.rasterframes.util.ZeroSevenCompatibilityKit
+import org.slf4j.LoggerFactory
 import shapeless.tag.@@

 import scala.reflect.runtime.universe._
@@ -43,8 +44,10 @@ package object rasterframes extends StandardColumns
   with Implicits
   with rasterframes.jts.Implicits
   with StandardEncoders
-  with DataFrameFunctions.Library
-  with LazyLogging {
+  with DataFrameFunctions.Library {
+
+  @transient
+  protected lazy val logger: Logger = Logger(LoggerFactory.getLogger(getClass.getName))

   @transient
   private[rasterframes]

I have not tested it on DataBricks yet, but this produces an assembly where the imports of RasterFrame packages happens fine. I'll keep you posted.

metasim commented 5 years ago

Reproduced. I'll go ahead and fix this and any similar cases.

metasim commented 5 years ago

A bit further, but serialization error (despite transient??):

$ spark-shell --jars  /Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.1-SNAPSHOT.jar
...

Spark context available as 'sc' (master = local[*], app id = local-1566321608265).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.locationtech.rasterframes._
import org.locationtech.rasterframes._
scala> import org.locationtech.rasterframes.datasource.raster._
import org.locationtech.rasterframes.datasource.raster._
scala> val rf = spark.read.raster.load("https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif")
java.lang.AssertionError: assertion failed: unsafe symbol Logger (child of package scalalogging) in runtime reflection universe
    at scala.reflect.internal.Symbols$Symbol.<init>(Symbols.scala:184)
    at scala.reflect.internal.Symbols$TypeSymbol.<init>(Symbols.scala:3009)
    at scala.reflect.internal.Symbols$ClassSymbol.<init>(Symbols.scala:3201)
    at scala.reflect.internal.Symbols$StubClassSymbol.<init>(Symbols.scala:3496)
    at scala.reflect.internal.Symbols$Symbol.newStubSymbol(Symbols.scala:498)
    at scala.reflect.internal.pickling.UnPickler$Scan.readExtSymbol$1(UnPickler

...
metasim commented 5 years ago

Calling spark.withRasterFrames triggers it too.

metasim commented 5 years ago

Caught the exception in the debugger after commenting out RF code triggering the AssertionError, and maybe (?) anything inheriting from LazyLogging has a problem? In the backtrace below, error occurred when Scala Reflection was attempting to resolve the type structure for Tile.

Screen Shot 2019-08-20 at 2 00 09 PM

Found this message on the stack in a frame further up, which makes me wonder if shading did't transform all references in all libraries:

missing or invalid dependency detected while loading class file 'geotrellis.raster.Tile'.
Could not access type LazyLogging in package com.typesafe.scalalogging,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'geotrellis.raster.Tile' was compiled against an incompatible version of com.typesafe.scalalogging.
metasim commented 5 years ago

Ran javap directly:

$ javap -cp  /Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.1-SNAPSHOT.jar geotrellis.raster.Tile | grep scalalogging
public interface geotrellis.raster.Tile extends geotrellis.raster.CellGrid, geotrellis.raster.IterableTile, geotrellis.raster.MappableTile<geotrellis.raster.Tile>, rf.shaded.com.typesafe.scalalogging.LazyLogging {

and in the spark-shell:

$ echo ':javap geotrellis.raster.Tile' | spark-shell --jars  /Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.1-SNAPSHOT.jar | grep scalalogging
public interface geotrellis.raster.Tile extends geotrellis.raster.CellGrid, geotrellis.raster.IterableTile, geotrellis.raster.MappableTile<geotrellis.raster.Tile>, rf.shaded.com.typesafe.scalalogging.LazyLogging
    #3 = Utf8               Ljava/lang/Object;Lgeotrellis/raster/CellGrid;Lgeotrellis/raster/IterableTile;Lgeotrellis/raster/MappableTile<Lgeotrellis/raster/Tile;>;Lrf/shaded/com/typesafe/scalalogging/LazyLogging;
   #12 = Utf8               rf/shaded/com/typesafe/scalalogging/LazyLogging
   #13 = Class              #12           // rf/shaded/com/typesafe/scalalogging/LazyLogging
Signature: #3                           // Ljava/lang/Object;Lgeotrellis/raster/CellGrid;Lgeotrellis/raster/IterableTile;Lgeotrellis/raster/MappableTile<Lgeotrellis/raster/Tile;>;Lrf/shaded/com/typesafe/scalalogging/LazyLogging;

and both show all references as having been shaded. :-/

metasim commented 5 years ago

I think the problem might have something to do with Spark making heavy use of Scala TypeTags, which are compile-time generated constructs, likely reflecting the type paths as they were at compile time.

metasim commented 5 years ago

@echeipesh I'm worried that this problem isn't solvable without removing LazyLogging from Tile. My current theory is that you can't shade any type that is processed by the Spark SQL Encoder subsystem due to its dependency on compile-time TypeTags. Since Tile is core to RasterFrames, it's hard to see a workaround on this side of the fence, except perhaps some variant of this:

https://stackoverflow.com/a/40404371/296509

I'm not a fan because it means RasterFrames is essentially requiring an isolated version of  GeoTrellis that has scalalogging shaded for it before its used by RasterFrames.

metasim commented 5 years ago

Branch with moving LazyLogging around in RasterFrames.

https://github.com/s22s/rasterframes/commit/c07ccea238b900473eb471b7b97b99549e6ceca8