Closed pankajmi closed 11 years ago
I can't work on this issue without exact reproduction steps and a full log. Do some projects take longer than others? Could it depend on the library that you are using? The last portion of the log indicates that multiple jars include org/apache/http/annotation/Immutable.class. Is there more merging going on?
It's intermittent; sometimes it gets stuck forever.
full stack trace
[info] Loading global plugins from /Users/pankajmittal/.sbt/plugins
[info] Loading project definition from /Users/pankajmittal/workspace/workflow/project
[warn] Multiple resolvers having different access mechanism configured with same name 'local'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
[info] Set current project to workflow (in build file:/Users/pankajmittal/workspace/workflow/)
> project taskmanager
[info] Set current project to taskmanager (in build file:/Users/pankajmittal/workspace/workflow/)
> assembly
[info] Packaging /Users/pankajmittal/workspace/workflow/taskmanager/target/scala-2.9.2/taskmanager_2.9.2-1.0-sources.jar ...
[info] Done packaging.
[info] Compiling 10 Scala sources to /Users/pankajmittal/workspace/workflow/taskmanager/target/scala-2.9.2/classes...
[info] Compiling 3 Scala sources to /Users/pankajmittal/workspace/workflow/taskmanager/target/scala-2.9.2/classes...
[warn] /Users/pankajmittal/workspace/workflow/taskmanager/src/main/scala/com/livestream/taskmanager/TaskManagerServiceConfig.scala:7: class ServerConfig in package config is deprecated: no direct replacement
[warn] class TaskManagerServiceConfig extends ServerConfig[TaskManagerService] {
[warn] ^
[warn] one warning found
[info] No tests to run for taskmanager/test:test
[info] Including antlr-2.7.2.jar
[info] Including c3p0-0.9.1.2.jar
[info] Including logback-classic-1.0.1.jar
[info] Including logback-core-1.0.1.jar
[info] Including jsr305-1.3.9.jar
[info] Including guava-13.0.jar
[info] Including h2-1.3.170.jar
[info] Including amqp-client-3.0.0.jar
[info] Including paranamer-2.4.1.jar
[info] Including finagle-core-6.0.3.jar
[info] Including finagle-http-6.0.3.jar
[info] Including finagle-ostrich4-6.0.3.jar
[info] Including ostrich-9.0.4.jar
[info] Including scala-json-3.0.1.jar
[info] Including util-codec-6.0.4.jar
[info] Including util-collection-6.0.4.jar
[info] Including util-core-6.0.4.jar
[info] Including util-eval-6.0.4.jar
[info] Including util-hashing-6.0.4.jar
[info] Including util-jvm-6.0.4.jar
[info] Including util-logging-6.0.4.jar
[info] Including akka-actor-2.0.5.jar
[info] Including config-0.5.0.jar
[info] Including commons-chain-1.1.jar
[info] Including commons-codec-1.6.jar
[info] Including commons-collections-3.2.1.jar
[info] Including commons-digester-1.8.jar
[info] Including commons-lang-2.6.jar
[info] Including commons-logging-1.1.1.jar
[info] Including commons-net-3.1.jar
[info] Including commons-pool-1.6.jar
[info] Including commons-validator-1.3.1.jar
[info] Including dom4j-1.1.jar
[info] Including netty-3.5.5.Final.jar
[info] Including mysql-connector-java-5.1.6.jar
[info] Including jna-3.3.0.jar
[info] Including lift-json_2.9.2-2.5-SNAPSHOT.jar
[info] Including opencsv-1.8.jar
[info] Including httpclient-4.1.3.jar
[info] Including httpcore-nio-4.2.jar
[info] Including httpcore-4.2.jar
[info] Including httpmime-4.1.3.jar
[info] Including struts-core-1.3.8.jar
[info] Including struts-taglib-1.3.8.jar
[info] Including struts-tiles-1.3.8.jar
[info] Including velocity-tools-2.0.jar
[info] Including velocity-1.7.jar
[info] Including quartz-2.1.3.jar
[info] Including scalap-2.9.2.jar
[info] Including slf4j-api-1.6.4.jar
[info] Including squartz_2.9.2-1.0-SNAPSHOT.jar
[info] Including syslog4j-0.9.30.jar
[info] Including oro-2.0.8.jar
[info] Including postgresql-9.1-901.jdbc4.jar
[info] Including jedis-2.1.0.jar
[info] Including sslext-1.2-0.jar
[info] Including scala-compiler.jar
[info] Including scala-library.jar
[info] Including java-gearman-service-0.6.6.jar
[info] Including gdata-client-1.0.jar
[info] Including gdata-core-1.0.jar
[info] Including gdata-spreadsheet-3.0.jar
[info] Including google-oauth-client-1.10.0-beta.jar
[info] Merging 'License.txt' with strategy 'rename'
[info] Merging 'NOTICE.txt' with strategy 'rename'
[info] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[info] Merging 'META-INF/NOTICE' with strategy 'rename'
[info] Merging 'META-INF/license' with strategy 'rename'
[info] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
[info] Merging 'LICENSE.txt' with strategy 'rename'
[info] Merging 'META-INF/LICENSE' with strategy 'rename'
[info] Merging 'org/apache/http/annotation/ThreadSafe.class' with strategy 'deduplicate'
[info] Merging 'org/apache/http/annotation/NotThreadSafe.class' with strategy 'deduplicate'
[info] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[info] Merging 'org/apache/http/annotation/GuardedBy.class' with strategy 'deduplicate'
[info] Merging 'META-INF/INDEX.LIST' with strategy 'discard'
[info] Merging 'reference.conf' with strategy 'concat'
[info] Merging 'META-INF/services/java.sql.Driver' with strategy 'filterDistinctLines'
[info] Merging 'org/apache/http/annotation/Immutable.class' with strategy 'deduplicate'
[info] SHA-1: WrappedArray(9, 3, 98, -116, -112, 53, 88, -58, 19, -63, 22, 57, 99, 107, 15, 63, -114, 113, 53, 87)
[info] Packaging /Users/pankajmittal/workspace/workflow/taskmanager/target/taskmanager-assembly-1.0.jar ...
[info] Done packaging.
[success] Total time: 809 s, completed 1 Feb, 2013 5:56:44 PM
sbt version: 0.12.2. System info: OS X 10.8.2 (12C60); processor: 2.5 GHz Intel Core i5; memory: 4 GB 1333 MHz DDR3.
Let me know if you want me to monitor anything on the system during the process. While I was monitoring, almost 600 MB of memory was free and the CPU was 97% idle.
Have you tried running jstack <pid> while the process was hung? If you could post its output, one could at least see in which call it is spending its time. (Maybe run it a few times to see if it returns similar results.)
The caching feature was requested in #59 and was added in the latest release, 0.8.5. It calculates the SHA-1 of every single *.class file prior to packaging. Try running clean and test first to see if the assembly speeds up.
I have the same issue with SBT Assembly 0.8.5, Scala 2.9.2 and SBT 0.12.2. Here is my jstack output: https://gist.github.com/4707228
As I said, it's intermittent; it doesn't always happen. Here are a few jstacks from the last time it happened.
Same here! And I don't know if it will finish soon!
jstack: https://gist.github.com/amrnt/82a716d70b9e1919bbcd
I'm on a MacBook Air with 4 GB, Scala 2.10.0.
It seems like most of your jstack outputs are in some phase of calculating the hash, which is consistent with my theory that this was introduced as part of #59. I could add a setting to enable/disable the caching behavior and turn it off by default.
After I updated my sbt from 0.12.0 to 0.12.2, it works fine!
Really? @pankajmi Could you please try sbt 0.12.2, and see if the situation changes for you too?
0.8.6 is out with caching turned off by default.
I was having the problem with sbt 0.12.2, as I mentioned above. Let me try version 0.8.6 and see. Is it going to have any side effects?
I tried sbt-assembly 0.8.6 with sbt 0.12.2; see the time taken below:
[info] Including antlr-2.7.2.jar
[info] Including c3p0-0.9.1.2.jar
[info] Including logback-classic-1.0.1.jar
[info] Including logback-core-1.0.1.jar
[info] Including jsr305-1.3.9.jar
[info] Including guava-13.0.jar
[info] Including h2-1.3.170.jar
[info] Including paranamer-2.4.1.jar
[info] Including finagle-core-6.0.3.jar
[info] Including finagle-http-6.0.3.jar
[info] Including finagle-ostrich4-6.0.3.jar
[info] Including ostrich-9.0.4.jar
[info] Including scala-json-3.0.1.jar
[info] Including util-codec-6.0.4.jar
[info] Including util-collection-6.0.4.jar
[info] Including util-core-6.0.4.jar
[info] Including util-eval-6.0.4.jar
[info] Including util-hashing-6.0.4.jar
[info] Including util-jvm-6.0.4.jar
[info] Including util-logging-6.0.4.jar
[info] Including akka-actor-2.0.5.jar
[info] Including config-0.5.0.jar
[info] Including commons-chain-1.1.jar
[info] Including commons-codec-1.6.jar
[info] Including commons-collections-3.2.1.jar
[info] Including commons-digester-1.8.jar
[info] Including commons-lang-2.6.jar
[info] Including commons-logging-1.1.1.jar
[info] Including commons-net-3.1.jar
[info] Including commons-pool-1.6.jar
[info] Including commons-validator-1.3.1.jar
[info] Including dom4j-1.1.jar
[info] Including netty-3.5.5.Final.jar
[info] Including mysql-connector-java-5.1.6.jar
[info] Including jna-3.3.0.jar
[info] Including lift-json_2.9.2-2.5-SNAPSHOT.jar
[info] Including opencsv-1.8.jar
[info] Including httpclient-4.1.3.jar
[info] Including httpcore-nio-4.2.jar
[info] Including httpcore-4.2.jar
[info] Including httpmime-4.1.3.jar
[info] Including struts-core-1.3.8.jar
[info] Including struts-taglib-1.3.8.jar
[info] Including struts-tiles-1.3.8.jar
[info] Including velocity-tools-2.0.jar
[info] Including velocity-1.7.jar
[info] Including quartz-2.1.3.jar
[info] Including scalap-2.9.2.jar
[info] Including slf4j-api-1.6.4.jar
[info] Including squartz_2.9.2-1.0-SNAPSHOT.jar
[info] Including syslog4j-0.9.30.jar
[info] Including oro-2.0.8.jar
[info] Including postgresql-9.1-901.jdbc4.jar
[info] Including jedis-2.1.0.jar
[info] Including sslext-1.2-0.jar
[info] Including scala-compiler.jar
[info] Including scala-library.jar
[info] Including java-gearman-service-0.6.6.jar
[info] Including gdata-client-1.0.jar
[info] Including gdata-core-1.0.jar
[info] Including gdata-spreadsheet-3.0.jar
[info] Including google-oauth-client-1.10.0-beta.jar
[info] Merging 'License.txt' with strategy 'rename'
[info] Merging 'NOTICE.txt' with strategy 'rename'
[info] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[info] Merging 'META-INF/NOTICE' with strategy 'rename'
[info] Merging 'META-INF/license' with strategy 'rename'
[info] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
[info] Merging 'LICENSE.txt' with strategy 'rename'
[info] Merging 'META-INF/LICENSE' with strategy 'rename'
[info] Merging 'org/apache/http/annotation/ThreadSafe.class' with strategy 'deduplicate'
[info] Merging 'org/apache/http/annotation/NotThreadSafe.class' with strategy 'deduplicate'
[info] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[info] Merging 'org/apache/http/annotation/GuardedBy.class' with strategy 'deduplicate'
[info] Merging 'META-INF/INDEX.LIST' with strategy 'discard'
[info] Merging 'reference.conf' with strategy 'concat'
[info] Merging 'META-INF/services/java.sql.Driver' with strategy 'filterDistinctLines'
[info] Merging 'org/apache/http/annotation/Immutable.class' with strategy 'deduplicate'
[info] Packaging /Users/pankajmittal/workspace/workflow/taskmanager/target/taskmanager-assembly-1.0.jar ...
[info] Done packaging.
[success] Total time: 342 s, completed 13 Feb, 2013 12:52:58 PM
It's still taking time, and the jstack output is here: https://gist.github.com/pankajmi/4942861
That's less than half the time of your original report (724 s), and the jstack shows that it's actually building the jar rather than hashing.
Creating an assembly takes time because we unzip all the jars and zip them back up again. If you want to avoid that, you could take a look at https://github.com/sbt/sbt-onejar.
@eed3si9n: Thanks for the reply. I actually tried sbt-onejar just to check the size of the jar it generates, and was surprised to see that it produces an even smaller jar, even though it simply puts everything into one jar without unzipping. Can you please tell me what other advantages sbt-assembly offers, apart from the assumption that it will produce one small jar after all the merging? I actually thought sbt-assembly would generate the smaller jar.
sbt-assembly is simpler. See sbt-onejar's page for the comparison.
I found that sbt-assembly takes much longer to build the jar than one-jar, but the resulting jar is much faster to execute. As an example, one of my one-jar applications takes nearly 30 seconds to launch, whereas the sbt-assembly version of the same application starts almost immediately.
I am assuming the difference is that class clashes are resolved at build time with sbt-assembly, whereas with one-jar the specialized class loader does its work at run time.
@aradke This is an interesting observation. Here's another speculation about why onejar would be slower: you can access entries of a zip file without extracting the whole archive. However, you cannot access entries of a zip file that lies inside another zip file without extracting the inner zip file. So, to find a class (at least the first one) with the one-jar approach, the JVM has to extract all of the dependency jars once to find out where the class lies, and then load it.
Still, 30 seconds sounds very slow.
Just released 0.9.0, incorporating pull request #83. It caches the jar unzipping results and cuts down the assembly time from the second run onwards: http://notes.implicit.ly/post/51259892611/sbt-assembly-0-9-0
Using sbt-assembly 0.9.0, sbt 0.12.3 and scala 2.10, I generate a single 76M jar file in 1612 seconds! Is there a way to cut assembly time further?
@gip Is that for the first run or the second? Caching should improve performance from the second run onwards.
Just released 0.10.1 with some more performance improvements (#96) that should make the run time more consistent: http://notes.implicit.ly/post/65751699253/sbt-assembly-0-10-1
I'm seeing this issue on Ubuntu (run within VirtualBox on OS X). Assembly typically takes 10+ minutes, or just crashes the whole machine. When I run it from within OS X, however, the time is under 30 seconds.
It always hangs while, or just after, including dependencies. Each include seems to take slightly longer than the one before it. sbt compile works fine.
Running SBT 0.13.5, Scala 2.10.3 and SBT Assembly 0.11.2.
Not a killer, but odd!
@joeroot I have the same issue of sbt assembly taking much longer in a VirtualBox guest (even with access to all cores) than on the host. Did you ever find a way around this?
That's probably because disk i/o is more expensive inside a VM (those files need to be read in order to determine their SHA1 hashes)
@fblundun: I never found a way around this. Instead I run sbt assembly from within the Mac, which is slightly frustrating.
@iangelov: It could be i/o, but that seems quite extreme. My projects don't have that many files, and I tend to end up crashing, rather than completing very slowly.
One idea that was suggested by @jsuereth was that sbt-assembly try reading into the JAR directly without unzipping first. ymmv/pr welcome on this front.
If you're using vboxsf (the Vagrant default), there's no way around this. It's slow. You should use nfs instead.
@eed3si9n I have the exact same problem; my project is barebones, spark, akka, scala 2.11.8 - was there an option to disable calculation of SHA1 that was added? Assembly is taking 774+ seconds
import Dependencies._

resolvers ++= Seq(
  Resolver.sonatypeRepo("releases"),
  Resolver.sonatypeRepo("snapshots")
)

lazy val root = (project in file(".")).
  settings(
    inThisBuild(List(
      organization := "me.free",
      scalaVersion := "2.11.8",
      version := "0.1.0-SNAPSHOT"
    )),
    name := "BondAgent",
    libraryDependencies ++= Seq(akkDeps ++ Seq(scalaCheck)
      ++ Seq(shapelessDep)
      ++ sparkDeps
      ++ Seq(redisScala)
      ++ argonautDeps ++ sparkConflictResolverDeps).flatten
  )

assemblyMergeStrategy in assembly := {
  case PathList("org", "aopalliance", xs@_*) => MergeStrategy.first
  case PathList("javax", "inject", xs@_*) => MergeStrategy.first
  case PathList("javax", "servlet", xs@_*) => MergeStrategy.first
  case PathList("javax", "activation", xs@_*) => MergeStrategy.first
  case PathList("org", "xml-apis", x@_*) => MergeStrategy.first
  case PathList("javax", "stax-api", x@_*) => MergeStrategy.first
  case PathList("org", "commons-collections", x@_*) => MergeStrategy.first
  case PathList("org", "jcl-over-slf4j", x@_*) => MergeStrategy.first
  case PathList("org", "apache", xs@_*) => MergeStrategy.first
  case PathList("com", "google", xs@_*) => MergeStrategy.first
  case PathList("com", "esotericsoftware", xs@_*) => MergeStrategy.first
  case PathList("com", "codahale", xs@_*) => MergeStrategy.first
  case PathList("com", "yammer", xs@_*) => MergeStrategy.first
  case "about.html" => MergeStrategy.rename
  case PathList("META-INF", xs@_*) => MergeStrategy.discard
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.first
  case "META-INF/mailcap" => MergeStrategy.first
  case "META-INF/mimetypes.default" => MergeStrategy.first
  case "plugin.properties" => MergeStrategy.first
  case "log4j.properties" => MergeStrategy.first
  case "overview.html" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

assemblyOption in assembly := (assemblyOption in assembly).value//.copy(includeScala = false)

assemblyJarName in assembly := "BondAgent.jar"
This is my assembly sbt -
import Versions._
import sbt._

object Dependencies {
  lazy val scalaCheck: ModuleID = "org.scalacheck" %% "scalacheck" % "1.12.2" % "test"
  lazy val akkaStream: ModuleID = "com.typesafe.akka" %% "akka-stream" % akkaStreamV
  lazy val akkaActor: ModuleID = "com.typesafe.akka" %% "akka-actor" % akkaV
  lazy val akkaSlf4j: ModuleID = "com.typesafe.akka" %% "akka-slf4j" % akkaV
  lazy val akkDeps = Seq(akkaStream, akkaActor, akkaSlf4j)
  lazy val sparkCore: ModuleID = "org.apache.spark" %% "spark-core" % sparkCoreV
  lazy val sparkSql: ModuleID = "org.apache.spark" %% "spark-sql" % sparkCoreV
  lazy val sparkDeps = Seq(sparkCore.exclude("org.eclipse.jetty.orbit", "javax.servlet").
    exclude("org.eclipse.jetty.orbit", "javax.transaction").
    exclude("org.eclipse.jetty.orbit", "javax.mail").
    exclude("org.eclipse.jetty.orbit", "javax.activation").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-collections", "commons-collections").
    exclude("xml-apis", "xml-apis").
    exclude("javax.xml.stream", "stax-api")
    exclude("com.esotericsoftware.minlog", "minlog"), sparkSql.exclude("org.eclipse.jetty.orbit", "javax.servlet").
    exclude("org.eclipse.jetty.orbit", "javax.transaction").
    exclude("org.eclipse.jetty.orbit", "javax.mail").
    exclude("org.eclipse.jetty.orbit", "javax.activation").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-collections", "commons-collections").
    exclude("xml-apis", "xml-apis").
    exclude("javax.xml.stream", "stax-api")
    exclude("com.esotericsoftware.minlog", "minlog"))
  lazy val funcDeps: ModuleID = "org.typelevel" %% "cats-core" % catsCoreV
  lazy val shapelessDep: ModuleID = "com.chuusai" %% "shapeless" % shapelessV
  lazy val hadoopClientDep: ModuleID = "org.apache.hadoop" % "hadoop-client" % hadoopCV
  lazy val parquetClient: ModuleID = "org.apache.parquet" % "parquet" % parquetV
  lazy val redisScala: ModuleID = "com.github.etaty" %% "rediscala" % redisScalaV
  lazy val argonautDeps: Seq[ModuleID] = Seq("io.argonaut" %% "argonaut").map(_ % argonautV)
  lazy val sparkConflictResolverDeps: Seq[ModuleID] = Seq("com.fasterxml.jackson.core" % "jackson-core" % "2.8.7",
    // test deps
    "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7",
    "com.fasterxml.jackson.core" % "jackson-annotations" % "2.8.7",
    "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.8.7",
    // With spark, to resolve java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal (error)
    //"xml-apis" % "xml-apis" % "1.4.01",
    // With spark, to resolve large stack trace on java.lang.ClassNotFoundException: de.unkrig.jdisasm.Disassembler (warning)
    "org.codehaus.janino" % "janino" % "3.0.7")
}
Not sure how to make it fast but it is affecting my development timelines - any help, suggestions, rants are welcome...
@AnirudhVyas are you building in a VM?
No, I build on a MacBook (OS X).
@AnirudhVyas Do you have an SSD in your MacBook? See https://github.com/sbt/sbt-assembly#other-things, and try turning off any of the optional features to see if it helps your situation.
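For readers following along, the optional features referred to here are most likely the two caching options described in the sbt-assembly README. The fragment below is a minimal sketch, assuming sbt-assembly 0.14.x and the same `assemblyOption in assembly` key syntax used in the build file posted above; it is not taken from the thread itself.

```scala
// build.sbt (sketch, assuming sbt-assembly 0.14.x)
// cacheOutput: reuse the previous fat jar when inputs are unchanged; deciding
//              that requires hashing the *.class files and dependency jars.
// cacheUnzip:  keep the unzipped dependency jars on disk between runs.
// Turning both off trades repeat-run speed for less hashing and bookkeeping.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(
  cacheOutput = false,
  cacheUnzip = false
)
```

Whether this helps depends on the project: with many class files the hashing can dominate, while with few it may make little difference.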
I have an ssd, let me see thanks. Will report back...
Not much difference. Well, I gained about 100 seconds, if that's a consolation... Do you know where the bottleneck is? It seems to be taking a while in packaging. Honestly, I have been using this awesome plugin for a while, but I never used it with Spark; I used to tar things up, and thought I'd give it a try while putting it together with sbt on a fresh project. Something is missing...
OK, for now I made my app a spark-submit job (I was using a single package for a reason). With provided Spark dependencies there are fewer things to deal with, and it completes in 180 s, which is very doable, I think. But it would be great to know where the bottleneck was/is so that perhaps it could be improved. I am not implying that you haven't tried already; my apologies if I come across as blunt...
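The "provided" approach mentioned above can be sketched as follows. This is an illustration only: `sparkV` and its value are placeholders, not the version used in the thread. Dependencies in the `provided` configuration are left out of the fat jar by sbt-assembly's default classpath, so far fewer files need to be unzipped and repacked; the cluster supplies Spark at run time via spark-submit.

```scala
// build.sbt (sketch; sparkV is a hypothetical placeholder version)
val sparkV = "2.1.0"

libraryDependencies ++= Seq(
  // "provided": on the compile classpath, but not bundled into the assembly
  "org.apache.spark" %% "spark-core" % sparkV % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkV % "provided"
)
```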
I personally haven't run into performance issues so I haven't really looked at current bottlenecks. If there's some specific issue around, feel free to create a sample project that someone could look into.
I've played a bit with sbt-assembly 0.14.7-SNAPSHOT.
Changing this line:
(for(jar <- libsFiltered.par) yield {
to this line:
(for(jar <- libsFiltered) yield {
made packaging almost 3x faster (from 1544 seconds down to 571 seconds).
My fat jar is about 130 MiB. My computer has an SSD. Could anyone test this change on their own project?
Required steps:
1) git clone git@github.com:EnverOsmanov/sbt-assembly.git
2) cd ./sbt-assembly
3) sbt publishLocal
4) In your project, set the sbt-assembly plugin's version to 0.14.7-SNAPSHOT
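Step 4 would typically look like the line below. This is a sketch that assumes the fork keeps the upstream organization `com.eed3si9n` when published locally; check the fork's build definition if resolution fails.

```scala
// project/plugins.sbt (sketch; assumes the fork publishes under the upstream organization)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.7-SNAPSHOT")
```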
@eed3si9n I'm sorry, I can't share my work project. I've created a sample project using a blog post from the internet. Assembly takes:
libsFiltered.par => 588-634 seconds
libsFiltered => 320-402 seconds
Fat jar size is 96 MiB.
UPD: on an HDD it works much faster, in less than 60 seconds.
I guess creating >60k files while unpacking puts major stress on the file system. Doing that non-sequentially by adding par probably makes things rather worse...
FYI, I was having this problem, and the resolution was turning off anti-virus software (macOS).
In case anyone else has to feel this pain.
Hello from 2019. I had this exact problem (yes, old project, old version of sbt, but still) and doing sbt clean followed by assembly solved it.
YMMV
I'm on sbt 0.12.01 and macOS. Every once in a while, the assembly goal simply hangs on the packaging jar step. Will upgrading my sbt help?
Any updates on this thread? It's hanging for a long time on Ubuntu 18.04 LTS. @karthicks @noahlz @devmanhinton
Wow, I found my way back to this issue via Google. I have no insight except trying what past me did (clean assembly).
I am not sure if I am the only one facing this problem, but packaging takes a hell of a lot of time. Maybe I am missing some settings or something. The last three steps are the slowest; the build gets stuck here for a long, long time.
Please let me know if anyone needs more info.