zio / zio-quill

Compile-time Language Integrated Queries for Scala
https://zio.dev/zio-quill
Apache License 2.0

OrientDB and Cassandra use a lot of memory during builds, causing timeouts #1360

Closed deusaquilus closed 5 years ago

deusaquilus commented 5 years ago


Version: 3.0.2-SNAPSHOT
Module: All
Database: All

After instrumenting some memory metrics, it looks like OrientDB is taking roughly 3.5 GB in the build and Cassandra is taking almost 2 GB:

===== System Memory Stats Start =====
         0.00Mb %MEM  SIZE %MEM CMD
      1720.14Mb 7.6 1761420  7.6 /opt/mssql/bin/sqlservr
      1845.16Mb 6.4 1889444  6.4 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xms256m -Xmx256m -Xmn64m -XX:+UseCondCardMark -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -Dcassandra.jmx.local.port=7199 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 -XX:OnOutOfMemoryError=kill -9 %p -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir=/var/lib/cassandra -Dcassandra-foreground=yes -cp 
/etc/cassandra:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/caffeine-2.2.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.5.4.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.13.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jctools-core-1.2.1.jar:/usr/share/cassandra/lib/jflex-1.6.0.jar:/usr/share/cassandra/lib/jna-4.2.2.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jstackjunit-0.0.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/netty-all-4.0.44.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.4.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.4.jar:/usr/share/cassandra/lib/report
er-config-base-3.0.3.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-3.11.4.jar:/usr/share/cassandra/apache-cassandra-thrift-3.11.4.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar: org.apache.cassandra.service.CassandraDaemon
         4.97Mb 5.6  5088  5.6 xe_mman_XE
      3545.14Mb 3.2 3630224  3.2 /usr/lib/jvm/java-1.8-openjdk/bin/java -server -Xms2G -Xmx2G -Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError -Djava.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9 -Djava.util.logging.manager=com.orientechnologies.common.log.ShutdownLogManager -Djava.util.logging.config.file=/orientdb/config/orientdb-server-log.properties -Dorientdb.config.file=/orientdb/config/orientdb-server-config.xml -Dorientdb.www.path=/orientdb/www -Dorientdb.build.number=3.0.x@r4a3b7acf5bdffc997f786197a6f896f8d3f16604; 2018-11-21 12:23:22+0000 -cp /orientdb/lib/orientdb-server-3.0.11.jar:/orientdb/lib/*:/orientdb/plugins/* com.orientechnologies.orient.server.OServerMain
        67.72Mb 3.1 69344  3.1 xe_j003_XE
        32.88Mb 2.9 33668  2.9 xe_mmon_XE
        27.72Mb 2.7 28388  2.7 xe_j005_XE
        11.72Mb 2.6 12004  2.6 xe_j000_XE
        27.73Mb 2.6 28400  2.6 xe_j00a_XE
===== System Memory Stats End =====
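For reading the report: the first column appears to be the SIZE column divided by 1024, which suggests SIZE is in kilobytes. Here is a minimal Python sketch of that conversion, checked against three rows of the report. The helper name `size_kb_to_mb` is made up for illustration, and this is not the script the build actually uses.

```python
def size_kb_to_mb(size_kb: int) -> str:
    """Format a size given in KB like the first column of the report."""
    return f"{size_kb / 1024:.2f}Mb"


# SIZE values copied from the report; the short labels are mine.
samples = {
    "sqlservr": 1761420,
    "cassandra": 1889444,
    "orientdb": 3630224,
}

for name, size_kb in samples.items():
    # Right-align both columns so the output lines up like the report.
    print(f"{name:>10} {size_kb_to_mb(size_kb):>12}")
```

If that reading is right, the `0.00Mb` in the header row is probably just the column title being pushed through the same numeric conversion.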

For an embedded container, each of them probably needs only a small fraction of that. This is causing the build to stall on GC pauses:

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

The default memory settings of OrientDB need to be decreased.
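One thing worth noting from the report: both JVMs show sizes well above their heap caps (Cassandra runs with `-Xmx256m`, OrientDB with `-Xmx2G`), so lowering `-Xmx` alone may not be enough. Below is a rough Python sketch that pulls the heap cap out of a command line and compares it with the reported size. `max_heap_mb` is a hypothetical helper that only understands `k`/`m`/`g` suffixes, and the SIZE figure from `ps` is not necessarily resident memory, so treat the difference as a hint rather than a measurement.

```python
import re

# Multipliers to convert a JVM size suffix into MB.
_UNITS_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(cmdline: str):
    """Return the -Xmx value in MB, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])\b", cmdline)
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) * _UNITS_MB[unit.lower()]


# Heap flags and reported sizes (MB) copied from the memory report.
processes = {
    "cassandra": ("-Xms256m -Xmx256m -Xmn64m", 1845.16),
    "orientdb": ("-server -Xms2G -Xmx2G", 3545.14),
}

for name, (flags, size_mb) in processes.items():
    heap = max_heap_mb(flags)
    print(
        f"{name}: heap cap {heap:.0f} MB, reported {size_mb:.2f} MB, "
        f"above heap cap by {size_mb - heap:.2f} MB"
    )
```

Both processes come out roughly 1.5 GB above their heap cap, which points at memory outside the Java heap.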

@getquill/maintainers

deusaquilus commented 5 years ago

After a bunch of investigation

deusaquilus commented 5 years ago

Although shrinking the compressed class space with -XX:CompressedClassSpaceSize does stop the build from timing out due to excessive GC, the build is still too slow and fails when it hits the 50-minute Travis limit. It looks like the only way forward is to restructure the build, separating Cassandra, Orient (and maybe Spark) into their own build stages.

deusaquilus commented 5 years ago

Fixed via #1356