magda-io / magda

A federated, open-source data catalog for all your big data and small data
https://magda.io
Apache License 2.0

Indexer pod will exit when re-indexing data #1291

Open jevy-wangfei opened 6 years ago

jevy-wangfei commented 6 years ago

Problem description

The indexer consumes a very large amount of memory when re-indexing data, and the indexer pod will exit when it reaches the pod's memory limit. Even when the pod is set with no limit on memory and CPU usage, it still exits with an OutOfMemoryError.

Problem reproduction steps

Log of the pod exiting when it reaches the memory limit


I  [INFO] [06/25/2018 03:15:14.169] [indexer-akka.actor.default-dispatcher-32] [akka.actor.ActorSystemImpl(indexer)] Successfully indexed 1 datasets

I  [INFO] [06/25/2018 03:15:14.183] [indexer-akka.actor.default-dispatcher-32] [akka.actor.ActorSystemImpl(indexer)] Successfully indexed 9 datasets

I  [INFO] [06/25/2018 03:15:27.080] [main] [IndexerApp$(akka://indexer)] Starting Indexer

I  [INFO] [06/25/2018 03:15:27.082] [main] [IndexerApp$(akka://indexer)] Log level is INFO

I  [INFO] [06/25/2018 03:15:27.675] [indexer-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(indexer)] No password specified, starting without XPack

I  [INFO] [06/25/2018 03:15:35.974] [main] [IndexerApp$(akka://indexer)] Listening on 0.0.0.0:80

Log of the pod exiting with an OutOfMemoryError


I  [INFO] [06/25/2018 08:52:21.601] [indexer-akka.actor.default-dispatcher-13] [akka.actor.ActorSystemImpl(indexer)] Successfully indexed 11 datasets

E  Uncaught error from thread [indexer-akka.actor.default-dispatcher-13] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[indexer]

E  java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.stream.impl.fusing.GraphInterpreterShell$$anonfun$interpreter$1.apply(ActorGraphInterpreter.scala:327)
    at akka.stream.impl.fusing.GraphInterpreterShell$$anonfun$interpreter$1.apply(ActorGraphInterpreter.scala:326)
    at akka.stream.stage.GraphStageLogic$$anon$2.invoke(GraphStage.scala:912)
    at akka.stream.stage.GraphStageLogic$SubSinkInlet$$anonfun$5.apply(GraphStage.scala:1018)
    at akka.stream.stage.GraphStageLogic$SubSinkInlet$$anonfun$5.apply(GraphStage.scala:1018)
    at akka.stream.impl.fusing.SubSink$$anon$3.onPush(StreamOfStreams.scala:625)
    at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:747)
    at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:649)
    at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:471)
    at akka.stream.impl.fusing.GraphInterpreterShell.receive(ActorGraphInterpreter.scala:423)
    at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:603)
    at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$shortCircuitBatch(ActorGraphInterpreter.scala:594)
    at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:619)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
    at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:529)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I  [INFO] [06/25/2018 08:52:44.182] [main] [IndexerApp$(akka://indexer)] Starting Indexer

I  [INFO] [06/25/2018 08:52:44.183] [main] [IndexerApp$(akka://indexer)] Log level is INFO

Screenshot / Design / File reference

Screenshot of the pod exiting when it reaches the memory limit

[screenshot]

Screenshot of the pod exiting with an OutOfMemoryError

[screenshot]

AlexGilleran commented 6 years ago

Thanks for the report @jevy-wangfei - are you able to share the details of what it was trying to index when it did this? This might be related to https://github.com/magda-io/magda/issues/1068

jevy-wangfei commented 6 years ago

As described in issue #1068, we changed MAX_EVENTS from 100 to 10 and tried to re-index all of the datasets harvested from about 30 data sources (~70K datasets). From our monitoring we found that the indexer continuously consumed a large amount of memory without releasing it. Because of the resource limits in K8S, the indexer was stopped by K8S once it consumed 4.6GB of memory. We enlarged the JVM heap by adding a JAVA_OPTS environment variable to the indexer Helm chart:

env:
    - name: JAVA_OPTS
      value: -Xmx6114M -Xms6114M -XX:+CMSClassUnloadingEnabled -XX:MaxGCPauseMillis=1000
        -XX:+UseG1GC -XX:GCTimeRatio=3
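For reference, a Kubernetes resource sketch (illustrative values, not Magda's defaults) that keeps the pod's memory limit above the JVM heap, so the kubelet does not OOM-kill the container before the JVM's own limit applies:

```yaml
# Sketch only: values are assumptions for illustration.
env:
  - name: JAVA_OPTS
    value: "-Xmx4G -Xms4G -XX:+UseG1GC"
resources:
  limits:
    memory: 6Gi   # leave headroom above -Xmx for metaspace, threads, off-heap buffers
  requests:
    memory: 6Gi
```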

After switching the indexer to your team's pre-built data61/indexer/v0.0.41-0 (and limiting the DAP connector to harvest only 24 distributions), the indexer could index all of our data, but it consumed 4.4GB of memory and never released it.
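The never-released growth suggests the indexer holds references to every processed dataset for the whole run. A minimal Java sketch (class and method names are hypothetical, not Magda's code) of the bounded-batch pattern that MAX_EVENTS is meant to enable, where each batch's references are dropped before the next batch is read:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: index datasets in small batches so only one batch
// is resident in memory at a time, instead of accumulating all ~70K
// datasets before flushing to Elasticsearch.
public class BatchedIndexer {
    static final int MAX_EVENTS = 10; // batch size discussed in #1068

    static int indexInBatches(List<String> datasets) {
        int batches = 0;
        List<String> batch = new ArrayList<>(MAX_EVENTS);
        for (String dataset : datasets) {
            batch.add(dataset);
            if (batch.size() == MAX_EVENTS) {
                indexBatch(batch); // stand-in for the real bulk-index request
                batch.clear();     // drop references so the GC can reclaim them
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            indexBatch(batch);
            batches++;
        }
        return batches;
    }

    static void indexBatch(List<String> batch) {
        // stand-in for the real Elasticsearch bulk request
    }

    public static void main(String[] args) {
        List<String> datasets = new ArrayList<>();
        for (int i = 0; i < 25; i++) datasets.add("dataset-" + i);
        System.out.println(indexInBatches(datasets)); // prints 3 (10 + 10 + 5)
    }
}
```

With this shape, peak memory is proportional to MAX_EVENTS rather than to the total number of datasets.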

So, this may be a bug.

[screenshot]