If it's helpful I can get the excerpt from marathon's log. Let me know.
For what it's worth:
2014-03-27_17:25:39.19204 java.lang.NullPointerException
2014-03-27_17:25:39.19205 at mesosphere.marathon.state.AppRepository.store(AppRepository.scala:37)
2014-03-27_17:25:39.19206 at mesosphere.marathon.MarathonScheduler$$anonfun$startApp$1.apply(MarathonScheduler.scala:195)
2014-03-27_17:25:39.19207 at mesosphere.marathon.MarathonScheduler$$anonfun$startApp$1.apply(MarathonScheduler.scala:192)
2014-03-27_17:25:39.19207 at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:251)
2014-03-27_17:25:39.19208 at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249)
2014-03-27_17:25:39.19209 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
2014-03-27_17:25:39.19210 at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
2014-03-27_17:25:39.19211 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
2014-03-27_17:25:39.19211 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
2014-03-27_17:25:39.19212 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
2014-03-27_17:25:39.19213 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
2014-03-27_17:25:39.19214 at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-03-27_17:25:39.19216
This happens sometimes, but not always. It feels like Marathon decides at boot time whether or not it will exhibit this bug, then sticks to its guns. Sometimes I can start marathon and it works just fine; other times I restart it and it fails. No recompilation necessary.
We thought this indicated a JIT bug, so we tried running marathon with the JIT disabled. This didn't affect the bug. We observed both cases with the JIT disabled.
I'm not really sure how many revisions back this goes.
We managed to "fix" the problem by compiling like so:
mvn -DaddJavacArgs=-g:notc -DaddScalacArgs="-g:line" package && ./bin/build-distribution
With these flags, apps can consistently be pushed, unless we include container info. If we do, this error is ALWAYS produced:
2014-03-27_20:37:32.47641 com.fasterxml.jackson.databind.JsonMappingException: No suitable constructor found for type [simple type, class mesosphere.marathon.ContainerInfo]: can not instantiate from JSON object (need to add/enable type information?)
2014-03-27_20:37:32.47641 at [Source: org.eclipse.jetty.server.HttpInput@24ccab01; line: 1, column: 105] (through reference chain: mesosphere.marathon.api.v1.AppDefinition["container"])
2014-03-27_20:37:32.47642 at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
2014-03-27_20:37:32.47642 at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1078)
...many more lines
If we change the flags to -DaddJavacArgs=-g -DaddScalacArgs=-g:notailcalls, the error from immediately above no longer happens, and the original error happens about 20% of the time.
Relevant:
java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.4) (7u51-2.4.4-0ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
Linux docker-test1.chi.shopify.com 3.8.0-35-generic #50~precise1-Ubuntu SMP Wed Dec 4 17:25:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
@burke and I tried Oracle's 7u51 JDK as well; result: same intermittent behaviour.
As this is running on a fairly large box (32 cores), we tried using taskset to pin the application to a single core in an attempt to reduce concurrency, but we still see the same intermittent behaviour.
Thanks for the report guys. I believe that's the same issue that's been puzzling us for a while now.
Jackson has an issue with Scala case classes on JDK7. It uses reflection, and JDK7 doesn't guarantee method ordering, so sometimes it doesn't use the default constructor but one that takes arguments, and just passes null for everything.
There is an issue for Jackson that seems related: https://github.com/FasterXML/jackson-module-scala/issues/117 According to this, behavior should be consistent on JDK6 since it has guaranteed ordering, but it's still failing for us sometimes on some JDK6 versions. Not sure what Jackson is really doing that breaks but a custom deserializer would probably fix it.
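To make the failure mode described above concrete, here is a minimal sketch in plain Java reflection, without Jackson. AppSketch and ConstructorPick are hypothetical stand-ins, not Marathon or Jackson code, and this is not a claim about what Jackson does internally. It only shows why picking "whichever constructor comes first" and filling it with nulls is fragile, while asking for the no-arg constructor by signature is not.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a case-class-like type (NOT Marathon's AppDefinition).
final class AppSketch {
    final String id;
    final Integer instances;

    // Default constructor: every field gets a safe, non-null value.
    AppSketch() {
        this("unnamed", 1);
    }

    // Argument-taking constructor: stores whatever it is given, including null.
    AppSketch(String id, Integer instances) {
        this.id = id;
        this.instances = instances;
    }
}

final class ConstructorPick {
    // Fragile: the reflection API does not promise any ordering for the
    // returned array, so index 0 may be either constructor.
    static Constructor<?> firstDeclared() {
        return AppSketch.class.getDeclaredConstructors()[0];
    }

    // Invokes the given constructor with null for every parameter, which is
    // the "just passes null for everything" behaviour from the comment above.
    static AppSketch instantiateWithNulls(Constructor<?> c) throws Exception {
        Object[] args = new Object[c.getParameterTypes().length]; // all null
        return (AppSketch) c.newInstance(args);
    }

    // Robust: look the no-arg constructor up by signature instead of position.
    static AppSketch viaNoArg() throws Exception {
        return AppSketch.class.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] unused) throws Exception {
        System.out.println("explicit no-arg id: " + viaNoArg().id);
        // Depending on ordering, this prints either the default id or null.
        System.out.println("first declared id:  " + instantiateWithNulls(firstDeclared()).id);
    }
}
```

If the argument-taking constructor happens to be chosen, the object is built with null fields and the NullPointerException only surfaces later, when something dereferences them. That would be consistent with a failure that varies from one JVM start to the next without recompilation.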
WIP branch here: https://github.com/mesosphere/marathon/commits/wip-deserialization-npe
Duplicate of #181, fixed by merge of PR #215. Closing this for now, please comment if this resurfaces.
Wonderful, thanks @ConnorDoyle!
When I POST a new app without ports present, I get an error.
When I POST a new app with ports present, marathon throws a NullPointerException.
This happens when I create a new app in the UI as well.
This is occurring on da21cee8bbef0de7533b703642925df36d811e4f, but not ccd648a51b249925e7c5720779169bdb2b46f20a