jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

OOM when cycle used in nemesis gen #512

Closed: qvad closed this issue 3 years ago

qvad commented 3 years ago

I'm trying to update the Ignite test to Jepsen 0.2.4. Currently the suite hangs and then fails with an OOM:

 :version "2.8.0",
 :transaction-concurrency
 #object[org.apache.ignite.transactions.TransactionConcurrency 0xd54d0f5 "OPTIMISTIC"]}
13:57:41.129 [main] ERROR jepsen.cli - Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuffer.append(StringBuffer.java:270)
        at java.io.StringWriter.write(StringWriter.java:101)
        at clojure.core$fn__7299.invokeStatic(core_print.clj:124)
        at clojure.core$fn__7299.invoke(core_print.clj:123)

The following code is used:

  (->> (gen/mix operations)
       (gen/stagger 1/10)
       (gen/nemesis
        (cycle [(gen/sleep 5)
                {:type :info, :f :start}
                (gen/sleep 1)
                {:type :info, :f :stop}]))
       (gen/time-limit time-limit))

When I remove cycle from the code, it starts working.

Sorry for the noob question, but I don't see how to solve this problem.

aphyr commented 3 years ago

Did you have a full stack trace? This one doesn't include any Jepsen code.

qvad commented 3 years ago

Oops, sorry. Yes, you are right, there is no Jepsen code there:

13:57:41.129 [main] ERROR jepsen.cli - Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuffer.append(StringBuffer.java:270)
        at java.io.StringWriter.write(StringWriter.java:101)
        at clojure.core$fn__7299.invokeStatic(core_print.clj:124)
        at clojure.core$fn__7299.invoke(core_print.clj:123)
        at clojure.lang.MultiFn.invoke(MultiFn.java:234)
        at clojure.core$pr_on.invokeStatic(core.clj:3662)
        at clojure.core$pr_on.invoke(core.clj:3656)
        at clojure.core$print_prefix_map$fn__7352.invoke(core_print.clj:233)
        at clojure.core$print_sequential.invokeStatic(core_print.clj:66)
        at clojure.core$print_prefix_map.invokeStatic(core_print.clj:229)
        at clojure.core$print_map.invokeStatic(core_print.clj:238)
        at clojure.core$fn__7381.invokeStatic(core_print.clj:266)
        at clojure.core$fn__7381.invoke(core_print.clj:263)
        at clojure.lang.MultiFn.invoke(MultiFn.java:234)
        at clojure.core$pr_on.invokeStatic(core.clj:3662)
        at clojure.core$pr_on.invoke(core.clj:3656)
        at clojure.core$print_sequential.invokeStatic(core_print.clj:66)
        at clojure.core$fn__7329.invokeStatic(core_print.clj:174)
        at clojure.core$fn__7329.invoke(core_print.clj:174)
        at clojure.lang.MultiFn.invoke(MultiFn.java:234)
        at clojure.core$pr_on.invokeStatic(core.clj:3662)
        at clojure.core$pr_on.invoke(core.clj:3656)
        at clojure.core$print_prefix_map$fn__7352.invoke(core_print.clj:233)
        at clojure.core$print_sequential.invokeStatic(core_print.clj:66)
        at clojure.core$print_prefix_map.invokeStatic(core_print.clj:229)
        at clojure.core$print_map.invokeStatic(core_print.clj:238)
        at clojure.core$fn__7402.invokeStatic(core_print.clj:320)
        at clojure.core$fn__7402.invoke(core_print.clj:317)
        at clojure.lang.MultiFn.invoke(MultiFn.java:234)

I've collected a few stack traces from this "hung" state, and it looks like the issue is in the Ignite test code. I will investigate this for now.

aphyr commented 3 years ago

My suspicion is that something in the test suite is trying to print the generator and hanging because it's infinite. Jepsen does print the generator, but it binds print-len to avoid infinite loops, so it shouldn't hit this. There might be something I missed though!
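For readers following along, here is a minimal sketch of the failure mode described above. It assumes that "print-len" refers to Clojure's `*print-length*` var. The name `nemesis-schedule` is made up for illustration, and plain maps stand in for the `(gen/sleep ...)` generators so that the snippet runs without Jepsen on the classpath.

```clojure
;; Illustration only: nemesis-schedule is a hypothetical name, and the
;; :sleep maps are stand-ins for the (gen/sleep ...) generators in the report.
(def nemesis-schedule
  (cycle [{:type :sleep, :value 5}
          {:type :info, :f :start}
          {:type :sleep, :value 1}
          {:type :info, :f :stop}]))

;; With the default *print-length* (nil), (pr-str nemesis-schedule) would
;; never return: the printer keeps appending elements of the infinite seq
;; to its buffer until the heap is exhausted.

;; Binding *print-length* caps how many elements of each collection are
;; printed; the rest is abbreviated as "...".
(def bounded
  (binding [*print-length* 4]
    (pr-str nemesis-schedule)))

(println bounded)
```

Note that `pr-str` builds the whole string before the binding is popped, so the limit applies to everything that gets printed.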

qvad commented 3 years ago

Yes, when I removed all the prints that may contain generators, the issue disappeared. And yes, it worked with the old Jepsen code (0.1.11 or so). Strange.

qvad commented 3 years ago

Problematic lines of code:

  1. pprint in runner.clj, log-test: (info "Testing\n" (with-out-str (pprint t)))
  2. info print in ignite.clj, basic-test: (info :opts options)

In the first case I just removed the pprint call; the second line I removed completely. I haven't spent much time investigating this problem yet. For now it's much more interesting to break something in Ignite)

aphyr commented 3 years ago

Yeah, that's trying to print an infinitely long data structure. See jepsen.util/pprint-test (IIRC) for a version which won't get into infinite loops.
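If the pprint logging in the test suite is worth keeping, one possible approach is a small bounded pretty-printer. The sketch below is not Jepsen's own helper: `bounded-pprint-str` and `example-test` are hypothetical names, and the limits of 8 and 6 are arbitrary. It relies only on `clojure.pprint/pprint` honoring `*print-length*` and `*print-level*`.

```clojure
(require '[clojure.pprint :refer [pprint]])

(defn bounded-pprint-str
  "Hypothetical helper: pretty-prints x to a string while capping how many
  elements of each collection and how many nesting levels are printed."
  [x]
  (binding [*print-length* 8
            *print-level*  6]
    ;; with-out-str is eager, so printing finishes inside the binding.
    (with-out-str (pprint x))))

;; A stand-in for a test map whose generator is an infinite sequence.
(def example-test
  {:name      "ignite-example"
   :generator (cycle [{:type :info, :f :start}
                      {:type :info, :f :stop}])})

(def rendered (bounded-pprint-str example-test))

(println rendered)
```

Under these assumptions, the call in log-test could become something like `(info "Testing\n" (bounded-pprint-str t))`, which terminates even when the test map holds an infinite generator.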