rakudo / rakudo

🦋 Rakudo – Raku on MoarVM, JVM, and JS
https://rakudo.org/
Artistic License 2.0

various malloc errors (memory corruption in GC?) #3739

Open kbucheli opened 4 years ago

kbucheli commented 4 years ago

The Problem

My program aborts with malloc errors.

Actual Behavior

The script (source: see the next section) terminates with

malloc(): invalid size (unsorted)
Aborted (core dumped)

and the interesting part of the core dump shows:

 $ coredumpctl dump 13235
           PID: 13235 (raku)
           UID: 1000 (konrad)
           GID: 100 (users)
        Signal: 6 (ABRT)
     Timestamp: Tue 2020-06-02 14:57:43 CEST (1h 33min ago)
  Command Line: raku mem_corrupt.raku
    Executable: /usr/bin/rakudo
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (konrad)
       Boot ID: 7f25b80d5967441284d63f7910e447dc
    Machine ID: 27bff91dbe26453e8890335069c7071a
      Hostname: hellespont
       Storage: /var/lib/systemd/coredump/core.raku.1000.7f25b80d5967441284d63f7910e447dc.13235.1591102663000000000000.lz4 (inaccessible)
       Message: Process 13235 (raku) of user 1000 dumped core.

                Stack trace of thread 13268:
                #0  0x00007fe45eb6e355 raise (libc.so.6 + 0x3c355)
                #1  0x00007fe45eb57853 abort (libc.so.6 + 0x25853)
                #2  0x00007fe45ebb1878 __libc_message (libc.so.6 + 0x7f878)
                #3  0x00007fe45ebb8d3a malloc_printerr (libc.so.6 + 0x86d3a)
                #4  0x00007fe45ebbbe1c _int_malloc (libc.so.6 + 0x89e1c)
                #5  0x00007fe45ebbcb4e _int_realloc (libc.so.6 + 0x8ab4e)
                #6  0x00007fe45ebbdf66 realloc (libc.so.6 + 0x8bf66)
                #7  0x00007fe45ef22e11 n/a (libmoar.so + 0x229e11)
                #8  0x00007fe45ef23139 MVM_gc_collect (libmoar.so + 0x22a139)
                #9  0x00007fe45ef1e292 n/a (libmoar.so + 0x225292)
                #10 0x00007fe45ef1f0d5 MVM_gc_enter_from_allocator (libmoar.so + 0x2260d5)
                #11 0x00007fe45ef1f372 MVM_gc_allocate_nursery (libmoar.so + 0x226372)
                #12 0x00007fe45ef1f64f MVM_gc_allocate_frame (libmoar.so + 0x22664f)
                #13 0x00007fe45eefa865 n/a (libmoar.so + 0x201865)
                #14 0x00007fe45eefab02 MVM_frame_invoke (libmoar.so + 0x201b02)
                #15 0x00007fe45ef45061 n/a (libmoar.so + 0x24c061)
                #16 0x00007fe45eee8b8c MVM_interp_run (libmoar.so + 0x1efb8c)
                #17 0x00007fe45ef01bf1 n/a (libmoar.so + 0x208bf1)
                #18 0x00007fe45e9d4422 start_thread (libpthread.so.0 + 0x9422)
                #19 0x00007fe45ec31bf3 __clone (libc.so.6 + 0xffbf3)

before "golfing" it I have seen various other effects, like the process hanging,

malloc(): corrupted unsorted chunks 3
Aborted

or

malloc(): memory corruption
Aborted

Steps to Reproduce

run

#!/usr/bin/env raku

use Prometheus::Client :metrics;
use Prometheus::Client::Exposition :render;

sub MAIN() {

    my $stopped = Promise.new;
    LEAVE { $stopped.keep }

    # prepare access to metrics
    my $*PROMETHEUS = prepare_metrics($stopped);

    my $metric-worker-started = counter(
        name => 'worker_started',
        documentation => 'total number of started worker runs',
        label-names => <worker pick1 pick2>
    );
    my $metric-worker-finished = counter(
        name => 'worker_finished',
        documentation => 'total number of finished worker runs',
        label-names => <worker pick1 pick2>
    );

    # kick off workers
    my @workers = (^20).map: -> $num {
        start {
            while !$stopped {
                my $pick1 = <a b c d e f g h i j>.pick;
                my $pick2 = <q r s t u v w x y z>.pick;
                my $labels = \(:worker($num.Str), :$pick1, :$pick2);
                $metric-worker-started.labels(|$labels).inc;
                sleep rand/10;
                $metric-worker-finished.labels(|$labels).inc;
            }
        }
    };

    # wait for the end of the world
    react {
        for @workers -> $worker {
            whenever $worker {} # forward exceptions
        }
        # finish up properly with signals
        whenever signal(SIGINT,SIGKILL) { done }
    }

    return;
}

sub prepare_metrics(Promise $stopper) {

    my $registry = Prometheus::Client::CollectorRegistry.new;

    use Cro::HTTP::Router;
    use Cro::HTTP::Server;

    my $metrics-service = Cro::HTTP::Server.new(
        :host<localhost>, :port<10003>,
        application => route {
            get -> 'metrics' {content 'text/plain', render-metrics($registry)}
        },
    );
    $metrics-service.start;
    $stopper.then:{  $metrics-service.stop };
    return $registry;
}

which runs fine until we read the metrics via the web interface several times:

$ while curl http://localhost:10003/metrics; do : ; done

The more metric/label dimensions there are, the faster it fails.

Environment

kbucheli commented 4 years ago

A version without Cro and without the need to trigger it externally:

#!/usr/bin/env raku

use Prometheus::Client :metrics;
use Prometheus::Client::Exposition :render;

sub MAIN() {

    my $stopped = Promise.new;

    LEAVE { $stopped.keep }

    # prepare access to metrics
    my $*PROMETHEUS = Prometheus::Client::CollectorRegistry.new;

    my $metric-worker-started = counter(
        name => 'worker_started',
        documentation => 'total number of started worker runs',
        label-names => <worker pick1 pick2>
    );
    my $metric-worker-finished = counter(
        name => 'worker_finished',
        documentation => 'total number of finished worker runs',
        label-names => <worker pick1 pick2>
    );

    # kick off workers scheduling
    my @workers = (^20).map: -> $num {
        start {
            while !$stopped {
                my $pick1 = <a b c d e f g h i j>.pick;
                my $pick2 = <q r s t u v w x y z>.pick;
                my $labels = \(:worker($num.Str), :$pick1, :$pick2);
                $metric-worker-started.labels(|$labels).inc;
                sleep rand/10;
                $metric-worker-finished.labels(|$labels).inc;
            }
        }
    };

    # render the metrics
    push @workers, start {
        while !$stopped {
            render-metrics($*PROMETHEUS);
            sleep rand/100;
        }
    };

    # wait for the end of the world
    react {
        for @workers -> $worker {
            whenever $worker {} # forward exceptions
        }
        # finish up properly with signals
        whenever signal(SIGINT,SIGKILL) { done }
    }

    return;
}

Often it breaks fast, but sometimes not.

lizmat commented 4 years ago

Feels to me you're accessing $stopped from multiple threads at the same time without concurrency control? Perhaps $stopped doesn't need to be a Promise but could be an atomicint. Then use an atomic fetch to see if the flag has changed, and an atomic update to mark it as stopped?
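For illustration, a minimal sketch of that suggestion (my own code, not from the report; it assumes the worker body stays as in the original script):

# hypothetical sketch: replace the Promise flag with an atomicint
my atomicint $stopped = 0;

my @workers = (^20).map: -> $num {
    start {
        while atomic-fetch($stopped) == 0 {   # atomic read of the flag
            # ... worker body as in the original script ...
            sleep rand/10;
        }
    }
};

# later, e.g. from a signal handler: signal shutdown and wait
atomic-assign($stopped, 1);
await @workers;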

kbucheli commented 4 years ago

Promises are used for concurrency control, no? That part has been running in production for two years without issues. The problems started after adding render-metrics, which creates many intermediate strings and seems to somehow stress memory management. It could well be that there is a concurrency issue in Prometheus::Client, but this is how it is supposed to be used. But even if there are concurrency issues in the Raku code, it should not be able to corrupt memory at the VM level.

lizmat commented 4 years ago

But even if there are concurrency issues in the Raku code, it should not be able to corrupt memory at the VM level.

I'm not sure @jnthn agrees.

kbucheli commented 4 years ago

Interesting: it does atomic operations with Real values, e.g. in the Counter class. Does this work?
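(For context, a sketch of my own, not the Counter class itself: one way Raku can update a non-integer value atomically is a compare-and-swap retry on a Scalar container via the core cas routine.)

# sketch: cas retries the block until the swap succeeds,
# so concurrent additions to a Real total are not lost
my $total = 0.0;

sub add-observation(Real $amount) {
    cas $total, -> $current { $current + $amount };
}

await (^8).map: { start { add-observation(0.5) for ^1000 } };
say $total;   # 4000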

kbucheli commented 4 years ago

Indeed, I found a part of Prometheus::Client::Metrics::Group that is not thread-safe:

method labels(*@label-values, *%labels --> Collector) {
    my $labels-key = self!make-labels(@label-values, %labels);
    %!metrics{ $labels-key } //= $.factory.build(...);
}

where %!metrics can be updated by several threads at the same time, even for the same key. So I changed it to

has Lock::Async $!label-adding-lock = Lock::Async.new;

method labels(*@label-values, *%labels --> Collector) {
    my $labels-key = self!make-labels(@label-values, %labels);
    with %!metrics{ $labels-key } {
        return $_;
    }
    $!label-adding-lock.protect: {
        return %!metrics{ $labels-key } //= $.factory.build(...);
    };
}

and now the memory corruption does not happen any more with my test script. That made me think:

So now I come to the question I wanted to ask @jnthn at some point when I see him in person: How safe is unprotected concurrent read and write access to arrays and hashes? Safe in the sense of not breaking internals, not in the sense of not creating race conditions or other logical flaws. There are situations where the above hash initialization code is logically safe, e.g. for a cache of constant values. Rarely, a value might be calculated twice, but that does not hurt, as it ends up being the same anyway.
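For illustration, a sketch (my own, not code from the report) of the kind of "logically safe" pattern meant here: a cache of pure computations where a rare duplicate computation is harmless, even though the hash is written from several threads without protection:

my %cache;

sub cached-square(Int $n) {
    # two threads may race and compute the same entry twice,
    # but they store an identical value, so the duplication is harmless
    %cache{$n} //= $n * $n;
}

(As the answer below explains, the unprotected concurrent writes can still break MoarVM internals today, so this is only "logically" safe, not memory-safe.)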

jnthn commented 4 years ago

How safe is unprotected concurrent read and write access to arrays and hashes? Safe in the sense of not breaking internals, not in the sense of not creating race conditions or other logical flaws.

At the moment, it can break things pretty badly on the MoarVM backend. That's considered a shortcoming that should be addressed; exceptions or data inconsistencies are acceptable, but the potential to bring the VM down as is achievable today is not wanted. I'm aware of some work in that direction; it's a non-small amount of effort, since there's no way to fix the current hash data structure used, and so a completely different approach is needed. Arrays also need a rather tedious reorganization, and we probably have to accept some curiosities in terms of growth edges being a few elements off the powers of 2 or other nice round numbers (though it's unlikely anyone will notice) or accept an extra indirection (not so desirable).

There are situations where the above hash initialization code is logically safe, e.g. for a cache of constant values. Rarely, a value might be calculated twice, but that does not hurt, as it ends up being the same anyway.

Even when the issues around memory safety are resolved, this is probably still not a good thing to rely on. What's really wanted here is a concurrent hash (along the lines of my Concurrent::Stack and Concurrent::Queue modules, which use lock-free algorithms). I did look at the papers on building one of those; it turns out to be rather involved. However, those do give the kinds of properties you described (allowing benign recalculations).

kbucheli commented 4 years ago

@jnthn thanks a lot for the valuable insights. I have one last question: is concurrent read access safe while another thread is adding a value to a hash? Or should I also lock read access, e.g. with ReadWriteLock?

lizmat commented 4 years ago

In the internals, these situations are generally handled by a Lock and the protect method. In HLL code:

my $lock = Lock.new;
my %hash;
sub get_key($key) {
    $lock.protect: { %hash{$key} }
}
sub set_key($key, $value) {
    $lock.protect: { %hash{$key} = $value }
}
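
Building on that example (a sketch of my own, assuming an idempotent computation), the initialize-if-absent case from the earlier labels fix can be protected the same way:

my $lock = Lock.new;
my %cache;
sub get-or-compute($key, &compute) {
    # every access goes through the same Lock, so a read never sees the
    # hash in the middle of a concurrent insert
    $lock.protect: { %cache{$key} //= compute($key) }
}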

lizmat commented 4 years ago

Can this issue be closed now?

kbucheli commented 4 years ago

Below is a golfed unit test without external modules:

#!/usr/bin/env raku

use Test;

my $THREADS = 20;
my $ROUNDS  = 10000;

{
    my %hash;
    my @workers = (^$THREADS).map: -> $num {
        start {
            for ^$ROUNDS -> $counter {
                %hash{$num + $THREADS * $counter}++;
            }
        }
    };
    await @workers;
}

VM.request-garbage-collection;

pass 'no memory corruption when adding new hash keys concurrently without locking';
done-testing

Can the issue be closed? It is an open and confirmed bug which is only triggered when the programmer misbehaves :-) and thus understandably has little priority. If you wish, I can move it over to the MoarVM project. Personally, I would keep it open here, because other Raku programmers hitting the same error might be tempted to open a new bug; this way it is documented what can be done and where to look more closely. If the issue is closed, it looks as if it were no longer relevant. I think it should be closed when the above test passes.

kbucheli commented 4 years ago

Also with the new hash implementation I get memory corruption, just with different errors ;-):

free(): invalid next size (normal)
Aborted (core dumped)

or

MoarVM panic: Internal error: invalid thread ID 65540 in GC work pass

or simply

Segmentation fault (core dumped)

(tested with v2020.10)

lizmat commented 4 years ago

/me pings @nwc10

nwc10 commented 4 years ago

I do not have any insight into how to fix this.

@jnthn seems to have some idea on how to give "wrong" answers without crashing. I don't know enough about MoarVM concurrency (what is guaranteed, what one can get away with) to even know where to start on this.

(Putting locks around every hash lookup isn't going to fly, and my knowledge of "lock free data structures" is just those four words, as a search term. I think that the "single allocation" approach in MoarVM/MoarVM#1360 gets somewhere nearer to helping, as one can change it to "free at safe point", but that alone isn't everything, as it still leaves assumptions about visibility of memory ordering on multicore CPUs, which is not something I'm competent at)

I can't help here (and even the work I have done is now stalled in review).

lizmat commented 1 month ago

Is this still an issue?

kbucheli commented 3 weeks ago

v2024.09 brings

$ ./concurrent_unlocked_hash_access.raku 
MoarVM oops: MVM_str_hash_entry_size called with a stale hashtable pointer
   at ./concurrent_unlocked_hash_access.raku:11  (<ephemeral file>:)
 from SETTING::src/core.c/Promise.rakumod:370  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:)
 from SETTING::src/core.c/ThreadPoolScheduler.rakumod:905  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:)
 from SETTING::src/core.c/ThreadPoolScheduler.rakumod:272  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:)
 from SETTING::src/core.c/ThreadPoolScheduler.rakumod:253  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:)
 from SETTING::src/core.c/ThreadPoolScheduler.rakumod:250  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:run-one)
 from SETTING::src/core.c/ThreadPoolScheduler.rakumod:291  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:)
 from SETTING::src/core.c/Thread.rakumod:69  (/usr/share/perl6/runtime/CORE.c.setting.moarvm:THREAD-ENTRY)

$ 

which is still an LTA (Less Than Awesome) error message.

MasterDuke17 commented 3 weeks ago

I think ae024a492f6c3ffb75cea0fab23943788d2fe5be was the commit that enabled str hashes to oops instead of segfault when a concurrent write is detected. What it does for str hashes could probably be ported to the other types of hashes. Doing so won't improve the above situation, but it would at least put all hash types on an equal footing.