ravelox / pimidi

Raspberry Pi RTP MIDI
162 stars 38 forks source link

Data race accessing context sequence #56

Closed mganss closed 4 years ago

mganss commented 4 years ago

It's not easy to repro this one, but here's a data race reported by AddressSanitizer:

==================
WARNING: ThreadSanitizer: data race (pid=4707)
  Write of size 4 at 0x7b140000a470 by thread T2:
    #0 net_ctx_increment_seq /home/mganss/pimidi/raveloxmidi/src/net_connection.c:419 (raveloxmidi+0x10a4f)
    #1 net_distribute_midi /home/mganss/pimidi/raveloxmidi/src/net_distribute.c:157 (raveloxmidi+0x15e80)
    #2 net_socket_read /home/mganss/pimidi/raveloxmidi/src/net_socket.c:360 (raveloxmidi+0xa3dc)
    #3 net_socket_alsa_listener /home/mganss/pimidi/raveloxmidi/src/net_socket.c:620 (raveloxmidi+0xae7b)
    #4 <null> <null> (libtsan.so.0+0x296ad)

  Previous read of size 4 at 0x7b140000a470 by main thread:
    #0 applemidi_feedback_responder /home/mganss/pimidi/raveloxmidi/src/applemidi_feedback.c:66 (raveloxmidi+0x594b)
    #1 net_socket_read /home/mganss/pimidi/raveloxmidi/src/net_socket.c:315 (raveloxmidi+0xad46)
    #2 net_socket_fd_loop /home/mganss/pimidi/raveloxmidi/src/net_socket.c:508 (raveloxmidi+0xb817)
    #3 main /home/mganss/pimidi/raveloxmidi/src/raveloxmidi.c:118 (raveloxmidi+0x3aae)

  Location is heap block of size 72 at 0x7b140000a460 allocated by main thread:
    #0 calloc <null> (libtsan.so.0+0x2afc3)
    #1 net_ctx_create /home/mganss/pimidi/raveloxmidi/src/net_connection.c:153 (raveloxmidi+0xfece)
    #2 net_ctx_register /home/mganss/pimidi/raveloxmidi/src/net_connection.c:305 (raveloxmidi+0x104d4)
    #3 applemidi_inv_responder /home/mganss/pimidi/raveloxmidi/src/applemidi_inv.c:68 (raveloxmidi+0x540c)
    #4 net_socket_read /home/mganss/pimidi/raveloxmidi/src/net_socket.c:300 (raveloxmidi+0xada5)
    #5 net_socket_fd_loop /home/mganss/pimidi/raveloxmidi/src/net_socket.c:508 (raveloxmidi+0xb817)
    #6 main /home/mganss/pimidi/raveloxmidi/src/raveloxmidi.c:118 (raveloxmidi+0x3aae)

  Thread T2 (tid=4710, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 net_socket_alsa_loop /home/mganss/pimidi/raveloxmidi/src/net_socket.c:633 (raveloxmidi+0xb56d)
    #2 main /home/mganss/pimidi/raveloxmidi/src/raveloxmidi.c:116 (raveloxmidi+0x3aa9)

SUMMARY: ThreadSanitizer: data race /home/mganss/pimidi/raveloxmidi/src/net_connection.c:419 in net_ctx_increment_seq
==================

The line numbers might be off because this was running on my development branch. applemidi_feedback.c:66 is this:

if( feedback->rtp_seq[1] >= ctx->seq )
{
    logging_printf( LOGGING_DEBUG, "applemidi_feedback_responder: Resetting journal\n" );
    net_ctx_journal_reset(ctx);
}

I'm not sure if this is the right way to go, but I think it could be fixed by pushing the test into net_ctx_journal_reset and inside a lock there plus adding a lock around the access to ctx->seq in net_ctx_increment_seq.

ravelox commented 4 years ago

This might be addressed by your pull request #53 that I applied. Can you see if you can repro this with the latest experimental branch?

mganss commented 4 years ago

Yes, I can still repro. I'll try and make it reproducible through a script.

mganss commented 4 years ago

Here's a repro script. I have rtpMIDI on Windows connected to raveloxmidi and alsa.input_device configured. Compile with -fsanitize=thread and run the script specifying the port number (no arguments lists port numbers).

import sys
import time
import rtmidi

midiout = rtmidi.MidiOut()

if len(sys.argv) == 1:
    available_ports = midiout.get_ports()
    for i, p in enumerate(available_ports):
        print(f"#{i}: {p}")
else:
    midiout.open_port(int(sys.argv[1]))

    with midiout:
        # channel 1, middle C, velocity 112
        note_on = [0x90, 60, 112]
        note_off = [60, 0]
        for i in range(10_000):
            if ((i % 100) == 0):
                print(f"{i} notes sent")
            midiout.send_message(note_on)
            time.sleep(0.01)
            midiout.send_message(note_off)
            time.sleep(0.01)

del midiout

If you run this against the current experimental branch, you'll get other warnings, too, which are due to the concurrent access to packet in net_socket.c (see #51).

ravelox commented 4 years ago

I can't repro this using the script. I see the data race relating to the concurrent access to packet but I don't see the warning you detail above.

mganss commented 4 years ago

What's your setup?

I have Linux (Ubuntu 18.04 x64) running in a Hyper-V VM on the same machine as rtpMIDI on Windows (host).

ravelox commented 4 years ago

I'm running Ubuntu 18.0.4 in a VMware Fusion VM ( 2 cores / 4GB memory ). My apologies. I misread the instructions in your post about the script. I didn't have anything connected so the code wasn't calling the sending routines. I used MacOS Audio MIDI Setup and connected to raveloxmidi on the same host.

I have found a strange data race message relating to the connection context list inside of a mutex lock so I'm dealing with that. When I have that fixed, I'll test again.

ravelox commented 4 years ago

I've fixed up all the ThreadSanitizer warnings I was seeing and corrected a couple of locking issues. I've also allocated a separate data buffer to each file descriptor to avoid shared resources.

Please test out the latest experimental branch.

I need to do another pass with valgrind to make sure that I'm tidying up everything that I'm allocating but I'll do that once you've reviewed the progress so far.

Thanks

Dave

mganss commented 4 years ago

Verified. Thank you!