phunkyg / ReaControl24

Control24 digital control surface protocol middleware
GNU General Public License v3.0

Can't quit #17

Closed phunkyg closed 4 years ago

phunkyg commented 4 years ago

error: [Errno 48] Address already in use

There's been a persistent issue lately with being unable to quit the scripts; I'm not sure whether something changed in the OS or whether something got broken.

The intended exit was to press CTRL+C in the terminal, or to send the process a SIGINT signal (which is basically the same thing).
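
For reference, a minimal sketch of that pattern in generic Python (not the script's actual code): install a SIGINT handler that flips a shared flag, and have the worker loops poll it so they can wind down cleanly.

    import signal
    import threading
    import time

    # Shared flag checked by every worker loop.
    shutdown = threading.Event()

    def handle_sigint(signum, frame):
        # Runs when CTRL+C is pressed in the terminal, or when the process
        # receives SIGINT (e.g. `kill -INT <pid>`).
        shutdown.set()

    signal.signal(signal.SIGINT, handle_sigint)

    while not shutdown.is_set():
        time.sleep(0.1)  # stand-in for one short, interruptible unit of work

    print('clean shutdown')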

The latest branches are even worse: doing this stops some processes but leaves other bits lying around.

This can mean the port and address combo are still in use when you try to restart, resulting in the error above.
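
(As an aside: when leftover processes still hold the port, they have to be killed, which is what kill.sh is for; when the error comes from a lingering socket left behind by an unclean exit, the standard mitigation is to set SO_REUSEADDR before binding. A generic sketch, not necessarily how ReaControl actually binds:)

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Allow rebinding the address/port combo even if a previous run
    # exited uncleanly and left the socket lingering.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('192.168.0.34', 9124))  # the 'listen' address seen in the logs below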

This leaves a user having to manually kill processes.

A command to do this is suggested in the kill.sh file provided.

This issue is to track a fix.

phunkyg commented 4 years ago

Fixed in DEV_OtherDevices_Refactor. Tested on OSX Mojave. Needs checking on Windows etc., where signal handling may behave differently.

lazlooose commented 4 years ago

Tested on a Raspberry Pi and it does close all threads, but only after numerous traceback errors. Also, something about this update has completely broken communication between Reaper and ReaControl.py.

phunkyg commented 4 years ago

Thanks @lazlooose, could you share a log file from your test?

phunkyg commented 4 years ago

@lazlooose I found the issue that was prematurely stopping all the OSC traffic. I'm getting really weird results now when trying to test, though. Could you try with the latest version?

lazlooose commented 4 years ago

Hey @phunkyg, I get an immediate traceback when I run it. Terminal window:

Network Handler Started
Options {'debug': True, 'network': 'eth0', 'connect': '192.168.0.15:9125', 'logdir': './logs', 'listen': '192.168.0.34:9124'}
Capture Starting
sniffer loop
Packet Received: to:ff ff ff ff ff ff from:00 a0 7e a0 17 fd prot:88 5f bytes:23 c_cnt:0 s_cnt:4004 retry:0 cmd:0x0 nc:1 f0 13 00 70 00 55 f7 5f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
BCAST d: v: u1:f0 13 00 70 00 55 f7 5f 00 00 00 00 00 00 00
Device detected:   at 00 a0 7e a0 17 fd
to:00 a0 7e a0 17 fd from:b8 27 eb e2 65 64 prot:88 5f
Sending Packet of 30 bytes: 00 a0 7e a0 17 fd b8 27 eb e2 65 64 88 5f 00 10 00 00 00 00 00 01 00 00 00 00 00 00 e2 00
Sending Packet of 45 bytes: 00 a0 7e a0 17 fd b8 27 eb e2 65 64 88 5f 00 1f 02 44 00 00 00 02 00 00 00 00 00 00 00 01 f0 13 01 30 19 00 00 00 00 00 00 00 00 00 f7
No client code for this device
device session 1  closing
device session 1  Waiting for thread shutdown
Exception in thread thread_sniffer:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "ReaControl.py", line 378, in run
    self.packet_handler(*pkt)
  File "ReaControl.py", line 415, in packet_handler
    src_session = DeviceSession(self, self.num_sessions, src_mac, bcast_data)
  File "ReaControl.py", line 692, in __init__
    self.start_client()
  File "ReaControl.py", line 618, in start_client
    self.close()
  File "ReaControl.py", line 713, in close
    self.thread_keepalive.join()
AttributeError: 'DeviceSession' object has no attribute 'thread_keepalive'

^C**KeyboardInterrupt
**Closing
NetworkHandler closing
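
(The AttributeError at the end of that traceback is a shutdown-ordering bug: close() joins thread_keepalive, but it was reached from __init__, via start_client(), before that attribute had been created. A hedged sketch of a defensive guard, using the names from the traceback; the actual fix may differ:)

    def close(self):
        # __init__ can fail part-way through ("No client code for this
        # device") and still end up here, so don't assume every thread
        # attribute exists before joining it.
        keepalive = getattr(self, 'thread_keepalive', None)
        if keepalive is not None and keepalive.is_alive():
            keepalive.join()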

main.log...

asctimes        names   levelnames      threadNames     funcNames       linenod messages
2020-06-12 02:36:57,700 __main__        INFO    MainThread      __init__        444     Network Handler Started
2020-06-12 02:36:57,700 __main__        DEBUG   MainThread      __init__        445     Options {'debug': True, 'network': 'eth0', 'connect': '192.168.0.15:9125', 'logdir': './logs', 'listen': '192.168.0.$
2020-06-12 02:36:57,785 __main__        DEBUG   thread_sniffer  run     370     Capture Starting
2020-06-12 02:36:57,786 __main__        TRACE   thread_sniffer  trace   92      sniffer loop
2020-06-12 02:36:59,836 __main__        TRACE   thread_sniffer  trace   92      Packet Received: to:ff ff ff ff ff ff from:00 a0 7e a0 17 fd prot:88 5f bytes:23 c_cnt:0 s_cnt:4004 retry:0 cmd:0x0 nc:1 f0 $
2020-06-12 02:36:59,838 __main__        TRACE   thread_sniffer  trace   92      BCAST d: v: u1:f0 13 00 70 00 55 f7 5f 00 00 00 00 00 00 00
2020-06-12 02:36:59,839 __main__        INFO    thread_sniffer  __init__        664     Device detected:   at 00 a0 7e a0 17 fd
2020-06-12 02:36:59,841 __main__        TRACE   thread_sniffer  trace   92      to:00 a0 7e a0 17 fd from:b8 27 eb e2 65 64 prot:88 5f
2020-06-12 02:36:59,844 __main__        TRACE   thread_sniffer  trace   92      Sending Packet of 30 bytes: 00 a0 7e a0 17 fd b8 27 eb e2 65 64 88 5f 00 10 00 00 00 00 00 01 00 00 00 00 00 00 e2 00
2020-06-12 02:36:59,845 __main__        TRACE   thread_sniffer  trace   92      Sending Packet of 45 bytes: 00 a0 7e a0 17 fd b8 27 eb e2 65 64 88 5f 00 1f 02 44 00 00 00 02 00 00 00 00 00 00 00 01 f0 13 $
2020-06-12 02:36:59,846 __main__        ERROR   thread_sniffer  start_client    616     No client code for this device
2020-06-12 02:36:59,847 __main__        INFO    thread_sniffer  close   708     device session 1  closing
2020-06-12 02:36:59,848 __main__        DEBUG   thread_sniffer  close   712     device session 1  Waiting for thread shutdown
2020-06-12 02:38:12,178 __main__        INFO    MainThread      close   475     NetworkHandler closing
2020-06-12 02:38:12,179 __main__        INFO    MainThread      close   485     NetworkHandler closed

That's all I have; it didn't create a ReaOscsession log for this session. The previous version I tested, the one with no communication, produced both logs.

phunkyg commented 4 years ago

That is REALLY WEIRD

Somehow your ProControl is sending sysex as a broadcast packet instead of to a host it has a session with. I've never seen that kind of traffic before.

Suggest turning everything off and on again and giving it another try.

lazlooose commented 4 years ago

OK, so I tried it again and I'm getting different behaviour. This time it is connecting, but things are still messed up:

The first time it connected, but as soon as I opened Reaper and Reaper started sending OSC messages, the ProC went offline. You could see that ReaControl.py was still receiving the OSC messages from the ProControl:

main.log.12_06.22_08.txt ReaOscsession.log.12_06.22_08.txt

The second time, the ProC stayed connected but the faders were jumping around (they would stay in one spot for a second and then jump to another). Moving the faders on the ProC or in Reaper seemed to trigger this behaviour, but the faders don't follow either source; they just do their own thing and bounce around.

ReaOscsession.log.12_06.22_17.txt main.log.12_06.22_17.txt

I cannot kill it from the terminal with CTRL+C; I have to open another SSH window and run ./kill.sh.

phunkyg commented 4 years ago

/track/1/volume/db [f] [-150.0]
ReaException: Track 0 has no cmdclass volume
 /track/2/volume/db [f] [-150.0]
ReaException: Track 1 has no cmdclass volume

This seems to be the main thing being thrown.

Edit: so I've had a search for this OSC address and it doesn't show up in the code anywhere. Is it possible you have mixed up the OSC file from another release?

The other thing I noticed was that, quite often, the desk was saying it wasn't getting ACKs quickly enough; it then starts retrying packets, which just makes things worse! This is mainly because, in debug mode, everything is too slow to keep the comms up properly. This might explain the 'jumpy' behaviour of the faders.

You can get a feel for this by starting another session without debug on to see what kind of performance you're getting in a more 'production' style session.
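
(A common way to stop trace output from starving a comms loop like this one, sketched generically rather than as the project's actual trace() helper: check the log level before building the message, since hex-dumping every packet is expensive even if a handler later drops it.)

    import binascii
    import logging

    log = logging.getLogger(__name__)

    def trace_packet(raw):
        # Formatting the hex dump is the costly part, so skip it entirely
        # unless debug-level output is actually enabled.
        if log.isEnabledFor(logging.DEBUG):
            log.debug('Packet Received: %s', binascii.hexlify(raw))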

lazlooose commented 4 years ago

Did the OSC file change when you added the self-quitting procedure? Everything was working fine before that and I didn't change anything, but I can double-check.

Edit: I downloaded the Procontrol.reaperOSC file again and it seems to have resolved the 'no cmdclass volume' issue.

However, the weird behaviour persists. I tested it without debugging and that made no difference.

The desk responds very slowly and the displays are kind of scrambled, popping on and off here and there. You can tell the info is getting relayed, but something is slowing it down extremely.

ReaOscsession.log.14_06.01_22.txt main.log.14_06.01_22.txt

And the logs from a non-debug session:

ReaOscsession.log.14_06.01_20.txt main.log.14_06.01_20.txt

phunkyg commented 4 years ago

Very odd. We are going to have to eliminate some basic stuff...

Can you run a 'top' command before, during, and after a ReaControl run to see what the CPU usage is like?

What is the ambient network traffic likely to be? Is there anything that can be done to ensure the 3 devices are on a quiet segment?

Can you check memory usage and swap activity?

Could you run it on a more powerful machine than the Pi? Which Pi model is it?

lazlooose commented 4 years ago

It is a Pi model 3B. ReaControl is the only thing running on it, aside from my Samba share, which I only installed after the trouble began so I could access my logs folder. I would have to double-check which version of Raspbian is running, but it should be one from roughly 6 months ago, when I switched to the Pi.

I see where you are going with this line of inquiry and I will run it, but I would point out that if I go back to the commit before the quit fix, everything works fine, and I never had any indication of similar troubles until that change. It is possibly a Debian-based conflict, though, and I could try running it on my Hackintosh setup later too.

You said that it quits fine for you on Mac, and it has never worked for me on the Pi, so it might be a Debian issue. Also, what version of Python are you running? Perhaps we have different versions?

phunkyg commented 4 years ago

I have made a small edit after running through the recent changes. In the last set I left in a couple of things that might cause this slowdown. Can you give that a spin?

Edit: I think I can see where I have gone wrong here. Re-reading the pcap documentation, I may have misunderstood packet buffer timeouts.

Have a go anyway; if that still doesn't work, there is another pattern I can try.
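
(For context: the timeout in question is libpcap's read timeout. With a pcapy-style binding, the fourth argument to open_live() is that timeout in milliseconds. Too large a value lets captured packets sit in the kernel buffer before delivery, which would look exactly like a laggy surface; and on some platforms the timeout never fires when no packets arrive at all, which is what makes a blocking capture loop hard to break. A hedged sketch, assuming pcapy, with handle_packet as a hypothetical stand-in:)

    import threading

    import pcapy

    stop = threading.Event()  # set from a SIGINT handler elsewhere

    # snaplen 65536, promiscuous mode on, 100 ms read timeout: short enough
    # that packets are delivered promptly and the loop gets regular chances
    # to notice the stop flag.
    cap = pcapy.open_live('eth0', 65536, 1, 100)

    while not stop.is_set():
        header, data = cap.next()
        # Depending on the binding and platform, an expired timeout surfaces
        # as a None header or empty data rather than an exception.
        if header is None or not data:
            continue
        handle_packet(header, data)  # hypothetical handler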

lazlooose commented 4 years ago

There has been a slight improvement in the latest version, although overall it still has the ultra-slow response.

The improvement is that when you touch a fader on the ProC and move it, Reaper follows pretty smoothly.

The reverse is not true, though: there is no real response from the ProC when you move a fader in Reaper, and the pans are doing the weird non-responsive thing too.

Transport controls work, though, and button pushes on the ProControl are registered, but the LEDs (mutes and solos) can take anywhere from 8 seconds to forever to respond.

In all of the sessions I have run, it eventually crashes and the ProC goes offline (after 5-10 min), although I am no longer getting any tracebacks in my logs.

One behaviour I noticed is that at a certain point ReaControl.py stops receiving OSC commands from Reaper. In this log that happens at line 2892; after that point, fader moves in Reaper were no longer registered in the terminal window, but there is no sign of a traceback. The activity you see afterwards was me moving a fader on the ProC. ReaOscsession.log.14_06.23_21.txt and the accompanying main log: main.log.14_06.23_21.txt

On the bright side, I can see that all the displays want to work, though there is no activity from the VUs or clock yet (a forthcoming edit is required for the clock).

lazlooose commented 4 years ago

I ran top, and the highest load averages I saw were 0.24, 0.06, 0.04, although normally it sat around 0.04, 0.01, 0.00 while running ReaControl.py.

As far as network traffic goes, nothing has changed from before, when it was working.

MiB Mem :    926.1 total,    599.9 free,     72.5 used,    253.6 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    792.2 avail Mem

lazlooose commented 4 years ago

I went back into the commits: if I git checkout 6063040324, communication between the ProC and Reaper works, A1, all good.

The commit after that is the one that started the weird artefacts: git checkout 9c4c498c66.

https://github.com/phunkyg/ReaControl24/compare/6063040324...9c4c498c66 is where the madness starts.

Not sure if that is helpful, but I was trying to figure out for myself what the last good working version was, so I thought I'd share.

phunkyg commented 4 years ago

Yes, I think that confirms the changes around the sniffer loop are the problem.

I can reinstate that part easily enough and see if that puts us back to more fluid operation, then look for another way to make the loop 'breakable'.

phunkyg commented 4 years ago

OK, I've bunged the original sniffer loop back in. It's made it un-quittable again, but can you confirm whether the jerky/slow behaviour has gone, @lazlooose?

lazlooose commented 4 years ago

Yes, we are back to lightning-fast, no-glitch communication!

I can now confirm that most of the displays are working, except the clock and VUs. I will make a more detailed list of other similar addressing problems in the refactor branch addressing issues.

phunkyg commented 4 years ago

Thanks @lazlooose. I have one more pattern to try, to see if we can get to a quittable version.

lazlooose commented 4 years ago

Cool, let me know when it's ready to test.

phunkyg commented 4 years ago

I've pushed what I'm working on. It is a mess right now and still doesn't quit right, but the sniffer should now stop on CTRL+C or on being sent a SIGHUP or SIGINT. Can you see if the stuttering behaviour happens on this version?

phunkyg commented 4 years ago

https://stackoverflow.com/questions/11312525/catch-ctrlc-sigint-and-exit-multiprocesses-gracefully-in-python/35134329#35134329
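
(The gist of the pattern discussed there, as a minimal sketch rather than ReaControl's actual process layout: child processes ignore SIGINT so that only the parent reacts to CTRL+C, and the parent then shuts the children down deliberately. Calling get() with a timeout instead of a bare map() also keeps the parent interruptible on Python 2.)

    import signal
    from multiprocessing import Pool

    def init_worker():
        # Children ignore SIGINT; CTRL+C reaches the whole process group,
        # but only the parent should act on it.
        signal.signal(signal.SIGINT, signal.SIG_IGN)

    def work(n):
        return n * n

    if __name__ == '__main__':
        pool = Pool(2, initializer=init_worker)
        try:
            print(pool.map_async(work, range(10)).get(9999))
            pool.close()
        except KeyboardInterrupt:
            pool.terminate()  # the parent decides when the children die
        finally:
            pool.join()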

lazlooose commented 4 years ago

Hey @phunkyg, this version is working fine, no stuttering at all!

CTRL+C still does not kill the OSC client, though.

I get this output in the terminal:

Sending Test message via OSC Client
^C**KeyboardInterrupt
**Closing
NetworkHandler closing
Closing DeviceSession for 00 a0 7e a0 17 fd
DevSess1 MAINUNIT closing
DevSess1 MAINUNIT Waiting for thread shutdown
Sending Test message via OSC Client
DevSess1 MAINUNIT closed
NetworkHandler closed
Sending Test message via OSC Client
NetworkHandler closing
Closing DeviceSession for 00 a0 7e a0 17 fd
DevSess1 MAINUNIT closing
DevSess1 MAINUNIT Waiting for thread shutdown
DevSess1 MAINUNIT closed
NetworkHandler closed
Capture Finished
Sending Test message via OSC Client
Sending Test message via OSC Client
Sending Test message via OSC Client
Sending Test message via OSC Client
Sending Test message via OSC Client
Sending Test message via OSC Client
Sending Test message via OSC Client
...

The 'Sending Test message' lines continue until I use kill.sh, which outputs:

./kill.sh: line 1: 15767 Killed                  sudo pkill -9 -f ReaControl
./kill.sh: line 2: 15773 Killed                  sudo pkill -9 -f Reacontrol
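
(Worth noting: pkill -9 sends SIGKILL, which gives the shutdown handlers no chance to run and is part of why sockets get left 'in use'; the 'Killed' lines above are most likely bash reporting that each pkill matched the sudo wrapper running it. A gentler, hypothetical variant sketched in Python, which tries SIGINT first and only force-kills stragglers:)

    import os
    import signal
    import subprocess
    import time

    # Find ReaControl processes the same way kill.sh does (match on name).
    pids = [int(p) for p in
            subprocess.check_output(['pgrep', '-f', 'ReaControl']).decode().split()]
    for pid in pids:
        os.kill(pid, signal.SIGINT)       # ask nicely first (CTRL+C equivalent)
    time.sleep(2)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)  # force anything still hanging around
        except OSError:
            pass                          # already exited cleanly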

phunkyg commented 4 years ago

I came back to this this morning and fiddled with it. The OSC client stopped cleanly in OSX, but I'm not entirely sure why or what I changed.

Could you try again, just to see if it was a fluke? I'm trying to find my Windows testing VM again!

MM502 commented 4 years ago

I can confirm that the daemon is quitting fine on my OS X machine.

lazlooose commented 4 years ago

Nice work @phunkyg! It's quitting cleanly on the Pi too, and everything is working great, no lag issues!

phunkyg commented 4 years ago

Confirmed on Windows: two CTRL+Cs kill all processes cleanly!