jackaudio / new-session-manager

Assists music production by grouping standalone programs into sessions. Community version of "Non Session Manager".
https://new-session-manager.jackaudio.org
GNU General Public License v3.0
80 stars 10 forks source link

Fragile session lookup #56

Closed theGreatWhiteShark closed 3 years ago

theGreatWhiteShark commented 4 years ago

Since a while I encountered the strange phenomena that sometimes all session are listed in the nsm-legacy-gui (as well as in non-session-manager) and sometimes only a small subset.

After some digging I found the problem to be some symbolic links created by the Non-Timeline. They are pointing to files located on a hard disk which is not mounted all the time. If missing, the dead links cause the ftw function in list_file in nsmd.cpp to abort and return a -1.

How about catching the return value and triggering another more elaborated and maybe less efficient search instead? I'm not a native C++ person and does not know which libraries do you intend to use, but I could give it a try myself.

diovudau commented 4 years ago

Good catch. Symlinks to external files are not only used in Non-Timeline but in many programs. The API docs require that for external samples etc.

It is a general problem how to react to a "misconfigured" system. The threshold is not a hard one but a range of problems. Should NSM detect if your screen is switched on? Of course not. Should NSM detect if all files in the session are available because you did not connect an external hard-drive? You could say "no, that is just the problem of the user/system" or argue with: that is part of the session. In this case I am leaning towards improving the checks in NSM because it costs not much and provides more information to the user and prevents broken sessions.

To be continued...

theGreatWhiteShark commented 4 years ago

I would also tend towards checks in NSM because it does not only affect bricked sessions but all others visited by ftw() afterwards. Also e.g. Non-Timeline has no problems with missing links.

But even this is no issue covered by nsmd itself, it would be nice to have at least a descriptive error message (although this also need some extra code since ftw() seems to just return -1 on a broken symlink without calling the list_file handler function of it. So, one still needs to determine which session it does belong to).

diovudau commented 4 years ago

One might ask why parsing the sessions(!) even lead to this problem. The sessions are just directories which contain a file "session.nsm". Once you find one you don't need to look for any other files, and thus not try to follow any symlinks.

It turns out this problem is actually related to another one, which will solve everything smoothly: https://github.com/linuxaudio/new-session-manager/issues/38

Once this is fixed the search will not try to look at all files recursively.

And, as a bonus, we can do an additional check for broken symlinks or mounts and return a proper error message.

theGreatWhiteShark commented 4 years ago

In the current implementation session.nsm has to be a regular file (and no symbolic link). How about the session folders? Does the standard allow them to be links? If not, one could omit symlinks altogether and just focus on regular files and directories.

diovudau commented 4 years ago

Session directories can be symlinks (not tested). I think with the proposed changes this will be much faster and more stable already and we'll see from there how to go forward.

diovudau commented 3 years ago

I am currently looking into this. No code changes commited yet but some research results that seem to contradict what was submitted by @theGreatWhiteShark on Oct 17 2020.

All of these can be symlinked into the session without any problems or preformance losses (eg wait times: the complete session root, a complete session directory, just the file session.nsm

diovudau commented 3 years ago

@theGreatWhiteShark I have upgraded the session discovery system in git. Would you be to kind to do your tests again and report if anything changed, for better or worse?

If you have a misbehaving session it would be good to upload that for my testing. Create the most minimal not-working session please. But if that is too much work your original session is also fine.

theGreatWhiteShark commented 3 years ago

Okay. The bug was not due to a broken symlink by itself but to a symlink pointing at a piece of "broken hardware". Steps to reproduce:

  1. Mount a usb drive via pmount, e.g. pmount /dev/sdb label
  2. Send computer into hibernation and unplug the drive
  3. ls /media/label will now give ls: reading directory '/media/label/': Input/output error
  4. Add a symlink pointing to e.g. /media/label/sample.wav to one of the sessions (preferably first one in terms of alphabetic order)
  5. Invoke listing of sessions

I can confirm that broken symlinks in general do not pose a problem. So, this issue might be too particular to address.

theGreatWhiteShark commented 3 years ago

I have upgraded the session discovery system in git.

@diovudau With the 2722f6585456d77f19f81bc6bf4326c30cf75158 the problem is not solved. No sessions are listed. But in addition nsmd dies on me unexpectedly.

I'm using this script

#!/bin/bash

FOLDER="test_nsm_test"
PORT=9349

nsmd --session-root $PWD/$FOLDER --osc-port $PORT &
PID=$!

## Create a couple of new sessions.
for ii in {0..20}; do
    RANDOM_NAME=$(base32 /dev/random | head -1)
    oscsend localhost $PORT /nsm/server/new s ${RANDOM_NAME:0:10}
done

## This displays the whole list of sessions
oscsend localhost $PORT /nsm/server/list

## Introducing the culprit
SESSIONS=$(ls $FOLDER)
for ii in $SESSIONS;do
    mkdir $FOLDER/$ii/client/
    ln -s /media/black/nonexisting/ $FOLDER/$ii/
done

## Broken lookup
oscsend localhost $PORT /nsm/server/list

And do receive this output

[nsmd] ../src/nsmd.cpp:2544 main(): Using OSC port 9349 [nsmd] ../src/nsmd.cpp:2603 main(): Session root is: /home/phil/tmp/test_nsm_test [nsmd] ../src/Endpoint.cpp:205 init(): Creating OSC server NSM_URL=osc.udp://abyzou:9349/ [nsmd] ../src/nsmd.cpp:1634 osc_new(): Creating new session "QUBBL2XPMC" [nsmd] ../src/nsmd.cpp:1171 command_all_clients_to_save(): Commanding attached clients to save. [nsmd] ../src/nsmd.cpp:639 wait_for_replies(): Waiting for clients to reply to commands [nsmd] ../src/nsmd.cpp:653 wait_for_replies(): Done waiting [nsmd] ../src/nsmd.cpp:1133 wait_for_killed_clients_to_die(): Waiting for killed clients to die. [nsmd] ../src/nsmd.cpp:1162 wait_for_killed_clients_to_die(): All clients have died. [snip] [nsmd] ../src/nsmd.cpp:1716 osc_list(): Listing sessions [nsmd] ../src/nsmd.cpp:2691 main(): Our parent PID changed from 50419 to 1, which indicates a possible GUI crash. The user has no control over the session anymore. Trying to shut down cleanly. [nsmd] ../src/nsmd.cpp:2474 handle_signal_clean_exit(): Caught SIGNAL 0. Stopping nsmd. [nsmd] ../src/nsmd.cpp:1133 wait_for_killed_clients_to_die(): Waiting for killed clients to die. [nsmd] ../src/nsmd.cpp:1162 wait_for_killed_clients_to_die(): All clients have died. munmap_chunk(): invalid pointer

diovudau commented 3 years ago

There are a few problems with this script as testing ground. Before you read the following let me say that nsmd async OSC messaging and the client/server/gui architecture can make things complicated. I only see what is happening here because I wrote multiple GUIs (including prototypes), rewrote the API document and had my nose in the nsmd code for nearly a year now.

From your log and my own run of this script everything is correct. In fact, it is even confirming that the recent code changes indeed fix a real problem.

P.S. munmap_chunk(): invalid pointer is an unrelated problem

diovudau commented 3 years ago

Correction: I've read through the code once more and /nsm/server/list might actually not be a GUI protocol but return to sender.

The not listing problem here is that you are not a an OSC server and you do not listen for replies at all.

theGreatWhiteShark commented 3 years ago

There are a few problems with this script as testing ground

Nah. Sorry, I probably should have used some more words to express what I was after. This script was not intended to serve as a unit test but just some lines of codes that worked flawlessly at 03e637f85455e3287120d3df50e2aa838d77d910 but failed at 2722f6585456d77f19f81bc6bf4326c30cf75158. And it was this change in state I wanted to report to you. However, since it was initially intended to reproduce the reported bug and I forgot to remove the comments, this only could lead to confusions as there is no "culprit". Sorry for that :/

"#This displays the whole list of sessions" will not do that. The list is sent to the registered GUI, in this case there is none.

I'm totally puzzled right now. I'm a 100% sure I have seen the listing of the session to std::out in the log of nsmd started in the beginning of the script. Well, most probably I messed this one up and had still some debugging code in there without remembering. When resetting to the latest tip of master it was lost and the output different. Probably. Well, anyway, WTF. Sorry for the noise. :sweat_smile: