Closed lukesanantonio closed 4 years ago
Related (and possibly duplicate #852).
Also having the exact same problem after upgrade to 1.0.0-beta2, wasn't there in any previous versions. Running Ubuntu 18.04. There isn't any unusual clipping or distortion to the exported tracks other than very short pieces of silence were they shouldn't be, and waveforms continue as expected after the glitch.
Hi!
I can confirm that there are some general issues with the song export in the current master branch. Beside the problems which are mentioned in this issue, it seems that the UI does not behave properly when a template is selected (the current sample rate / bit depth is not getting updated).
The UI problems are now fixed.
About the behavior which is described in this issue: I wrote a quick test track, similar to your pattern, but with alternating kick/snare beats and exported it with 1.0-beta1 and 1.0-beta2. Then, both files were imported into a DAW. Up to 9.4 seconds, everything works fine. Then, my kick is beeing delayed by what seems to be one eigth note, so i get a "kick kick snare snare" pattern where there was "kick snare" before, when playing both tracks at the same time. This is the relevant part as seen in a DAW:
I will have a deeper look at this..
Thanks @mauser let me know if I can provide any additional debugging information!
Just in case it's worth anything @mauser , I had a look into this earlier and believe I have at least one potential cause, but didn't get as far as a fix. The routine audioEngine_process() tries to acquire the audioengine lock, and if it fails, it will silently return without filling the buffer or advancing the song position. The IO diskwriter will then repeat a buffer until audioEngine_process() successfully acquires the lock and continues, effectively delaying the output by a buffer's duration each time.
That behaviour was introduced as a fix for a deadlock issue and should be vanishingly rare in playback, but frequent in exporting to disk. I didn't look any further than that.
@cme : Great, that saved me a lot of work :) I just had some minutes today to look into the issue, but it seems that the lock is being aquired by PatternEditorRuler::updateEditor. I did a quick fix locally and let the audio engine try for three times to aquire the lock (when an export session is active..), which seemed to solve the export problem for me. It seems that in this situation, the PatternEditorRuler and the DiskWriterDriver are both polling the lock at a quite high frequency, which makes it problematic.
At the moment i do not see an option to remove the lock from the PatternEditorRuler. Throttling the DiskWriterDriver could also help, but it this will be unreliable, since we can't guarantee the stability. Maybe there could be a new return value for the audioEngine_process Method which signals the caller that the processing has been canceled before the data has been processed. If the caller (in this case only the DiskWriterDriver) sees this return value, the audioEngine_process can be called again with the same buffer.
A return value from audioEngine_process and retry was one of my first thoughts (or an extra parameter to allow the caller to decide whether to block for the lock or return on failure).
Does the same possibility of failure to acquire the lock exist during playback to an audio device? It will be lower probability but surely possible given the GUI updates during playback also?
I gave a quick glance over Hydrogen sources and I noticed that AudioEngine_process
tries to acquire lock and gives up instantly if that's not possible. Maybe some timeout would be sufficient? Hydrogen uses pthreads under the hood, so maybe using pthread_mutex_timedlock
might help?
BTW. I have gut feeling that this mutex is acquired in too many places, but I have no science to back that up.
I gave a quick glance over Hydrogen sources and I noticed that
AudioEngine_process
tries to acquire lock and gives up instantly if that's not possible. Maybe some timeout would be sufficient?
This seems a good idea as it allows more graceful behaviour in the case of live playback. The timeout should probably be a parameter to the callback though as it has different constraints for its different uses: for audio driver playback, an acceptable time would be proportional to the buffer length, but for the disk writer an arbitrarily long time would be acceptable.
It might be worth looking again at the original deadlock that needed this fix to see if it's really the right fix?
I gave a quick glance over Hydrogen sources and I noticed that AudioEngine_process tries to acquire lock and gives up instantly if that's not possible. Maybe some timeout would be sufficient?
the audioEngine_process
function only has a very limited amount of time available to perform its actions. Otherwise an XRUN will be produced. Having a timeout might be smoothly working on modern computers but, on the other hand, cause trouble on older or embedded devices. But just a thought and no measurements backing this up.
But also a general remark. As far as I understand the problem occurs only when exporting a song to audio file. And it only occurs because there is both the DiskWriterDriver
and the GUI accessing the patterns. Why not dropping the fancy animation-like behavior of the GUI during export? The progress bar does already do fine job displaying the progress. A check for the driver in PatternEditorRuler::updateEditor
and deciding whether or not to update the GUI might be a more resilient fix.
That's correct. But don't we precisely know how much time is available (buffer size / sample rate), so that timeout can be adjusted accordingly? Critical section in PatternEditorRuler
is not doing that much work:
AudioEngine::get_instance()->lock( RIGHT_HERE );
PatternList *pList = pEngine->getCurrentPatternList();
for (uint i = 0; i < pList->size(); i++) {
if ( m_pPattern == pList->get(i) ) {
bActive = true;
break;
}
}
AudioEngine::get_instance()->unlock();
Why not dropping the fancy animation-like behavior of the GUI during export?
Are we sure that this might not happen outside export session?
But anyway, I think root cause is too much locking going on. Maybe ruler should be driven by events pushed to event queue instead of interacting with AudioEngine directly?
BTW: Here's issue that triggered this issue: Deadlock in ALSA driver. It seems it was reported by me;)
That's correct. But don't we precisely know how much time is available (buffer size / sample rate), so that timeout can be adjusted accordingly? Critical section in
PatternEditorRuler
is not doing that much work:
Absolutely agree here. Failure to acquire that lock should be only momentary. Also it's worth considering the failure modes: missing a buffer due to waiting on a lock is no worse than missing a buffer due to abandoning any attempt to get the lock. In either case, the song's position fails to move on and there will be an audible glitch. What good is a drum machine if it can't keep time? ;)
Why not dropping the fancy animation-like behavior of the GUI during export?
Are we sure that this might not happen outside export session?
I'm pretty sure it can, as the position indicator is still updated in playback. It's unlikely but it's possible so should be considered, especially since adding a time to lock acquisition will almost entirely solve the issue for the playback case.
That's correct. But don't we precisely know how much time is available (buffer size / sample rate), so that timeout can be adjusted accordingly? Critical section in PatternEditorRuler is not doing that much work:
Sure, but we don't know how much time is required to complete a process cycle. When e.g. adding LADSPA effects this time increases quite a lot. Also, because there is just one central lock, there might be other parts of the GUI trying to acquire this lock as well. But don't get me wrong, I like the idea of a short timeout.
Are we sure that this might not happen outside export session?
I'm pretty sure it can, as the position indicator is still updated in playback. It's unlikely but it's possible so should be considered, especially since adding a time to lock acquisition will almost entirely solve the issue for the playback case.
I think so too.
Just for reference: the lock in PatternEditorRuler
was introduced about a year ago. https://github.com/hydrogen-music/hydrogen/issues/714
Also it's worth considering the failure modes: missing a buffer due to waiting on a lock is no worse than missing a buffer due to abandoning any attempt to get the lock. In either case, the song's position fails to move on and there will be an audible glitch. What good is a drum machine if it can't keep time? ;)
In Linux the most common audio driver is JACK, which also includes transport control via a centralized server. Using this driver Hydrogen would still be properly synchronized with all other clients. Also the core concept is that you have a buffer containing audio data, which you move from one application to another. This linkage allows you to build quite powerful pipelines from rather simple UNIX-like programs. But you assume that they will all be done in a certain amount of time. Haven't read too much about this topic but it may have worse effects to not deliver a buffer in time than to deliver a glitch.
That's correct. But don't we precisely know how much time is available (buffer size / sample rate), so that timeout can be adjusted accordingly? Critical section in PatternEditorRuler is not doing that much work:
Sure, but we don't know how much time is required to complete a process cycle. When e.g. adding LADSPA effects this time increases quite a lot.
The real time taken to process a buffer's worth of samples is already tracked in m_fProcessTime. At the start of each buffer's processing, it will contain the time taken to process the previous buffer. Granted this is going to change if the buffer size changes, effects and samples change etc, and will be subject to CPU scheduling noise, but it seems a reasonable first approximation to use, with some appropriate pessimistic factor, such as
double fSlackTime = ( m_fMaxProcessTime - m_fProcessTime * 2.0 ); // allow 2x last process time int nLockTimeout = (int) fSlackTime / 2;
(also hoisting the calculation of nLockTimeout to the start of the function).
Also, because there is just one central lock, there might be other parts of the GUI trying to acquire this lock as well. But don't get me wrong, I like the idea of a short timeout.
Yes, other parts update based on display and user interaction (selecting different patterns, editing) that can happen while playback is running.
If I have time in the next few days I'll implement and test this as well as profiling how long locks are held by the GUI components on a few machines (including Raspi and linux VMs, which should be interesting!)
Wow, that's definitely a constructive discussion :) Let's sum it up!
We have three major proposals:
Use a timeout in audioEngine_process()
when aquiring the lock
Let audioEngine_process()
communicate the problem to DiskwriterDriver,
so that it can re-run audioEngine_process()
until it is successful
Deactivate the UI code which tries do aquire the lock at a higher frequency
In my opinion, all of those parts make sense in their combination. If (for example) only 1. is implemented, there is no guarantee that Export problem won't occur again, if we get the timing wrong. It may also occur that aquiring the lock might fail again after the timeout. Combining 1. and 3. can minimize the probability that the exported file has problems. But i feel that this might be still to weak, since maybe someone else writes a new UI functionality next year which also aquires the lock and the problem will be there again.
The combination with (2) makes the Export use case fail-safe (in my opinion, if i don't have missed a situation).
I would propose the following: Implementing (2) and (3) immediately, since this fixes the Export problem. (3) is not really necessary, but i see no risk there.. I could care for the implementation, as i have done this already locally. This would also give us the change to release rc1 or beta3 shortly (i.e. in the next days).
(1) gives me more headaches :) I have an uneasy feeling about the determination of the timeout. Wouldn't we make things even worse when the timeout is too long? A deeper look at this would be necessary, as @cme already mentioned. But it is also clear that this solutions is the most "powerful".
BTW. I have gut feeling that this mutex is acquired in too many places, but I have no science to back that up.
:smiley:
Honestly, i don't know. But doing a code review of all places where the lock is aquired should be possible. At least for the two occurances in this issue, i do not see a way without the lock (at least without touching the core code).
Wouldn't we make things even worse when the timeout is too long?
I think it couldn't be worse. When timeout is too long and it times out, that means that the mutex was locked by someone else all the time, so there would be xrun anyway. Or am I missing something?
I have to think about this, but it might mean that timeout length is not really relevant, so we might just pick some arbitrary value. Whether we wait loo long or we time out early, we're gonna get missed frame. The real issue is that mutex is held by other parts too long.
I might be answering my own question;) Suppose that UI locks AudioEngine mutex so that current audio frame couldn't be processed in time. There's glitch and we can't do anything about it. But what happens with the next frame? If we time out early enough, will the next frame be played correctly? If we time out for too long wouldn't we "clog" audio thread so that it misses following frames and/or looses timing?
@lukesanantonio :
The fix is now available on the following branch: https://github.com/hydrogen-music/hydrogen/tree/ExportGlitchFixes . Could try to build this branch and see if the problem is gone?
@mauser I'm also working on timed locks (and have some profiling data to share) which will help the potential playback cases. I'll share and pull request shortly if you don't mind.
@mauser I'm able to export without issues!! I did a side by side comparison of 1.0.0-beta1, -beta2 and this branch, and beta1 stayed perfectly in sync with the ExportGlitchFixes branch, while -beta2 got out of sync quickly, as expected.
Some data at https://github.com/cme/hydrogen/blob/timed_locks_profiling/profiling/report.md and a pull request at #878
Hi, good job with profiling! I'm curious, what would be results for smaller buffer sizes, like 128 samples. 1024 samples seems high setting.
Hi, good job with profiling! I'm curious, what would be results for smaller buffer sizes, like 128 samples. 1024 samples seems high setting.
Funnily enough, that's exactly what I was trying to do when I found #879 and #881. The Raspi with ALSA only seems to support a minimum period of 444 frames, where things are much the same. Smaller period/buffer sizes mean it's more likely to hit the wait case, but also the wait times are still very small so even reducing the allowable wait time in line with the smaller available CPU time for each period/buffer seems to allow plenty of time.
Would also be interesting to profile lock instances independently of this to see where the worst cases are.
Hi!
I'm closing this issue now, since the core of the problem is fixed and the pull requests for the locks are also already there.
Hydrogen version * :
Hydrogen 1.0.0-'dc269b98'
(just built frommaster
earlier today) Operating system + version : Ubuntu 20.04 Audio driver + version : Jack 1.9.12, but also tried pulseaudio, alsa driversHello! Just wanted to say thank you for an amazing piece of software!!
I'm having an issue where a song exported from hydrogen does not maintain a consistent tempo, I've been having this issue since I upgraded to 1.0.0-beta2, but the issue still seems present in master (which I compiled earlier today).
Here's how I reproduce it:
Open a new hydrogen project, use bpm 120, add beats on every count, 1,2,3,4, etc. Set this pattern to go for 32 bars in the song. Export song as 44800 Hz wav, linear interpolation. (I've also tried ogg, flac, etc, but the issue persists)
Open audacity, import the hydrogen WAV, hit generate -> rhythm track. I'm using tempo 120 bpm, 4 beats per bar, 32 bars, which sould match the recording exactly.
Okay, so then I play the two files back. They start perfectly in sync, but every 10 bars or so the hydrogen-exported track has a strange, audible glitch which puts a slight delay in the track, resulting in out of sync beats!
I've been reading through some other issues, but can't find any that only apply to exporting the song. And I've used a temporary workaround where I click play in hydrogen at just the right time to sync with LMMS. When I'm playing the pattern directly inside hydrogen it never gets out of sync.
I also noticed these messages in the console when I run hydrogen with
-Vdebug
, they happen for every filetype (ogg, flac, wav) that I've tried exporting!dc269b98458defef4956fe42c4824bf8944012a1