takaswie / linux-firewire-dkms

Currently this repository is maintained for Linux firewire subsystem and unit drivers.
http://ieee1394.docs.kernel.org/
39 stars 8 forks source link

Periodic interrupts with MOTU 828mk3 #27

Closed alesandar closed 3 years ago

alesandar commented 4 years ago

I am experiencing an issue with my MOTU 828mk3 (JACK2 with ALSA backend). A periodic xrun occurs every now and then and the interval seems to be dependent on the buffer size.

During all of the following tests, sample rate and nperiods have been set to 192000/3. For some reason I can not set lower sample rate values. [1] Only the buffer size has been changed after each test. Please consider the corresponding intervals.

$ tail -f ~/.log/jack/jackdbus.log | grep alsa_pcm

buffer size: 2048 (~349s interval between xruns):

Thu Apr 30 10:27:38 2020: Jack: **** alsa_pcm: xrun of at least 0.061 msecs
Thu Apr 30 10:33:32 2020: Jack: **** alsa_pcm: xrun of at least 0.042 msecs
Thu Apr 30 10:39:25 2020: Jack: **** alsa_pcm: xrun of at least 0.056 msecs
Thu Apr 30 10:45:09 2020: Jack: **** alsa_pcm: xrun of at least 0.057 msecs
Thu Apr 30 10:50:55 2020: Jack: **** alsa_pcm: xrun of at least 0.061 msecs

buffer size: 1024 (~344s interval between xruns):

Thu Apr 30 11:29:36 2020: Jack: **** alsa_pcm: xrun of at least 0.061 msecs
Thu Apr 30 11:35:23 2020: Jack: **** alsa_pcm: xrun of at least 0.061 msecs
Thu Apr 30 11:41:09 2020: Jack: **** alsa_pcm: xrun of at least 0.057 msecs
Thu Apr 30 11:46:53 2020: Jack: **** alsa_pcm: xrun of at least 0.053 msecs
Thu Apr 30 11:52:35 2020: Jack: **** alsa_pcm: xrun of at least 0.058 msecs

buffer size: 512 (~81s interval between xruns):

Thu Apr 30 11:56:43 2020: Jack: **** alsa_pcm: xrun of at least 0.060 msecs
Thu Apr 30 11:58:04 2020: Jack: **** alsa_pcm: xrun of at least 0.062 msecs
Thu Apr 30 11:59:27 2020: Jack: **** alsa_pcm: xrun of at least 0.059 msecs
Thu Apr 30 12:00:47 2020: Jack: **** alsa_pcm: xrun of at least 0.061 msecs
Thu Apr 30 12:02:10 2020: Jack: **** alsa_pcm: xrun of at least 0.061 msecs

buffer size: 256 (~81s interval between xruns):

Thu Apr 30 12:11:49 2020: Jack: **** alsa_pcm: xrun of at least 0.060 msecs
Thu Apr 30 12:13:11 2020: Jack: **** alsa_pcm: xrun of at least 0.059 msecs
Thu Apr 30 12:14:31 2020: Jack: **** alsa_pcm: xrun of at least 0.050 msecs
Thu Apr 30 12:15:53 2020: Jack: **** alsa_pcm: xrun of at least 0.062 msecs
Thu Apr 30 12:17:14 2020: Jack: **** alsa_pcm: xrun of at least 0.060 msecs

buffer size: 128 (~15s interval between xruns):

Thu Apr 30 12:20:49 2020: Jack: **** alsa_pcm: xrun of at least 0.064 msecs
Thu Apr 30 12:21:05 2020: Jack: **** alsa_pcm: xrun of at least 0.052 msecs
Thu Apr 30 12:21:27 2020: Jack: **** alsa_pcm: xrun of at least 0.062 msecs
Thu Apr 30 12:21:43 2020: Jack: **** alsa_pcm: xrun of at least 0.065 msecs
Thu Apr 30 12:21:59 2020: Jack: **** alsa_pcm: xrun of at least 0.066 msecs

I know that this messages does not contain very accurate timestamps, but they might be of some help, since $ dmesg -w was completely silent about the interrupts. [2]

[1] Trying to initialize the interface at other sample rates usually causes the following error:

[ 8294.188278] firewire_ohci 0000:03:00.3: DMA context IT0 has stopped, error code: evt_timeout

Power-cycling the interface and reloading firewire-ohci and snd-firewire-motu might not be enough to get it back working. It seems like JACK is unable to reinitialize the server until I reboot my laptop. At that point, if I try to manually start jackd with sample rate parameters, the sample rate LEDs on the interface change accordingly. Sometimes I can even hear a high-pitched signal while trying to initialize the JACK server with low sample rate settings. However, I usually rely on the JACK2's dbus system by executing jack_control commands.

I have also noticed that jack has to be started in synchronous mode, which is not a default option for JACK2. Otherwise the interval between xruns seems to be the same, but they are a lot more... like hundreds. Realtime priorities does not seem to make a change.

The tests above were performed from within a tty, without a graphical environment. I have disabled services like systemd-timesyncd, removed network modules and so on.

CPU scaling governor was set to performance. Both /sys/class/rtc/rtc0/max_user_freq and /sys/class/rtc/rtc0/max_user_freq are set to 2048. I have also tried running different kernels, distributions and so on, but things are usually worse.

[2] I have recompiled linux (without the realtime kernel patchset) and enabled CONFIG_SND_PCM_XRUN_DEBUG, CONFIG_SND_VERBOSE_PROCFS, CONFIG_SND_DEBUG. Here's some relevant output from dmesg that occured after a single xrun.

[  +0.000006] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.8-arch1-1-custom #1
[  +0.000002] Hardware name: LENOVO 4236QZ0/4236QZ0, BIOS CBET4000 4.11-2096-g5b06ffea56-dirty 04/07/2020
[  +0.000001] Call Trace:
[  +0.000005]  <IRQ>
[  +0.000018]  dump_stack+0x66/0x90
[  +0.000009]  __snd_pcm_xrun.cold+0x5/0xa [snd_pcm]
[  +0.000005]  snd_pcm_update_state+0x103/0x110 [snd_pcm]
[  +0.000003]  snd_pcm_update_hw_ptr0+0x24f/0x7b0 [snd_pcm]
[  +0.000003]  ? ohci_flush_iso_completions+0x116/0x150 [firewire_ohci]
[  +0.000004]  snd_pcm_period_elapsed+0x75/0xa0 [snd_pcm]
[  +0.000003]  tasklet_action_common.isra.0+0x5f/0x130
[  +0.000002]  __do_softirq+0x111/0x34d
[  +0.000002]  irq_exit+0xac/0xd0
[  +0.000001]  do_IRQ+0x89/0x140
[  +0.000002]  common_interrupt+0xf/0xf
[  +0.000001]  </IRQ>
[  +0.000002] RIP: 0010:cpuidle_enter_state+0xc9/0x410
[  +0.000002] Code: e8 bc 62 95 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 1c 03 00 00 31 ff e8 4e da 9b ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 72 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[  +0.000000] RSP: 0018:ffffafb0800abe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[  +0.000002] RAX: ffff8ab3eb6c0000 RBX: ffff8ab3eb6f6400 RCX: 000000000000001f
[  +0.000000] RDX: 0000000000000000 RSI: 000000003164062c RDI: 0000000000000000
[  +0.000001] RBP: ffffffffb94c99c0 R08: 00000094d68ef808 R09: 000000000000019f
[  +0.000000] R10: 0000000000000889 R11: ffff8ab3eb6ebce4 R12: 00000094d68ef808
[  +0.000001] R13: 0000000000000005 R14: 0000000000000005 R15: ffff8ab3ea660000
[  +0.000003]  ? cpuidle_enter_state+0xa4/0x410
[  +0.000002]  cpuidle_enter+0x29/0x40
[  +0.000002]  do_idle+0x1e6/0x270
[  +0.000002]  cpu_startup_entry+0x19/0x20
[  +0.000002]  start_secondary+0x186/0x1d0
[  +0.000002]  secondary_startup_64+0xb6/0xc0
[  +0.001095] ALSA: PCM: [Q] Lost interrupts?: (stream=0, delta=224, new_hw_ptr=224, old_hw_ptr=0)
[  +0.000002] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.8-arch1-1-custom #1
[  +0.000000] Hardware name: LENOVO 4236QZ0/4236QZ0, BIOS CBET4000 4.11-2096-g5b06ffea56-dirty 04/07/2020
[  +0.000001] Call Trace:
[  +0.000001]  <IRQ>
[  +0.000002]  dump_stack+0x66/0x90
[  +0.000005]  snd_pcm_update_hw_ptr0.cold+0x3d/0xef [snd_pcm]
[  +0.000003]  ? ohci_flush_iso_completions+0x116/0x150 [firewire_ohci]
[  +0.000004]  snd_pcm_period_elapsed+0x75/0xa0 [snd_pcm]
[  +0.000002]  tasklet_action_common.isra.0+0x5f/0x130
[  +0.000002]  __do_softirq+0x111/0x34d
[  +0.000002]  irq_exit+0xac/0xd0
[  +0.000001]  do_IRQ+0x89/0x140
[  +0.000002]  common_interrupt+0xf/0xf
[  +0.000000]  </IRQ>
[  +0.000002] RIP: 0010:cpuidle_enter_state+0xc9/0x410
[  +0.000001] Code: e8 bc 62 95 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 1c 03 00 00 31 ff e8 4e da 9b ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 72 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[  +0.000001] RSP: 0018:ffffafb0800abe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[  +0.000001] RAX: ffff8ab3eb6c0000 RBX: ffff8ab3eb6f6400 RCX: 000000000000001f
[  +0.000001] RDX: 0000000000000000 RSI: 000000003164062c RDI: 0000000000000000
[  +0.000000] RBP: ffffffffb94c99c0 R08: 00000094d6a1c341 R09: 00000000000001ce
[  +0.000001] R10: 0000000000000123 R11: ffff8ab3eb6ebce4 R12: 00000094d6a1c341
[  +0.000000] R13: 0000000000000003 R14: 0000000000000003 R15: ffff8ab3ea660000
[  +0.000003]  ? cpuidle_enter_state+0xa4/0x410
[  +0.000002]  cpuidle_enter+0x29/0x40
[  +0.000002]  do_idle+0x1e6/0x270
[  +0.000002]  cpu_startup_entry+0x19/0x20
[  +0.000002]  start_secondary+0x186/0x1d0
[  +0.000002]  secondary_startup_64+0xb6/0xc0

I feel like all of this might not be related to my system, but the internal clock of the interface. Any ideas?

system details: laptop: Lenovo Thinkpad T420 CPU: Intel i7-3720QM FireWire chip: Ricoh R5C832 OS: Arch Linux (linux-rt 5.6.4-rt3-1-rt)

takaswie commented 4 years ago

Hi,

Today I use my Thinkpad T420 and the same device (828mk3 FireWire only). Although I can see XRUNs each several hundreds seconds in a case of 192.0 kHz, 512/1024 frames/period and 2-4 periods/buffer, I cannot see periodicity based on the size of buffer.

My equipments:

For 1394 OHCI controller and ALSA firewire stack, I used two options for each, however I cannot find any difference between them.

In this trial, I turned hyperthreading off. Would I request you to test again without hyperthreading? In my understanding, hyperthreading loads cores and easily brings jitter for IRQ handling.

takaswie commented 4 years ago

But in another system I run jackd with '-v' option and receive the same XRUN error.

$ jackd -v -r -d alsa -d hw:2,0 -r 96000 -n2 -p 204
Jack: **** alsa_pcm: xrun of at least -91399602.176 msecs
Jack: ALSA XRun wait_status = 0
Jack: JackSocketServerChannel::Execute : fPollTable i = 1 fd = 11
Jack: JackRequest::Notification
Jack: JackEngine::ClientNotify: no callback for notification = 3
Jack: JackEngine::ClientNotify: no callback for notification = 3
Jack: **** alsa_pcm: xrun of at least -91399667.712 msecs
Jack: ALSA XRun wait_status = 0
Jack: JackSocketServerChannel::Execute : fPollTable i = 1 fd = 11
Jack: JackRequest::Notification
Jack: JackEngine::ClientNotify: no callback for notification = 3
Jack: JackEngine::ClientNotify: no callback for notification = 3

I'd like to have more time to investigate the XRUN further, mmm.

takaswie commented 4 years ago

As long as I apply below patch to ALSA firewire stack, I can see no printk output when encountering the XRUNs on jackd process.

$ git diff
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 5baa1b7..31140ce 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -815,7 +815,7 @@ static void process_ctx_payloads(struct amdtp_stream *s,
        if (pcm)
                update_pcm_pointers(s, pcm, pcm_frames);
 }
-
+#include <sound/core.h>
 static void out_stream_callback(struct fw_iso_context *context, u32 tstamp,
                                size_t header_length, void *header,
                                void *private_data)
@@ -866,6 +866,7 @@ static void out_stream_callback(struct fw_iso_context *context, u32 tstamp,
                }

                if (queue_out_packet(s, &template.params, sched_irq) < 0) {
+                       snd_printk(KERN_INFO"out: XRUN\n");
                        cancel_stream(s);
                        return;
                }
@@ -893,6 +894,7 @@ static void in_stream_callback(struct fw_iso_context *context, u32 tstamp,
        err = generate_device_pkt_descs(s, s->pkt_descs, ctx_header, packets);
        if (err < 0) {
                if (err != -EAGAIN) {
+                       snd_printk(KERN_INFO"in: XRUN1\n");
                        cancel_stream(s);
                        return;
                }
@@ -904,6 +906,7 @@ static void in_stream_callback(struct fw_iso_context *context, u32 tstamp,
                struct fw_iso_packet params = {0};

                if (queue_in_packet(s, &params) < 0) {
+                       snd_printk(KERN_INFO"in: XRUN2\n");
                        cancel_stream(s);
                        return;
                }
@@ -981,6 +984,7 @@ static void irq_target_callback(struct fw_iso_context *context, u32 tstamp,

        return;
 error:
+       snd_printk(KERN_INFO"irq target: XRUN\n");
        if (amdtp_stream_running(irq_target))
                cancel_stream(irq_target);
takaswie commented 4 years ago

The list of system call on jackd process includes POLLERR in capture PCM character device.

[pid 119598] 18:56:46.688551 poll([{fd=7, events=POLLOUT|POLLERR|POLLNVAL}, {fd=8, events=POLLIN|POLLERR|POLLNVAL}], 2, 34828) = 1 ([{fd=7, revents=POLLOUT}])
[pid 119598] 18:56:46.688659 poll([{fd=8, events=POLLIN|POLLERR|POLLNVAL}], 1, 34828) = 1 ([{fd=8, revents=POLLIN}])
[pid 119598] 18:56:46.711986 poll([{fd=7, events=POLLOUT|POLLERR|POLLNVAL}, {fd=8, events=POLLIN|POLLERR|POLLNVAL}], 2, 34828) = 1 ([{fd=7, revents=POLLOUT}])
[pid 119598] 18:56:46.712084 poll([{fd=8, events=POLLIN|POLLERR|POLLNVAL}], 1, 34828) = 1 ([{fd=8, revents=POLLIN|POLLERR}])
[pid 119598] 18:56:46.734788 ioctl(8, SNDRV_PCM_IOCTL_STATUS_EXT, 0x7fa35618cc80) = 0
[pid 119598] 18:56:46.734883 write(1, "Jack: **** alsa_pcm: xrun of at "..., 58) = 58
[pid 119598] 18:56:46.735029 ioctl(7, SNDRV_PCM_IOCTL_DROP, 0x7fa35704ad00) = 0
[pid 119598] 18:56:46.735091 ioctl(7, SNDRV_PCM_IOCTL_PREPARE, 0x1) = 0
[pid 119598] 18:56:46.735258 ioctl(7, SNDRV_PCM_IOCTL_START, 0x1) = 0
[pid 119598] 18:56:46.735301 write(1, "Jack: ALSA XRun wait_status = 0\n", 32) = 32
[pid 119598] 18:56:46.735348 write(9, " \0\0\0", 4 <unfinished ...>
[pid 119599] 18:56:46.735424 <... poll resumed>) = 1 ([{fd=11, revents=POLLIN}])
[pid 119598] 18:56:46.735467 <... write resumed>) = 4
[pid 119599] 18:56:46.735487 write(1, "Jack: JackSocketServerChannel::E"..., 66 <unfinished ...>
[pid 119598] 18:56:46.735513 write(9, "\f\0\0\0", 4 <unfinished ...>
[pid 119599] 18:56:46.735541 <... write resumed>) = 66
[pid 119598] 18:56:46.735558 <... write resumed>) = 4
[pid 119599] 18:56:46.735574 read(11,  <unfinished ...>
[pid 119598] 18:56:46.735593 write(9, "\377\377\377\377", 4 <unfinished ...>
[pid 119599] 18:56:46.735617 <... read resumed>" \0\0\0", 4) = 4
[pid 119598] 18:56:46.735638 <... write resumed>) = 4
takaswie commented 4 years ago

Playback-only mode of jackd can suppress this issue since the process doesn't process capture PCM character device (of cource).

$ jackd -v -r -d alsa -d hw:2,0 -r 96000 -P -p512
...
creating alsa driver ... hw:2,0|-|512|2|96000|0|0|nomon|swmeter|-|32bit
configuring for 96000Hz, period = 512 frames (5.3 ms), buffer = 2 periods
takaswie commented 4 years ago

Against my expectation, capture-only mode of jackd can suppress the issue.

$ jackd -v -r -d alsa -d hw:2,0 -r 96000 -C -p512
...
creating alsa driver ... -|hw:2,0|512|2|96000|0|0|nomon|swmeter|-|32bit
configuring for 96000Hz, period = 512 frames (5.3 ms), buffer = 2 periods
...
takaswie commented 4 years ago

Oops. The another system runs with MOTU UltraLite mk3 (FireWire model) and the driver under development doesn't work well at 88.2/96.0 kHz (I programmed wrong stream format)...

takaswie commented 4 years ago

But it's possible that the device reduces the number of chunks up to unexpected value, hm.

@alesandar Would I request you to retrieve output from /proc/asound/cardX/firewire/format? (Here X is the card number of your device in ALSA subsystem.) As long as I know, 828mk3 is a kind of device which changes packet format according to settings for optical audio interface.

takaswie commented 4 years ago

I worked with bus analyzer to retrieve the profiles of isochronous packet from/to MOTU 828mk3.

STF(kHz) opt A opt B tx rx
44.1/ 48.0 disable disable 130 106
44.1/ 48.0 ADAT disable 178 154
44.1/ 48.0 TOSLINK disable 154 130
44.1/ 48.0 disable ADAT 178 154
44.1/ 48.0 disable TOSLINK 154 130
44.1/ 48.0 ADAT TOSLINK 202 178
44.1/ 48.0 TOSLINK ADAT 202 178
44.1/ 48.0 ADAT ADAT 226 202
44.1/ 48.0 TOSLINK TOSLINK 178 154
----------- --------- --------- ----- -----
88.2/ 96.0 disable disable 258 210
88.2/ 96.0 ADAT disable 306 258
88.2/ 96.0 TOSLINK disable 306 258
88.2/ 96.0 disable ADAT 306 258
88.2/ 96.0 disable TOSLINK 306 258
88.2/ 96.0 ADAT TOSLINK 354 306
88.2/ 96.0 TOSLINK ADAT 354 306
88.2/ 96.0 ADAT ADAT 354 306
88.2/ 96.0 TOSLINK TOSLINK 354 306
----------- --------- --------- ----- -----
176.4/192.0 disable disable 418 322
takaswie commented 4 years ago

In the case of MOTU AudioExpress.

STF(kHz) tx rx
44.1/ 48.0 82 82
88.2/ 96.0 162 162
takaswie commented 4 years ago

In the case of MOTU 4pre.

STF(kHz) tx rx
44.1/ 48.0 82 82
88.2/ 96.0 162 162
takaswie commented 4 years ago

In the case of MOTU UltraLite mk3 (fw).

STF(kHz) tx rx
44.1/ 48.0 130 106
88.2/ 96.0 210 210
176.4/192.0 322 418
takaswie commented 4 years ago

In the case of MOTU UltraLite.

STF(kHz) tx rx
44.1/ 48.0 106 106
88.2/ 96.0 210 210
takaswie commented 4 years ago

In the case of MOTU 8pre.

STF(kHz) ADAT tx rx
44.1/ 48.0 off 82 58
44.1/ 48.0 on 130 106
88.2/ 96.0 off 162 114
88.2/ 96.0 on 258 210
takaswie commented 4 years ago

In the case of Traveler.

STF(kHz) optical tx rx
44.1/ 48.0 disable 106 106
44.1/ 48.0 ADAT 154 154
44.1/ 48.0 TOSLINK 106 106
----------- --------- ----- -----
88.2/ 96.0 disable 210 210
88.2/ 96.0 ADAT 258 258
88.2/ 96.0 TOSLINK 210 210
----------- --------- ----- -----
176.4/192.0 disable 290 290
takaswie commented 4 years ago

In the case of MOTU 828mkII.

STF(kHz) optical tx rx
44.1/ 48.0 disable 106 106
44.1/ 48.0 ADAT 154 154
44.1/ 48.0 TOSLINK 106 106
----------- --------- ----- -----
88.2/ 96.0 disable 210 210
88.2/ 96.0 ADAT 258 258
88.2/ 96.0 TOSLINK 210 210
takaswie commented 4 years ago

In the case of MOTU 896.

STF(kHz) ADAT tx rx
44.1/ 48.0 off 74 74
44.1/ 48.0 on 122 122
----------- ------ ----- -----
88.2/ 96.0 off 146 146
takaswie commented 4 years ago

In the case of MOTU 828.

STF(kHz) optical tx rx
44.1/ 48.0 disable 82 74
44.1/ 48.0 ADAT 130 121
44.1/ 48.0 TOSLINK 82 74
takaswie commented 4 years ago

In the case of MOTU V4HD.

STF(kHz) Analog AES/EBU ADAT SDI tx rx
44.1/ 48.0 on 82 82
44.1/ 48.0 on 82 82
44.1/ 48.0 on 82 82
44.1/ 48.0 on 82 82
44.1/ 48.0 on on 130 130
44.1/ 48.0 on on 130 130
44.1/ 48.0 on on 130 130
44.1/ 48.0 on on 130 130
44.1/ 48.0 on on 130 130
44.1/ 48.0 on on 130 130
44.1/ 48.0 on on on 178 178
44.1/ 48.0 on on on 178 178
44.1/ 48.0 on on on 178 178
44.1/ 48.0 on on on on 226 226
----------- -------- --------- ------ ----- ----- -----
88.2/ 96.0 on 162 162
88.2/ 96.0 on 162 162
88.2/ 96.0 on 162 162
88.2/ 96.0 on 162 162
88.2/ 96.0 on on 258 258
88.2/ 96.0 on on 258 258
88.2/ 96.0 on on 258 258
88.2/ 96.0 on on 258 258
88.2/ 96.0 on on 258 258
88.2/ 96.0 on on 258 258
88.2/ 96.0 on on on 354 354
88.2/ 96.0 on on on 354 354
88.2/ 96.0 on on on 354 354
88.2/ 96.0 on on on on 450 450
----------- -------- --------- ------ ----- ----- -----
176.4/192.0 on 322 322
takaswie commented 4 years ago

In protocol of MOTU FireWire series, the number of 24-bit chunks per data block can be calculated by this formula:

chunks = ((quadlets - 2) / syt_interval - 1) / 3 * 4
so that:
  chunks: the number of chunks per data block
  quadlets: the amount of quadlets per packet payload
  syt_interval: calculated by sampling transfer frequency in IEC 61883-6
name STF(kHz) Analog AES/EBU ADAT SDI tx rx
V4HD 44.1/ 48.0 on off off off 12.0 12.0
V4HD 44.1/ 48.0 off on off off 12.0 12.0
V4HD 44.1/ 48.0 off off on off 12.0 12.0
V4HD 44.1/ 48.0 off off off on 12.0 12.0
V4HD 44.1/ 48.0 on on off off 20.0 20.0
V4HD 44.1/ 48.0 on off on off 20.0 20.0
V4HD 44.1/ 48.0 on off off on 20.0 20.0
V4HD 44.1/ 48.0 off on on off 20.0 20.0
V4HD 44.1/ 48.0 off on off on 20.0 20.0
V4HD 44.1/ 48.0 off off on on 20.0 20.0
V4HD 44.1/ 48.0 on on on off 28.0 28.0
V4HD 44.1/ 48.0 on on off on 28.0 28.0
V4HD 44.1/ 48.0 on off on on 28.0 28.0
V4HD 44.1/ 48.0 on on on on 36.0 36.0
V4HD 88.2/ 96.0 on off off off 12.0 12.0
V4HD 88.2/ 96.0 off on off off 12.0 12.0
V4HD 88.2/ 96.0 off off on off 12.0 12.0
V4HD 88.2/ 96.0 off off off on 12.0 12.0
V4HD 88.2/ 96.0 on on off off 20.0 20.0
V4HD 88.2/ 96.0 on off on off 20.0 20.0
V4HD 88.2/ 96.0 on off off on 20.0 20.0
V4HD 88.2/ 96.0 off on on off 20.0 20.0
V4HD 88.2/ 96.0 off on off on 20.0 20.0
V4HD 88.2/ 96.0 off off on on 20.0 20.0
V4HD 88.2/ 96.0 on on on off 28.0 28.0
V4HD 88.2/ 96.0 on on off on 28.0 28.0
V4HD 88.2/ 96.0 on off on on 28.0 28.0
V4HD 88.2/ 96.0 on on on on 36.0 36.0
V4HD 176.4/192.0 on off off off 12.0 12.0
name STF(kHz) optical tx rx
828 44.1/ 48.0 disable 12.0 10.666666666666666
828 44.1/ 48.0 ADAT 20.0 17.333333333333332
828 44.1/ 48.0 TOSLINK 12.0 10.666666666666666
name STF(kHz) ADAT tx rx
896 44.1/ 48.0 off 10.666666666666666 10.666666666666666
896 44.1/ 48.0 on 18.666666666666668 18.666666666666668
896 88.2/ 96.0 off 10.666666666666666 10.666666666666666
name STF(kHz) optical tx rx
828mkII 44.1/ 48.0 disable 16.0 16.0
828mkII 44.1/ 48.0 ADAT 24.0 24.0
828mkII 44.1/ 48.0 TOSLINK 16.0 16.0
828mkII 88.2/ 96.0 disable 16.0 16.0
828mkII 88.2/ 96.0 ADAT 20.0 20.0
828mkII 88.2/ 96.0 TOSLINK 16.0 16.0
name STF(kHz) optical tx rx
Traveler 44.1/ 48.0 disable 16.0 16.0
Traveler 44.1/ 48.0 ADAT 24.0 24.0
Traveler 44.1/ 48.0 TOSLINK 16.0 16.0
Traveler 88.2/ 96.0 disable 16.0 16.0
Traveler 88.2/ 96.0 ADAT 20.0 20.0
Traveler 88.2/ 96.0 TOSLINK 16.0 16.0
Traveler 176.4/192.0 disable 10.666666666666666 10.666666666666666
name STF(kHz) ADAT tx rx
8pre 44.1/ 48.0 off 12.0 8.0
8pre 44.1/ 48.0 on 20.0 16.0
8pre 88.2/ 96.0 off 12.0 8.0
8pre 88.2/ 96.0 on 20.0 16.0
name STF(kHz) tx rx
UltraLite 44.1/ 48.0 16.0 16.0
UltraLite 88.2/ 96.0 16.0 16.0
name STF(kHz) tx rx
UltraLitemk3 44.1/ 48.0 20.0 16.0
UltraLitemk3 88.2/ 96.0 16.0 16.0
UltraLitemk3 176.4/192.0 12.0 16.0
name STF(kHz) tx rx
4pre 44.1/ 48.0 12.0 12.0
4pre 88.2/ 96.0 12.0 12.0
name STF(kHz) tx rx
AudioExpress 44.1/ 48.0 12.0 12.0
AudioExpress 88.2/ 96.0 12.0 12.0
name STF(kHz) opt A opt B tx rx
828mk3 44.1/ 48.0 disable disable 20.0 16.0
828mk3 44.1/ 48.0 ADAT disable 28.0 24.0
828mk3 44.1/ 48.0 TOSLINK disable 24.0 20.0
828mk3 44.1/ 48.0 disable ADAT 28.0 24.0
828mk3 44.1/ 48.0 disable TOSLINK 24.0 20.0
828mk3 44.1/ 48.0 ADAT TOSLINK 32.0 28.0
828mk3 44.1/ 48.0 TOSLINK ADAT 32.0 28.0
828mk3 44.1/ 48.0 ADAT ADAT 36.0 32.0
828mk3 44.1/ 48.0 TOSLINK TOSLINK 28.0 24.0
828mk3 88.2/ 96.0 disable disable 20.0 16.0
828mk3 88.2/ 96.0 ADAT disable 24.0 20.0
828mk3 88.2/ 96.0 TOSLINK disable 24.0 20.0
828mk3 88.2/ 96.0 disable ADAT 24.0 20.0
828mk3 88.2/ 96.0 disable TOSLINK 24.0 20.0
828mk3 88.2/ 96.0 ADAT TOSLINK 28.0 24.0
828mk3 88.2/ 96.0 TOSLINK ADAT 28.0 24.0
828mk3 88.2/ 96.0 ADAT ADAT 28.0 24.0
828mk3 88.2/ 96.0 TOSLINK TOSLINK 28.0 24.0
828mk3 176.4/192.0 disable disable 16.0 12.0

For 896, rx packets of 828, data block includes no chunks for message, therefore the calculation result shows floating number. For Traveller at higher rate, data block includes 2 byte padding.

alesandar commented 4 years ago

I will provide additional information a little bit later, regarding some tests with multi-threading disabled, capture-only and playback-only modes, but in the meanwhile:

$ cat /proc/asound/card0/firewire/format
tx:     msg     fixed   differed
44100:  2       18      16
48000:  2       18      16
88200:  2       18      8
96000:  2       18      8
176400: 2       14      0
192000: 2       14      0
rx:     msg     fixed   differed
44100:  2       14      16
48000:  2       14      16
88200:  2       14      8
96000:  2       14      8
176400: 2       10      0
192000: 2       10      0

I would also like to remind you about the fact that I could initialize the interface only at 192kHz, otherwise I get the DMA context error message, thrown by firewire-ohci, mentioned in my first message.

What is the revision of your Ricoh chipset?

$ lspci -d 1180:e832
03:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 04)
takaswie commented 4 years ago

I would also like to remind you about the fact that I could initialize the interface only at 192kHz, otherwise I get the DMA context error message, thrown by firewire-ohci, mentioned in my first message. What is the revision of your Ricoh chipset?

$ lspci -vnns 0d:00.3
0d:00.3 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller [1180:e832] (rev 04) (prog-if 10 [OHCI])
    Subsystem: Lenovo R5C832 PCIe IEEE 1394 Controller [17aa:21ce]
    Flags: bus master, fast devsel, latency 0, IRQ 19
    Memory at f1400000 (32-bit, non-prefetchable) [size=2K]
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Power Management version 3
    Capabilities: [80] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: firewire_ohci
    Kernel modules: firewire_ohci

As you suspect, Ricoh chip of 1394 OHCI controller has been reported to have some quirks, as long as I know. However, firewire-ohci driver in Linux kernel has some workaround to use the controller. In my environment, the driver is loaded with quirk 0x11.

$ dmesg
...
firewire_ohci 0000:0d:00.3: added OHCI v1.10 device as card 0, 4 IR + 4 IT contexts, quirks 0x11
...

I'd like you to check whether the driver is loaded with proper quirk flags. You can see what 0x11 quirks mean in source code of Linux kernel (QUIRK_CYCLE_TIMER | QUIRK_NO_MSI): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firewire/ohci.c#n307

takaswie commented 4 years ago
$ cat /proc/asound/card0/firewire/format
tx:     msg     fixed   differed
44100:  2       18      16
48000:  2       18      16
88200:  2       18      8
96000:  2       18      8
176400: 2       14      0
192000: 2       14      0
rx:     msg     fixed   differed
44100:  2       14      16
48000:  2       14      16
88200:  2       14      8
96000:  2       14      8
176400: 2       10      0
192000: 2       10      0

The device seems to be configured so that both of optical interfaces to both direction are for ADAT. In this case, the sum of msg/fixed/differed should equal to entries in below table, and actually the same:

| STF(kHz)    | opt A   | opt B   | tx   | rx   |tx ch |rx ch |
|-------------|---------|---------|------|------|------|------|
|  44.1/ 48.0 | ADAT    | ADAT    |  226 |  202 | 36.0 | 32.0 |
|  88.2/ 96.0 | ADAT    | ADAT    |  354 |  306 | 28.0 | 24.0 |
| 176.4/192.0 | disable | disable |  418 |  322 | 16.0 | 12.0 |
alesandar commented 4 years ago

firewire-ohci is being loaded with the same quirks flag as yours by default, 0x11. I have previously tested 0x1, but the results were much worse.

alesandar commented 4 years ago

I have temporarily switched from JACK2 to JACK1 and turned off hyper-threading.

jackd -v -P69 -dalsa -dhw:D828mk3,0 -r192000 -p512 -n3 -P (playback-only mode) has been running for about 20 minutes, without any issues, while MPD was playing through two of the analogue outputs.

However, in capture-only mode I had to remove the realtime-priority parameter. Whether the realtime property has been set or not, ALSA did not trigger any xruns, but this time JACK was throwing a bunch of late driver wakeup: nframes to process = 1024. and delay of 2682.000 usecs exceeds estimated spare time of 2653.000; restart ..., causing something I would like to call an internal JACK xrun. By disabling the realtime mode, the warnings are still displayed, but they did not cause internal JACK xruns.

In order to detect ALSA xruns, I do echo 3 > /proc/asound/card0/pcm0c/xrun_debug && echo 3 > /proc/asound/card0/pcm0p/xrun_debug and then I simply grep my syslogs.

alesandar commented 4 years ago

The interface could be initialized at 176.4kHz, but this might not make a big difference, since the msg/fixed/differed format is the same, as it is for 192kHz.

alesandar commented 4 years ago

@takaswie, what is you firmware version? I forgot to mention, but mine is v1.06.

alesandar commented 4 years ago

That was very interesting. I have booted into Windows and disabled the optical in/outs. You were right, they were set to be used for ADAT and that's probably set upon a hardware reset. Now I can initialize the interface at lower sample rates without any issues. Tested everything below 176.4kHz (44.1kHz / 48kHz / 88.2kHz / 96kHz).

alesandar commented 4 years ago
$ cat /proc/asound/card0/firewire/format
tx:     msg     fixed   differed
44100:  2       18      0
48000:  2       18      0
88200:  2       18      0
96000:  2       18      0
176400: 2       14      0
192000: 2       14      0
rx:     msg     fixed   differed
44100:  2       14      0
48000:  2       14      0
88200:  2       14      0
96000:  2       14      0
176400: 2       10      0
192000: 2       10      0
takaswie commented 4 years ago

what is you firmware version? I forgot to mention, but mine is v1.06.

I use the same version of firmware (v1.06).

takaswie commented 4 years ago

I have booted into Windows and disabled the optical in/outs.

You can do it by CueMix on device without any software on Windows. Just disconnect the unit from IEEE 1394 bus.

You were right, they were set to be used for ADAT and that's probably set upon a hardware reset. Now I can initialize the interface at lower sample rates without any issues. Tested everything below 176.4kHz (44.1kHz / 48kHz / 88.2kHz / 96kHz).

Hm. In my side, communication with these ADAT interfaces is available. I guess that the cause is on the lower layer than protocol processing; e.g. bandwidth limitation on PCI bus and so on.

For your information, in my machine, any virtualization feature is disabled (VT-x, VT-d). Would you please attempt test in the same condition?

alesandar commented 4 years ago

In duplex mode, periodical xruns still occur. This time realtime priority was not enabled. JACK was started with the following command:

jackd -r -dalsa -dhw:D828mk3 -r192000 -p1024 -n4

$ journalctl --output=short-unix -ab -S -25m | grep -i "snd_firewire_motu"
1589719823.421231 420 kernel: snd_firewire_motu fw1.0: XRUN: pcmC0D0p:0
1589720177.659807 420 kernel: snd_firewire_motu fw1.0: XRUN: pcmC0D0p:0
1589720532.090096 420 kernel: snd_firewire_motu fw1.0: XRUN: pcmC0D0p:0
1589720888.611386 420 kernel: snd_firewire_motu fw1.0: XRUN: pcmC0D0p:0

The average time between xruns here is ~355s. Pretty much the same as in my first post, but this time I have used the system logs, in order to calculate the period.

For your information, in my machine, any virtualization feature is disabled (VT-x, VT-d). Would you please attempt test in the same condition?

I haven't used any kind of hardware virtualization software on that machine. I will check how to disable VT-x in a minute.

One more thing that might be interesting. If xrun_debug is set to 3 for both PCMs, I can see many messages like this one:

[  +0.010461] CPU: 1 PID: 285 Comm: irq/19-firewire Not tainted 5.6.10-rt5-2-rt #1
[  +0.000005] Hardware name: LENOVO 4236QZ0/4236QZ0, BIOS CBET4000 4.11-2593-g65cc80f740-dirty 05/04/2020
[  +0.000002] Call Trace:
[  +0.000004]  dump_stack+0x64/0x88
[  +0.000006]  snd_pcm_update_hw_ptr0.cold+0x5d/0xbf [snd_pcm]
[  +0.000011]  ? ohci_flush_iso_completions+0x138/0x170 [firewire_ohci]
[  +0.000007]  snd_pcm_period_elapsed+0x75/0xa0 [snd_pcm]
[  +0.000011]  tasklet_action_common.constprop.0+0x51/0xb0
[  +0.000004]  __do_softirq+0x111/0x3c2
[  +0.000004]  ? preempt_count_add+0x49/0xa0
[  +0.000003]  ? irq_thread_fn+0x60/0x60
[  +0.000004]  __local_bh_enable_ip+0x144/0x190
[  +0.000003]  irq_forced_thread_fn+0x73/0x80
[  +0.000004]  irq_thread+0xec/0x1c0
[  +0.000004]  ? irq_finalize_oneshot.part.0+0xe0/0xe0
[  +0.000003]  ? irq_thread_check_affinity+0xd0/0xd0
[  +0.000005]  kthread+0x117/0x130
[  +0.000004]  ? __kthread_bind_mask+0x60/0x60

They do not cause actual xruns, but at the same time it's not necessary to start JACK, in order to trigger them. Any ALSA-capable utility would be enough, for example speaker-test or arecord.

The more interesting is that, between hundreds of this messages, once in a while I can see this:

[  +0.000020] CPU: 1 PID: 285 Comm: irq/19-firewire Not tainted 5.6.10-rt5-2-rt #1
[  +0.000003] Hardware name: LENOVO 4236QZ0/4236QZ0, BIOS CBET4000 4.11-2593-g65cc80f740-dirty 05/04/2020
[  +0.000001] Call Trace:
[  +0.000003]  dump_stack+0x64/0x88
[  +0.000004]  snd_pcm_update_hw_ptr0.cold+0x5d/0xbf [snd_pcm]
[  +0.000010]  ? migrate_enable+0x4c/0x460
[  +0.000006]  snd_pcm_period_elapsed+0x75/0xa0 [snd_pcm]
[  +0.000011]  tasklet_action_common.constprop.0+0x51/0xb0
[  +0.000004]  __do_softirq+0x111/0x3c2
[  +0.000003]  ? preempt_count_add+0x49/0xa0
[  +0.000004]  ? irq_thread_fn+0x60/0x60
[  +0.000003]  __local_bh_enable_ip+0x144/0x190
[  +0.000003]  irq_forced_thread_fn+0x73/0x80
[  +0.000005]  irq_thread+0xec/0x1c0
[  +0.000004]  ? irq_finalize_oneshot.part.0+0xe0/0xe0
[  +0.000003]  ? irq_thread_check_affinity+0xd0/0xd0
[  +0.000004]  kthread+0x117/0x130
[  +0.000004]  ? __kthread_bind_mask+0x60/0x60
[  +0.000005]  ret_from_fork+0x35/0x40

? ohci_flush_iso_completions+0x138/0x170 [firewire_ohci] is not called in the second message.

alesandar commented 4 years ago

VT-x has been disabled. Confirmed by the empty output of lscpu | grep Virtualization and cat /proc/cpuinfo | grep -E 'vmx|svm|0xc0f'. The issue is still present, when I use playback and capture simultaneously.

takaswie commented 4 years ago

VT-x has been disabled. Confirmed by the empty output of lscpu | grep Virtualization and cat /proc/cpuinfo | grep -E 'vmx|svm|0xc0f'. The issue is still present, when I use playback and capture simultaneously.

To make extra sure, would I ask you to disable machine-level VT-x/VT-d in BIOS since VT-d is feature of IOMMU and it might affects I/O in PCI/PCI-E bus theirselves. in my understanding.

takaswie commented 4 years ago

As long as I investigated by IEEE 1394 protocol analyzer, the size of incoming/outgoing packet is expected for the calculation by ALSA firewire-motu driver internal. Thus it's scarcely to miss-handle packet payload for missing PCM frames...

alesandar commented 4 years ago

I use coreboot. I had to disable CONFIG_ENABLE_VMX, recompile coreboot and flash the chip again. Before doing so, both of the commands above provided output, indicating that hardware virtualization is enabled. Even if you decide to suspect that coreboot may have affected the performance of the PCI/PCI-E bus, how is this going to make a difference, if the issue is present on your system, which probably runs the stock BIOS? I could ask on the coreboot mailing lists, if anyone has ever experienced such an issue, but do you think it might be necessary to do so?

takaswie commented 4 years ago

I use coreboot. I had to disable CONFIG_ENABLE_VMX, recompile coreboot and flash the chip again. Before doing so, both of the commands above provided output, indicating that hardware virtualization is enabled. Even if you decide to suspect that coreboot may have affected the performance of the PCI/PCI-E bus, how is this going to make a difference, if the issue is present on your system, which probably runs the stock BIOS? I could ask on the coreboot mailing lists, if anyone has ever experienced such an issue, but do you think it might be necessary to do so?

Hm. The judgement was based on my memory that some products on market have inconvenience for DMA on peripheral bus for isochronous communication. 1394 OHCI is based on PCI technology. To implement it on PCI-E, some controllers integrated PCI/PCI-E bridge with 1394 OHCI. In the case of design, some products in this market have bandwidth limitation on the bridge itself. As a result, isochronous communication corrupts due to the limitation.

For example, Texas Instruments XIO 2210 series integrates PCI Express to PCI Bridge internally. This chipset is itself good and I use Sonnet FireWire 800 Pro ExpressCard/34[1] for my development when needed. However some products add bandwidth to the bridge. Once I got such problematic card and disappointed.

Here, I focus on which chipset includes PCI/PCI-Express bridge for our case. As long as reading datasheets of Ricoh's combo chip of OHCI 1394 and SD Host controller[2], R5C832 doesn't have PCI-Express interface but PCI interface. However, lspci shows that R5C832 is on PCI-Express bus:

$ lspci -t
-[0000:00]-+-00.0
           +-02.0
           +-16.0
           +-16.3
           +-19.0
           +-1a.0
           +-1b.0
           +-1c.0-[02]--
           +-1c.1-[03]----00.0
           +-1c.3-[05-0c]--
           +-1c.4-[0d]--+-00.0
           |            \-00.3
           +-1d.0
           +-1f.0
           +-1f.2
           \-1f.3
$ lspci -vnnPP
00:1c.4 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 [8086:1c18] (rev b4) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Bus: primary=00, secondary=0d, subordinate=0d, sec-latency=0
        I/O behind bridge: 00003000-00003fff [size=4K]
        Memory behind bridge: f1400000-f1bfffff [size=8M]
        Prefetchable memory behind bridge: 00000000f0c00000-00000000f13fffff [size=8M]
        Capabilities: <access denied>
        Kernel driver in use: pcieport
...
00:1c.4/0d:00.0 SD Host controller [0805]: Ricoh Co Ltd PCIe SDXC/MMC Host Controller [1180:e823] (rev 08) (prog-if 01)
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f1401000 (32-bit, non-prefetchable) [size=256]
        Capabilities: <access denied>
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci_pci
00:1c.4/0d:00.3 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller [1180:e832] (rev 04) (prog-if 10 [OHCI])
        Subsystem: Lenovo R5C832 PCIe IEEE 1394 Controller [17aa:21ce]
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at f1400000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: <access denied>
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire_ohci

It means that the 'R5C832' is just a display name and implemented chip is different from R5C832. This night I opened my T420 to confirm the chip. It's R5U232. According to flyer of the chip, it includes PCI-Express functionality. When considering that 1394 OHCI is based on PCI technology, the chip seems to have PCI/PCI-Express bridge. If the bridge had the bandwidth limitation, it would occur that isochronous communication with higher bandwidth corrupts, perhaps...

[1] http://www.sonnettech.com/product/legacyproducts/fw800expresscard34.html [2] https://www.n-redc.co.jp/en/products/discon.html

alesandar commented 4 years ago

I already know about the fact that the device is connected to a PCI bridge. The FireWire port is just a small daughter board. I have manually mounted it on my laptop, but I did not pay attention to the chip label. I have bricked my BIOS today, so by the end of the week I will be able to fix it. At that point, I will be able to confirm if it is R5U232, but that's not so important right now. I hope you don't mind to keep that issue open for a while.

alesandar commented 4 years ago

I can confirm that my chip is also labeled as R5U232. I was unable to find a datasheet about that model. It must be a discontinued product, but the lack of information about it is somehow weird. At the same time, I was able to find a datasheet about R5C832. There are also some photos on the web, showing a chip that has an R5C832 label and I would conclude that it is a completely different product.

This one might be a little bit off-topic, but I have recently contacted a person with whom I had a conversation about coreboot/T420/FireWire and he is also planning to use his laptop in a similar environment and he sent me a firewire-ohci kernel patch, related to R5C832. It seems like it has actually improved the performance of the chip, but I might have to experiment a little bit more. The patch itself has exposed a firewire-ohci parameter, which writes some hexadecimals to the chip and as far as I can understand, it is trying to prioritize FireWire connections over the SD Host Controller.

Since we can agree that the issue is present only in duplex-mode, I think that I might try to address the problem on the jack-devel mailing list.

In the meanwhile I have played with many different type of bridges, such as ALSA's multi/duplex/loopback, JACK's alsa_in/alsa_out plugins, zita-ajbridge, but without a lot of success.

Thanks a lot for your support, @takaswie. You have spent a great amount of time on that problem. I feel like there might be nothing wrong with your library and that's why I will try to move the topic to jack-devel. Feel free to close the issue.

alesandar commented 4 years ago

@takaswie, I have finally managed to avoid the periodic interrupts by word-clocking the interface with a mid-90's digital console (Yamaha O2R). It has been tested for about an hour in duplex-mode.

$ cat /proc/asound/card4/firewire/clock
Rate:   192000
Source: Word clock on BNC interface

Do you think that there might be an issue with MOTU's internal clock?

takaswie commented 4 years ago

Hi @alesandar,

I'm sorry to be late for reply, but I'm a bit busy for my work for control services in user space[1].

Thanks for your effort and report. Once I suspect that the issue comes from the bandwidth limitation of PCI/PCI-Express bridge in Ricoh R5U232, however you clarify that it's incorrect and something related to internal clock. This is a great help to reveal blackbox in MOTU FireWire units.

Since we can agree that the issue is present only in duplex-mode, I think that I might try to address the problem on the jack-devel mailing list.

I note that even if in simplex mode, ALSA firewire-motu driver processes incoming/outgoing packet streaming. The difference is to process PCM substream for both direction or not. I have a conjecture that:

But the above conjecture doesn't describe issues on OHCI 1394 controller. I need more time to consider about it.

Do you think that there might be an issue with MOTU's internal clock?

The most of Audio and Music unit on IEEE 1394 bus have two types of clock internally; sampling clock and media clock. The former is well-known for users, and the latter is for time-synchronization between transmitter and receivers. These two have association by the mechanism of control engineering; e.g. Phase Lock Loop (PLL) with high-resolution crystal oscillator. The transmitter is in our case ALSA firewire-motu driver (precisely ALSA IEC 61883-1/6 packet streaming engine).

In my opinion, the design of PLL differs depending on vendors and models. At present, ALSA IEC 61883-1/6 packet streaming engine is not optimized to the vendor-dependent or model-dependent design.

I guess that your experience comes from the above. I guess that switching from internal clock to word clock means that the unit splits the sampling clock from the signal of PLL. As a result, the sampling clock refers to signal of word clock. and the unit perform stable sampling and transmit isochronous packets with media clock generated by the stable sampling. The timestamp in received packets (from ALSA firewire-motu driver) is just ignored and including PCM frames goes to DAC.

For my information, would you check to hear any periodical noise in the sound generated by your unit? I guess that you hear no noise in the case of referring to word clock.

[1] https://mailman.alsa-project.org/pipermail/alsa-devel/2020-June/168550.html

alesandar commented 4 years ago

I'm sorry to be late for reply, but I'm a bit busy for my work for control services in user space[1].

No worries, we are not in a hurry. It's nice to see that there are new bindings for libhinawa. I have already compiled both libhinawa and hinawa-utils and kept track on some minor issues, which might soon be reported to the corresponding repositories.

For my information, would you check to hear any periodical noise in the sound generated by your unit? I guess that you hear no noise in the case of referring to word clock.

I am pretty sure that there are no interrupts in both directions (incoming and outgoing signal). I have already had a nice hybrid jam with some hardware synths and a DAW, while the interface was acting as a word-clock slave to my digital mixing desk. I did not notice any noises.

alesandar commented 4 years ago

Regarding what I have said in one of my previous messages:

I can confirm that my chip is also labeled as R5U232. I was unable to find a datasheet about that model. It must be a discontinued product, but the lack of information about it is somehow weird. At the same time, I was able to find a datasheet about R5C832. There are also some photos on the web, showing a chip that has an R5C832 label and I would conclude that it is a completely different product.

I have found a message by a Ricoh employee, stating that 1180:e823 is actually used for two devices - R5U220/R5U232. [1] At the same time, Ricoh has uploaded two separate specifications for R5U232 and R5C832, which does not seem to be the same thing. [2] [3] Third-party services seems to keep only the the R5C832 datasheet.

I have decided to create a gist with the original contents of the previously mentioned firewire-ohci kernel patch. [4] It seems like it was written a long time ago by Clemens Ladisch, who has stated that the patch itself is completely untested. I was told that it must have been designed only for the fourth revision of the chip. I will have to do more tests, in order to confirm how exactly it has improved my overall experience.

[1] https://lore.kernel.org/patchwork/patch/260859/ [2] https://www.n-redc.co.jp/en/pdf/discon/r5u232-ek.pdf [3] https://www.n-redc.co.jp/en/pdf/discon/r5c832-ek.pdf [4] https://gist.github.com/alesandar/d4265522bc42f25a77d1467264c8c11b

alesandar commented 3 years ago

I can confirm that the exact same problem occurs with ThinkPad W530. The hardware specification of that laptop is pretty much the same as the one for my modded T420, but this time there are no hardware/software mods. Fresh Ubuntu Studio install. It would be interesting to see if the problem occurs with the following combo - ThinkPad T42 and PCMCIA card w/ VIA chip. I've got the T42 from the flea market and the resurrections process will take a few days, but I will let you know how it goes.

alesandar commented 3 years ago

I won't be able to perform the tests with the PCMCIA, because I have probably fried the T42 mobo.

takaswie commented 3 years ago

I attempt to implement media clock recovery to suppress the issue. If still interested, please test current HEAD of topic/media-clock-recovery remote branch (3a2c5fd52657).

For jackd, I recommend to use -S (--sync) option.

takaswie commented 3 years ago

I note that patchset for the issue is posted to upstream:

takaswie commented 3 years ago

The patchset was merged to upstream. The patches are also available in master branch of the repository. Let me close the issue. Thanks for your cooperation.