openairplay / airplay2-receiver

AirPlay 2 Receiver - Python implementation

Sample rates and synchronization #10

Closed systemcrash closed 3 years ago

systemcrash commented 3 years ago

Automate codec setup: all ALAC formats that AirPlay handles are now set up automatically, as is AAC. OPUS is possible. PCM should work (and should be trivial to tweak if any issues arise).

The other interesting commit fixes playback latency so that airplay2-receiver plays nice with other airplay receivers.

glmnet commented 3 years ago

Hey there, looking at this and your branches on the repo, how is it even possible for sync to get within about 1 ms if PTP is not implemented?

I see your PTP branch may be missing some glue code too? (I have not tried it yet, but just adding a file won't do much.) I see the iPhone broadcasts PTP messages with the time,

but after play is set I get the plist:

{'networkTimeFlags': 0,
 'networkTimeFrac': -6499027745031323648,
 'networkTimeSecs': 1527018,
 'networkTimeTimelineID': -9147822675370376762,
 'rate': 1,
 'rtpTime': 537731188}

Might there be an issue with the datatype of networkTimeFrac / networkTimeTimelineID?

I'm just trying to understand if I am missing something here :)

Great work.

systemcrash commented 3 years ago

The PTP branch is largely complete - but where to hook in is... flexible. It consumes PTP from any airplay sender, and it knows where in the timeline it should be. ( It will likely never be able to account for PTP interface egress delay mentioned in the IEEE specs, but we can conveniently exclude it in the round-trip calculations. )

networkTimeTimelineID is your (e.g. iPhone) sender device MAC (or close to it), I think. networkTimeFrac is some PITA format equating to fraction of a second in units of 1/2^64. The float makes it negative sometimes, so not certain how it's formatted. Might just be 64 bit precision.

With this PR, I can get airplay2-receiver to sync to within a msec of my Sonos devices when they're all grouped together. The PTP times conveyed refer to the play-head position in the track, ignoring buffers. So whichever mechanism is used, PTP or RTP, play-heads can sync very closely. PTP also helps when choosing 'primaries' (master in PTP terminology). This PR adds compensation for output buffers on the local system, which PyAudio can query from the audio interface:

        audioDevicelatency = \
            self.pa.get_default_output_device_info()['defaultHighOutputLatency']

Not sure how to hook into PTP on the receiver side. The internal RTSP handling seems to rely on RT(S)P and its timestamps only (rtpTime), which with this PR, seems to get 'acceptably' close.
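To illustrate the output-buffer compensation described above, here is a minimal sketch. It assumes the latency reported by the audio interface is in seconds and that the target play time is tracked in integer nanoseconds; the function name `write_deadline_ns` is hypothetical and not taken from the PR.

```python
def write_deadline_ns(play_at_ns: int, device_latency_s: float) -> int:
    """Return when a frame must be handed to the sink so that it is
    audible at play_at_ns, given the device's output latency.

    Assumption: device_latency_s is a value such as the
    'defaultHighOutputLatency' entry of the device info, in seconds.
    """
    latency_ns = round(device_latency_s * 1_000_000_000)
    return play_at_ns - latency_ns


# A frame due at t = 1 s on a device with 50 ms of output latency
# has to be written 50 ms early.
deadline = write_deadline_ns(1_000_000_000, 0.05)
```

The point of the sketch is only that the buffer latency is subtracted from the presentation time; it says nothing about network or decoder delays.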

glmnet commented 3 years ago

The PTP branch is largely complete

Indeed I successfully run it and get the PTPcorrection messages. Great work!!

The networkTimeTimelineID is the same as the ClockIdentity in PTP, so maybe it can be checked in order to sync to that clock when the play request (rate: 1) is received. That may be necessary if multiple streams are played; not sure, and it's not necessary yet.
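A possible way to do that comparison, as a rough sketch: it assumes the plist decoder hands back networkTimeTimelineID as a signed 64-bit integer whose raw 8 bytes are the PTP ClockIdentity. The helper name `timeline_id_to_clock_identity` is made up for illustration.

```python
U64_MASK = (1 << 64) - 1


def timeline_id_to_clock_identity(timeline_id: int) -> str:
    """Reinterpret a signed 64-bit plist integer as 8 raw bytes and
    return them as a hex string, for comparison with the clockId
    printed by a PTP module (assumed to be big-endian hex)."""
    return (timeline_id & U64_MASK).to_bytes(8, byteorder="big").hex()


# The value from the plist earlier in the thread.
clock_id_hex = timeline_id_to_clock_identity(-9147822675370376762)
```

Masking with 2^64 - 1 maps a negative value to its two's-complement unsigned form without needing any extra library.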

networkTimeFrac is some PITA format equating to fraction of a second in units of 1/2^64.

I am completely lost here; it will be impossible to sync without this (read on below for why I think that). So far I have got a few data samples:

    Received data             Bytes (LE)             UInt32 (LE)        Bytes (BE)     UInt32 (BE of first 4 bytes)
       303680279753523200 000000009be33604                  70706075 R: 0436e39b00000000   2615358980
       -97880728422842368 00000000ff41a4fe                4272177663 R: fea441ff00000000   4282492158
      6300335557121671168 00000000cf496f57                1466911183 R: 576f49cf00000000   3477696343
     -1074596629866086400 000000000f4416f1                4044768271 R: f116440f00000000    256120561
     -6473930193831460864 00000000cafa27a6                2787637962 R: a627faca00000000   3405391782
      3936977906407833600 000000008cf4a236                 916649100 R: 36a2f48c00000000   2364842550

I expect to get values < 1.000.000.000, which is 1 second in nanoseconds. It's weird that they are only using 4 bytes of the field. I checked the binary plist and it is OK: the field is marked as an 8-byte integer.

This PR adds compensation for output buffers on the local system, which PyAudio can query from the audio interface:

I understand that: it tells you the time the system spends between your call to sink.write and the audio actually being rendered. This is necessary but should not be sufficient.

Not sure how to hook into PTP on the receiver side

This I believe is the most important part. When AirPlay sends SETRATEANCHORTIME, this is where you must sync to the other peers: it tells you the master clock time that must match the rtpTime. What the code currently does is anchor the rtpTime to the system's monotonic time, but this message might arrive a bit earlier or later than the time it should be matched against; rtpTime is not 'right now' when the message is received.
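The anchoring idea can be sketched as follows. This is only an illustration under stated assumptions: the anchor pairs an rtpTime with a master-clock time in nanoseconds, RTP timestamps are 32-bit and count audio frames, and the sample rate (44100 here) is known. The function name `rtp_to_master_ns` is hypothetical.

```python
def rtp_to_master_ns(rtp: int, anchor_rtp: int, anchor_ns: int,
                     sample_rate: int = 44100) -> int:
    """Map an RTP timestamp to the master-clock time (ns) at which
    that frame should be heard, relative to an anchor pair."""
    # Signed 32-bit difference, so timestamps just before the anchor
    # or across a 2**32 wrap are handled.
    delta_frames = ((rtp - anchor_rtp + (1 << 31)) % (1 << 32)) - (1 << 31)
    return anchor_ns + delta_frames * 1_000_000_000 // sample_rate


# One second of audio after the anchor maps to anchor_ns + 1 s.
t = rtp_to_master_ns(537731188 + 44100, 537731188, 5_000_000_000)
```

The receiver would then compare this target with its own estimate of the master clock, rather than with the local time at which the message happened to arrive.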

You're currently just getting the PTPcorrection, that is useless alone, you need to get the (corrected) preciseOriginTimestamp from the clock master.

glmnet commented 3 years ago

Also, with this patch I hear some skips in the audio after pause -> play. There should not be a sleep in the audio output routine; I see AirPlay sends many seconds of audio beforehand, so if you are behind/ahead you should be able to just seek to the right position instantly. (Just thinking out loud here.)

systemcrash commented 3 years ago

Not sure how to hook into PTP on the receiver side

This I believe is the most important part. When AirPlay sends SETRATEANCHORTIME, this is where you must sync to the other peers: it tells you the master clock time that must match the rtpTime. What the code currently does is anchor the rtpTime to the system's monotonic time, but this message might arrive a bit earlier or later than the time it should be matched against; rtpTime is not 'right now' when the message is received.

Granted. The PTP time is system upticks from the master. Airplay does not use real or NTP.

You're currently just getting the PTPcorrection, that is useless alone, you need to get the (corrected) preciseOriginTimestamp from the clock master.

It does. They look like:

PTP320 FOLLOWUP srcprt-ID: 32820 clockId: xxxxxxxxxx001c seq-ID: 00000100 correctionNanosec: 000250500 PreciseTime: 1094098.475330875

That's what PTP module applies correction to by determining the propagation delay. The PTP module has everything needed, and it's largely done (you're only a master when your MAC is the lowest and you're broadcasting PTP). But where we hook in to provide this data to the rest of the system still needs a bit of work.
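For readers unfamiliar with how a propagation delay is derived, this is the generic IEEE 1588 delay request-response arithmetic, shown as a sketch. Whether the PTP branch uses exactly this mechanism is an assumption, and the function name is illustrative only. It also assumes a symmetric network path.

```python
def delay_and_offset(t1: int, t2: int, t3: int, t4: int):
    """Generic PTP arithmetic, all times in nanoseconds.

    t1: master sends Sync (master clock, e.g. the precise origin
        timestamp carried in Follow_Up)
    t2: receiver gets Sync (local clock)
    t3: receiver sends Delay_Req (local clock)
    t4: master receives Delay_Req (master clock)
    """
    master_to_slave = t2 - t1
    slave_to_master = t4 - t3
    mean_path_delay = (master_to_slave + slave_to_master) // 2
    offset_from_master = (master_to_slave - slave_to_master) // 2
    return mean_path_delay, offset_from_master


# Local clock 10 ns ahead of the master, 50 ns one-way delay.
delay, offset = delay_and_offset(100, 160, 200, 240)
```

Subtracting the offset from the local clock gives an estimate of the master's time, which is the value the audio side ultimately needs.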

systemcrash commented 3 years ago

Also, with this patch I hear some skips in the audio after pause -> play. There should not be a sleep in the audio output routine; I see AirPlay sends many seconds of audio beforehand, so if you are behind/ahead you should be able to just seek to the right position instantly. (Just thinking out loud here.)

That's the braking mechanism in the decoder module. Nothing I've touched.... 🤷 ( I think )

glmnet commented 3 years ago

It does. They look like:

PTP320 FOLLOWUP srcprt-ID: 32820 clockId: xxxxxxxxxx001c seq-ID: 00000100 correctionNanosec: 000250500 PreciseTime: 1094098.475330875

What I tried to say is that your output pipe is just giving you the PTPcorrection but what you really want is the master's uptime tick? isn't it?

Check my code: I modified it to respond with the master clock and tried to use it in the audio pipe. The threading is an awful implementation; I'm not sure how to do that properly. I'm also not sure how to pass the DEVICE_ID there.

It's not going to sync as the fractional part is unknown, but the idea is there.

I pushed an ugly attempt here https://github.com/glmnet/airplay2-receiver/tree/glmnet

EDIT: I now realize I removed the py audio delays, etc. That was not intentional.

systemcrash commented 3 years ago

That could work. Read the commit message for how to init PTP. As I wrote it, it takes the device MAC, which is gathered in ap2 startup. So you either pass the MAC to PTP and pass the PTP obj up to audio, or init PTP in audio and pass the MAC up from ap2 to the audio obj. Something like that. Although ClockIDs are all based on MACs, PTP does not need MACs because nothing is actually 'verified' per se. You could use a random value, but this might become confusing when you're debugging.

You also need to account for seeking.

TL;DR: MACs are not mandatory.

Where we place and init the PTP module may be important. It should be 'close' to the audio module, since passing values through the stack implies a time penalty which must also be accounted for.

For the fractions of a second (because only PTP nanos time is interesting here) one must do something like: networkTimeFrac (as BE --> float) * ( 1 / 2^64). This should yield a value between 0 and 1.

I don't think this calculation should be done in the PTP module - rather, there where the receipt of the plist is done to get some sensible nanos value to compare to what PTP thinks is right. We could augment the PTP module to accept networkTimeFrac and do it there, since this implementation is being used in airplay2.

glmnet commented 3 years ago

networkTimeFrac (as BE --> float) * ( 1 / 2^64). This should yield a value between 0 and 1.

I've tried a few calcs but cannot get a value between 0 and 1

systemcrash commented 3 years ago

Remember that it's an unsigned int 64. You may need to int and/or abs to get the correct value.

glmnet commented 3 years ago

Remember that it's an unsigned int 64. You may need to int and/or abs to get the correct value.

ok, I'll try this

samples = [ 303680279753523200, -97880728422842368, 6300335557121671168, -1074596629866086400,
     -6473930193831460864, 3936977906407833600 ]

import numpy as np

for sample in samples:
    a = np.array(sample, dtype=np.int64)
    r = (a).astype(np.uint64)
    absSample = abs(sample)
    nthFactor = (0.5**64)
    print(r * nthFactor, absSample * nthFactor)

output:
0.01646254095248878 0.01646254095248878
0.9946938750799745 0.005306124920025468
0.34154187492094934 0.34154187492094934
0.9417459999676794 0.05825400003232062
0.6490475409664214 0.35095245903357863
0.2134239999577403

systemcrash commented 3 years ago

Looks good! Let's avoid the baggage of extra libraries and just abs.

Truncate after 9dp. Combine sec with frac and push this through to the audio module as int nanos. I reckon that should do it.
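A sketch of that combination step, using integer arithmetic only. It assumes networkTimeFrac arrives as a signed 64-bit integer that is really an unsigned fraction in units of 1/2^64, which is why it is masked rather than passed through abs (the two differ for negative inputs). The function name `network_time_to_ns` is hypothetical.

```python
U64_MASK = (1 << 64) - 1
NS_PER_SEC = 1_000_000_000


def network_time_to_ns(secs: int, frac: int) -> int:
    """Combine networkTimeSecs and networkTimeFrac into integer
    nanoseconds, truncating below 1 ns."""
    frac_u64 = frac & U64_MASK               # two's complement -> unsigned
    frac_ns = (frac_u64 * NS_PER_SEC) >> 64  # floor(frac / 2**64 * 1e9)
    return secs * NS_PER_SEC + frac_ns


# The values from the plist earlier in the thread.
anchor_ns = network_time_to_ns(1527018, -6499027745031323648)
```

Staying in integers avoids the rounding that a 64-bit float introduces, since a float only carries 53 bits of mantissa.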

glmnet commented 3 years ago

but abs gives different values (updated code snippet above with abs)

systemcrash commented 3 years ago

Hmm. I was hoping to deal with the uint64 cleanly whether positive or negative, by just stuffing it into a float and converting to int later on.

Cannot quote easily, on email now...

glmnet commented 3 years ago

nthFactor = 0.5 ** 64

for sample in samples:
    sample_bytes = sample.to_bytes(8, byteorder="big", signed=True)
    uint64_sample = int.from_bytes(sample_bytes, byteorder="big")

    print(uint64_sample * nthFactor)

this gives the same results as numpy, btw numpy is already in requirements.txt but anyway

glmnet commented 3 years ago

pushed some more with this idea, seems to work!!! 🎉 it still tries to resync quite often, presumably ptp is not accurate, or I don't know

systemcrash commented 3 years ago

Cool! I'll take a look.

As I understand it, the airplay SDK uses a specific clock PPM. So that everything fits in 32/64bits, there is some compromise on precision and it syncs the internal clock from ticks with NTP (itself 64 bit) occasionally. Edit: that happens once a second, it seems.

The other possibility is the variable delay for devices on Wi-Fi (or the "weather" on the Internet). We can refine the update sampling to occur only between tracks so there's no jumping as a result of re-sync there. Can be problematic for longer mixes. I guess this is a good argument to just stay with RTP. (Or at least be able to choose/have both).

PTP itself may require a few refinements. The clocks are meant to be continuously synchronizing, so I surmise the work needs to happen outside of PTP. (PTP also allows for sub nanosecond precision, but I skipped the subs altogether, since most iDevices can barely manage nanos accurately enough.)

Phase accuracy is a decent goal, so locking timelines to the nanosecond is ideal. This might also be why airplay has a locked PPM, I think.

systemcrash commented 3 years ago

Edit: I also pushed some updates to the PTP module, which I think you should pull in (and apply your edits to). I made MAC at startup optional. Just use e.g. PTP.spawn(None) if you've no MAC available.

systemcrash commented 3 years ago

Tried your current iteration, and it works well. I skipped a track and the whole thing got stuck on: buffer: read is not possible - empty buffer and didn't work after that. I'm going to integrate the perhaps more useful functions into the PTP module. I added some flags you can use to dis/enable print of messages. You can now do PTP.spawn(). Check the args in PTP. MAC (defaults to None, which uses dummy MAC) and Flags (default is all msgs off)

Joshfindit commented 3 years ago

We can refine the update sampling to occur only between tracks so there's no jumping as a result of re-sync there.

This is just my input as a (probably extremely picky) user, but I notice speakers being out of sync by at most 5ms, and at 10ms it affects my ability to enjoy what I’m listening to. Others can also notice if I point out the “echo”.

If a system has to handle being out of sync due to wifi I totally understand that and would much rather the system be honest with users by “jumping” to sync as soon as it can or I am likely to do it myself by stopping and restarting the music.

On the other side, it’s on-brand for Apple to hide all the flaws by simply smoothing things over (like the transitions on iPhones being slower than needed simply because some devices can’t handle the faster speeds) so I’m not going to die on this hill as the kids say.

systemcrash commented 3 years ago

This is just my input as a (probably extremely picky) user, but I notice speakers being out of sync by at most 5ms, and at 10ms it affects my ability to enjoy what I’m listening to. Others can also notice if I point out the “echo”.

This PR fixes that. The (admittedly off-topic) discussion on this PR is trying to shoe-horn PTP into the works. I guess when PTP is up to scratch with smooth synching, this PR would do well to be updated :)

glmnet commented 3 years ago

Yes I should've apologized before because of the thread hijack but didn't find a better way to contact @systemcrash

@Joshfindit remember this is very experimental, what works for one might not work for others. I have much better results with ptp now (but skips are noticeable) than this PR.

I'm on Windows and have a HomePod mini and an AirPort Express (they sync perfectly, of course). My branch with PTP syncs quite well but is very buggy, e.g. it has to join when the others are already connected, and skipping tracks doesn't work (it disconnects for some as yet unknown reason).

glmnet commented 3 years ago

@systemcrash not sure why, but not announcing as a timing peer here i.e. removing these lines, causes AP to not try to use the python node as clock master. Useful as mac addresses are hard to spoof

systemcrash commented 3 years ago

@systemcrash not sure why, but not announcing as a timing peer here i.e. removing these lines, causes AP to not try to use the python node as clock master. Useful as mac addresses are hard to spoof

Understood. Although we are not spoofing a MAC, we are just presenting a dummy MAC (if you do not provide a MAC in .spawn() function). The IP is always used for communication, but the MAC is used to identify the clock.

systemcrash commented 3 years ago

This PR has been tested extensively and been open for review for a while. I'm merging this, which will make way for work on the PTP branch.