Decode enhanced audio stream

argilo commented 4 years ago

Currently the AM receiver only decodes the core audio stream. Decoding the enhanced audio stream would improve the audio quality since it would bring the bit rate from 20 kbps up to 36 kbps. We'll need to figure out how the enhanced audio packets are encoded.

pclov3r commented 4 years ago

@argilo I'm sure the NRSC-5 spec covers this, but maybe this is helpful anyhow? It discusses the core and enhanced audio streams for AM: https://www.eetimes.com/understanding-hd-radio-the-program-audio-chain-part-2/

This seems to be the most important part:

Core and enhanced audio streams are always transmitted in the other IBOC transmission modes: AM hybrid mode, AM all-digital mode, and FM all-digital mode. In these dual-stream modes, core audio packets are grouped 4 (average) per PDU, with 8 PDUs per frame (fixed). This provides the receiver with a faster-arriving, smaller PDU to disassemble, to more rapidly recover the core audio. Enhanced packets are always transmitted 32 per PDU (average), one PDU per frame (fixed).
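As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. The constants come straight from the quoted passage; the helper names are made up for illustration and are not from nrsc5. Since the per-PDU counts are averages, the results are averages too.

```c
/* Figures quoted from the article above. The per-PDU packet counts are
 * averages, so the per-frame totals are averages as well. */
enum {
    CORE_PACKETS_PER_PDU = 4,   /* average */
    CORE_PDUS_PER_FRAME  = 8,   /* fixed */
    ENH_PACKETS_PER_PDU  = 32,  /* average */
    ENH_PDUS_PER_FRAME   = 1    /* fixed */
};

/* Hypothetical helper: average number of core packets in one frame. */
static int core_packets_per_frame(void)
{
    return CORE_PACKETS_PER_PDU * CORE_PDUS_PER_FRAME;
}

/* Hypothetical helper: average number of enhanced packets in one frame. */
static int enhanced_packets_per_frame(void)
{
    return ENH_PACKETS_PER_PDU * ENH_PDUS_PER_FRAME;
}
```

Both streams work out to the same average packet count per frame, which suggests one enhanced packet per core packet.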

pclov3r commented 4 years ago

@argilo I'm curious if that article ever helped at all? I figure it was just a repeat of what was covered in the NRSC-5 spec manual.

argilo commented 4 years ago

The article does pretty much describe what's in the NRSC-5 spec. The part that needs to be sorted out is the codec, and that's not specified in NRSC-5.

pclov3r commented 4 years ago

Hmm, interesting. I thought the codec was all figured out already.

argilo commented 4 years ago

@awesie worked out all the details necessary to decode the modes used for FM, but AM uses other modes. My hacky workaround was to pretend that packet types 5 and 6 are the same as packet type 1 (https://github.com/theori-io/nrsc5/commit/84e9257df915dbf47b4897d6a181a61e887861a3#diff-83b04fa1cac5572bffd314a77e64924eR298-R299) but there are differences that need to be accounted for.

awesie commented 4 years ago

I'm looking at this a bit, but it is more difficult than the core audio since having a separate enhanced stream is not part of HE-AACv2.

From what I can tell so far, HDC block types 3, 4, 5, 6, and 7 can all have "enhanced data". We have already seen 5 and 6 in the AM data. 7 should be stereo data. 3 and 4 could be used with FM enhanced audio.

This enhanced data is not "just" SBR data though. In block types 5 and 6, there may be SBR data in the enhanced stream, but there is still other channel data before that. I'll continue working on it and post any additional revelations.

argilo commented 4 years ago

Interesting! I'm glad to see you're having a peek at the codec.

The other thing we'll need to do is align the Enhanced packets with the Core packets using the sequence numbers in the PDU headers. I'll take a crack at that soon.

awesie commented 4 years ago

Still trying to understand how this works.

The enhanced stream is parsed similarly to the core stream, and if the block type in the core stream was 5 or 6, then the enhanced stream is parsed as stereo (whereas the core stream is mono). Once both streams are parsed, you have 3 channels' worth of spectral data (audio in the frequency domain). The core stream channel is merged with the enhanced stream channels so you end up with 2 channels.

Parsing the enhanced stream is not hard since it is exactly the same as the core stream. I'm just not sure about the merge step. I don't know enough about audio compression to know if there is an obvious way to combine these frequency domain channels. Scalable AAC (N.B. not SLS) has been a part of the AAC spec since 1999, but I don't think anyone really used it until DRM used ER AAC-Scalable. I'll continue investigating to see if DRM code is insightful.

awesie commented 4 years ago

ISO/IEC 14496 Part 3 Subpart 4.5.2.2.4 covers decoding basic scalable AAC. We probably have what they refer to as "AAC-Only-M/S". The core stream is a Mid channel. The enhancement stream is Mid/Side channels. It should be as simple as adding the two Mid channels to get Mid', then using the normal AAC M/S decoding with Mid' and Side to get L/R output.
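A minimal sketch of that merge, assuming the three spectra are already dequantized into float arrays of equal length. The function and parameter names are hypothetical (nothing here is from nrsc5 or FAAD2), and whether HDC needs any extra scaling is not established in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: merge a mono core spectrum with an M/S enhancement
 * spectrum, bin by bin, producing left/right spectra.
 * Assumes all five arrays hold n spectral coefficients. */
static void merge_core_enhanced(const float *core_mid,
                                const float *enh_mid,
                                const float *enh_side,
                                float *left, float *right, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Mid' = core Mid + enhancement Mid */
        float mid = core_mid[i] + enh_mid[i];
        /* Standard AAC M/S reconstruction: L = M + S, R = M - S */
        left[i]  = mid + enh_side[i];
        right[i] = mid - enh_side[i];
    }
}
```

The inverse transform and SBR would then run on `left` and `right` as for any other stereo element.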

awesie commented 4 years ago

I think alignment is a simple matter of using the "starting sequence number" field in the PDU header. It is defined as "initialized to zero for all streams simultaneously". It looks like once we receive a P3 PDU with the enhancement stream, it gives us the audio packets to use for the next 32 core packets or so.

awesie commented 4 years ago

Well, the sequence number almost worked. It was off by 64 audio packets (i.e. I needed to delay the enhanced stream by 64 packets). The basic stuff seems to be working now, though. It can parse the mono core stream and combine it with the M/S enhanced stream. I don't know how to handle an L/R enhanced stream, so let's hope that never happens.
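Roughly, the pairing looks like the sketch below. It assumes each stream keeps a running packet counter that does not wrap (the real header field is fixed-width, so wraparound would have to be handled separately). The names are hypothetical; only the offset of 64 comes from what I observed.

```c
#include <stdint.h>

/* Offset observed in this thread: the enhanced stream has to be delayed
 * by this many audio packets relative to the core stream. */
#define ENHANCED_DELAY 64u

/* Hypothetical helper: running core-packet index that an enhanced packet
 * with running index enh_index should be merged with. */
static uint32_t core_index_for_enhanced(uint32_t enh_index)
{
    return enh_index + ENHANCED_DELAY;
}

/* Hypothetical helper: returns 1 when a queued enhanced packet is due to
 * be merged with the core packet currently being decoded, 0 otherwise. */
static int enhanced_is_due(uint32_t enh_index, uint32_t core_index)
{
    return core_index_for_enhanced(enh_index) == core_index;
}
```

In practice this means buffering enhanced packets until the matching core packet arrives.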

The next issue I need to fix is that the SBR data gets sent in both streams. The core stream sends the SBR data for one output channel and the enhanced stream contains the SBR data for the other output channel.

awesie commented 4 years ago

I found one station (1200AM near Detroit) that uses HDC block type 7 though I don't have a good recording of it. This is a stereo AM core stream, and it may have a stereo enhanced stream. For now, we can treat it the same as type 2.

argilo commented 4 years ago

Here's a recording of KSL, which uses HDC block type 7:

https://drive.google.com/file/d/13b8CYKQSMWIsezjhqz0WAONKQ1Aqwrdt/view?usp=sharing

As you suspected, I was able to decode the audio by adding this to the hdc_main_element function in syntax.c:

    case 7:
        decode_cpe(hDecoder, hInfo, ld, ID_CPE);
        break;

awesie commented 4 years ago

From what I can tell, that recording of KSL does not have an enhanced stream? (just making sure I'm not crazy)

argilo commented 4 years ago

I didn't check that yet. I'll have a look now.

argilo commented 4 years ago

I suspect it does have an enhanced stream, but the receiver is not good enough at separating the analog audio from the secondary & tertiary sidebands to make it decodable.

argilo commented 4 years ago

That does seem to be the case: If I remove the 0.06 "fudge factor" from acquire.c then some (but not all) enhanced stream packets are successfully decoded and have a valid CRC.

theori-io / nrsc5

Decode enhanced audio stream #239