video-dev / hls.js

HLS.js is a JavaScript library that plays HLS in browsers with support for MSE.
https://hlsjs.video-dev.org/demo

Accounting for priming samples in mp3 segments #5099

Open reckoner165 opened 1 year ago

reckoner165 commented 1 year ago

What do you want to do with Hls.js?

I'm attempting to discard the first 1152 samples of each mp3 fragment in an HLS stream. The fragments are encoded with LAME, which prepends priming samples at the start of each fragment. While the startPTS accounts for this, one can still hear an audible click at the transition between fragments. I want to trim this offset while appending these fragments to the source buffer. I'm using this code as a reference.
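For reference, the offset I want to trim is just the sample count divided by the sample rate; the helper below is only illustrative, not part of hls.js:

// Priming samples expressed as a duration in seconds.
function primingDurationSeconds(primingSamples, sampleRate) {
  return primingSamples / sampleRate;
}

primingDurationSeconds(1152, 48000); // 0.024
primingDurationSeconds(1152, 44100); // ~0.0261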

Test stream with the priming clicks: the 0.048-second init segment containing silence was added to the playlist to prevent the stream from being misdetected as AAC and throwing MediaErrors. I don't think this is relevant to the priming-sample issue.

What have you tried so far?

I have written a custom fragment loader that parses response.data for each fragment and obtains the exact number of priming samples at the start of each fragment.

function processContext(context, primingData) {
  const newContext = Object.assign({}, context);
  // this does not work: frag.start gets overwritten after the transmux step
  newContext.frag.start += primingData.frontPaddingDuration;
  return newContext;
}

class FragmentLoader extends Hls.DefaultConfig.loader {
  constructor(config) {
    super(config);
    const load = this.load.bind(this);
    this.load = function (context, config, callbacks) {
      const onSuccess = callbacks.onSuccess;
      callbacks.onSuccess = function (response, stats, context) {
        if (context.url.includes('raw_audio')) {
          // Parser that returns priming sample duration in seconds
          const primingData = getPrimingData(response, response.data);
          const newContext = processContext(context, primingData);
          onSuccess(response, stats, newContext, undefined);
        }
        else {
          onSuccess(response, stats, context, undefined);
        }
      };
      load(context, config, callbacks);
    };
  }
}
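For completeness, this is roughly how I register the loader, assuming it is passed as fLoader so it only applies to fragment requests (the element and playlist URL are placeholders):

// Register the custom loader for fragment requests only.
const hls = new Hls({
  fLoader: FragmentLoader,
});
hls.loadSource('https://example.com/stream/playlist.m3u8');
hls.attachMedia(document.getElementById('video'));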

It seems like hls.js does not offer direct access to the source buffer. Updating context.frag does not help, as this data gets overwritten after the transmux step. Is there a way I can update context.frag so that it accounts for the time offset, and have those values persist when the fragment is added to the source buffer?

fmqa commented 1 year ago

I am interested in this as well. A naive approach based on adjusting the timestampOffset in buffer-controller.ts (https://github.com/video-dev/hls.js/blob/cccd825b0f42afb9f01c7116dd5f423e1263aa4f/src/controller/buffer-controller.ts#L359) does not seem to work, unfortunately.

Is there any way to extend hls.js to support MP3 codec delay?

robwalch commented 1 year ago

The BUFFER_APPENDING event is emitted with muxed data from the worker prior to appending to SourceBuffers, and prior to updating level timing (updateLevelTiming shifts the timeline according to the parsed media timestamps like startPTS). Adding a listener to this event is probably the best place to modify muxed fmp4 data and fragment timing prior to append.
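A minimal sketch of that approach might look like the following; getPrimingOffsetSeconds is a placeholder for your own Xing/LAME parsing, not an hls.js API:

// Hook BUFFER_APPENDING to touch fragment timing / muxed data before append.
const hls = new Hls();
hls.on(Hls.Events.BUFFER_APPENDING, (event, data) => {
  if (data.type !== 'audio') {
    return;
  }
  const offset = getPrimingOffsetSeconds(data.frag); // your own lookup/parser
  if (offset > 0) {
    // data.frag carries the timing that updateLevelTiming will use after append;
    // the adjustment below is illustrative, not a verified correction.
    data.frag.start -= offset;
    // data.data is the muxed fmp4 chunk (Uint8Array); trimming samples here
    // would require an fmp4 patcher of your own.
  }
});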

If you are forking the project and want to drop MP3 samples directly, modify MP3Demuxer. Disabling the worker with "enableWorker": false and stepping through its demux method should give you an idea of how it works.
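For that route, a config like this keeps transmuxing on the main thread so breakpoints in MP3Demuxer.demux actually hit:

// Run the transmuxer on the main thread so it can be stepped through in devtools.
const hls = new Hls({
  enableWorker: false,
  debug: true, // optional: verbose logging while stepping through
});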

fmqa commented 1 year ago

Hello @robwalch. I forked the project to extend the MP3Demuxer with support for parsing the Xing/Info tags, which contain the encoder delay and padding [1]. I think it's best to detect the values from the header if available. Note that this is currently an issue with hls.js, since the Xing/Info pseudo-frame should be ignored after the delay information is extracted [2]. In other words, we're processing an extra frame where we shouldn't, thus unintentionally adding to the delay and exacerbating the issue.
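For reference, the core of the parsing looks roughly like this; the byte offsets follow the published LAME tag layout, so treat it as a sketch rather than the exact code in my fork:

// Sketch: read encoder delay/padding from the LAME tag inside the Xing/Info frame.
function readLameDelayPadding(frameData) { // frameData: Uint8Array of the Xing/Info frame
  // Locate the "LAME" version string; the tag layout is relative to it.
  for (let i = 0; i + 24 <= frameData.length; i++) {
    if (
      frameData[i] === 0x4c &&     // 'L'
      frameData[i + 1] === 0x41 && // 'A'
      frameData[i + 2] === 0x4d && // 'M'
      frameData[i + 3] === 0x45    // 'E'
    ) {
      // Bytes 21..23 of the tag pack two 12-bit values:
      // encoder delay followed by trailing padding, both in samples.
      const b0 = frameData[i + 21];
      const b1 = frameData[i + 22];
      const b2 = frameData[i + 23];
      return {
        encoderDelay: (b0 << 4) | (b1 >> 4),
        paddingSamples: ((b1 & 0x0f) << 8) | b2,
      };
    }
  }
  return null; // no LAME tag found
}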

Although the hls.js code is extensible and understandable enough to accommodate the above change, there remain some impediments. We have to get the processing pipeline to drop or ignore padding samples at both the front and the back of the track. As I understand your comment, there are two options to achieve this:

1) Drop samples in MP3Demuxer. This would be the easiest, but unfortunately the MP3 encoder delay is not usually a multiple of the frame size [3], and I'm not sure we can do something like dropping 576 samples (= 0.012 s of encoder delay on a 48 kHz LAME encode) without accounting for frame boundaries.

2) Have the SourceBuffer filter out the time range via appendWindowStart, appendWindowEnd and timestampOffset. This is what I tried yesterday on a private branch (a bare-bones sketch of the mechanism follows this list). Putting aside the fact that I didn't manage to get it working, my impression is that it is an invasive change: it requires passing the padding values down through all the layers to the MP4 remuxer and the BufferController, and also requires some careful PTS adjustment. Despite that, it didn't work, but that may be due to browser issues (indeed, I've never managed to get any form of time-based SourceBuffer trimming to work under Firefox, even in minimal test cases, perhaps because of subtle differences in the interpretation of the standard).
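For reference, the bare MSE mechanism I was trying to lean on looks roughly like this outside of hls.js; the MIME type, URL and numbers are illustrative only:

// Shift the segment earlier by the encoder delay and let appendWindowStart
// clip the priming samples.
const delaySeconds = 576 / 48000; // e.g. 576 priming samples at 48 kHz
const segmentStart = 10;          // where this segment should start on the timeline

const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // MIME type depends on the packaging; assuming fmp4 with mp3 audio here.
  const sb = mediaSource.addSourceBuffer('audio/mp4; codecs="mp3"');
  sb.timestampOffset = segmentStart - delaySeconds; // pull the media forward
  sb.appendWindowStart = segmentStart;              // clip frames before the start
  sb.appendWindowEnd = Infinity;

  const segment = await (await fetch('segment.m4s')).arrayBuffer();
  sb.appendBuffer(segment);
  // Whether audio frames overlapping appendWindowStart are trimmed or dropped
  // entirely is up to the browser, which may explain the Firefox differences.
});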

In general, I would like to prepare a PR to fix this, at least once I get the "dropping the samples" part sorted out. Right now it's not quite there yet, and proposing only half a fix (handling the Xing/Info header correctly) would not be a good look.

[1] https://github.com/fmqa/hls.js/blob/master/src/demux/mpegaudio.ts

[2] https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/

[3] https://lame.sourceforge.io/tech-FAQ.txt