ruuda / claxon

A FLAC decoder in Rust
Apache License 2.0
287 stars 28 forks source link

A blueprint for seeking feature #29

Open TonalidadeHidrica opened 3 years ago

TonalidadeHidrica commented 3 years ago

I'm planning to add seeking feature to this library, as I mentioned in #28 . This issue has been opened for tracking the design plan for this new feature.

The methods to add

First, we are going to add new constructors new_seekable and new_seekable_ext in impl <R: Read + Seek> FlacReader {. The purpose of creating a new struct is that we have to do an additional process right after reading all the metadata blocks: save the position of the reader. This enables the reader to determine the range for binary searching for every seek. This position is going to be stored in a new field of FlacReader, audio_block_start_position: Option<u64>.

Next, we are going to add new function seek to FrameReader and FlacSamples, both of which takes the sample number (0-indexed).

Concerns and future tasks

Thanks for reading this lengthy draft! Feel free to make questions or oppositions. I'll get started in a few days anyway.

ruuda commented 3 years ago

Thanks for writing this up!

we are going to add new constructors new_seekable and new_seekable_ext

I think in hindsight, adding these new_ext constructors, and parsing so much in the constructor, was a mistake. It forces reading blocks that the caller may not be interested in, and the only way to expose them is by storing them in FlacReader, so they need to be allocated on the heap. The FlacReaderOptions are not flexible enough to accomodate all use cases. The design that I am now gravitating towards is to expose two things:

I am working on this in the metadata3 branch. My feeling is that seeking should follow the same approach: expose a lower-level function for reading a block (essentially FrameReader::read_next_or_eof, there is no real need for the FrameReader struct). Then seeking could be another one of those lower-level functions that takes a reader, a sample number, and optionally a SeekTable, and it positions the reader right at the point where FrameReader::read_next_or_eof would read the frame that contains the desired frame number. An additional advantage of this, is that only this function needs an io::Seek constraint on the reader.

Therefore, the next frame (this is equivalent to claxon::frame::Block, right?)

Yes, the flac format refers to blocks of raw audio data as blocks, and blocks encoded in the bitstream as frames.

The range of search is between the beginning of audio frame (that was obtained in the constructor described above) and the end of the stream std::io::SeekFrom::End(0). However, if possible, this function uses SEEKTABLE metadata to narrow down the range, and thus speed up the seeking process.

Yes, that sounds good!

By the way, I found that claxon::metadata::SeekTable does not expose any API to access the data. Shall I add some as well?

It’s not yet exposed indeed, parsing isn’t even implemented yet, so I would start with that first.

Also, I think that the data should be stored with something like BTreeMap rather than Vec.

Depending on the size of the seektable, it may be faster to have a vec with binary search or even linear search, but for now I would go with whatever makes the implementation simpler.

For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.

Yeah just the header should be sufficient, the flac format takes care to avoid creating the frame header bit pattern in other places, and if the CRC of the header matches and the sample number makes sense, then that should be fine. I expect we need to verify that the sample numbers are increasing as the position in the stream increases, otherwise it is probably possible to create an adversarial input that breaks the binary search.

In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching.

That may be a good future optimization, but I would start with the simplest option possible. But we need some kind of forward search anyway, right? If binary search tells you an offset in the stream, but that offset is not the start of a frame header, then we need to search ahead (or backwards) until there is some frame, otherwise the binary search can’t continue.

FlacSamples::seek seeks the reader to the desired sample.

That could be a second step, I would start with searching for the frame; I’m not sure how useful seeking in FlacSamples is, because it is more of a toy API anyway. It is very convenient for just opening a file and getting the audio data, but anything that cares about performance, or wants to do some more advanced things (like seeking), would likely read the frames manually anyway.

TonalidadeHidrica commented 3 years ago

Thanks for your detailed review.

I understood that adding two more constructors are not a good way. So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?

So, the low-level seeking function will be like this. Does it seem good?

/// Seek the given reader to an appropriate position so that
/// the next frame returned by `FrameReader::read_next_or_eof`
/// will contain the sample with the given index, if exists.
fn seek_reader_by_sample_number<R: io::Read + io::Seek>(
  reader: &mut R,
  sample_number: u64,
  audio_block_start_position: Option<u64>,  // Maybe it should not be Option?
  seek_table: Option<SeekTable>,
) -> Option<u64> {
  // ...
}

Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?

Therefore, the next frame (this is equivalent to claxon::frame::Block, right?)

Yes, the flac format refers to blocks of raw audio data as blocks, and blocks encoded in the bitstream as frames.

Thanks for teaching me. I've totally confused them.

Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?

For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.

Yeah just the header should be sufficient, the flac format takes care to avoid creating the frame header bit pattern in other places, and if the CRC of the header matches and the sample number makes sense, then that should be fine. I expect we need to verify that the sample numbers are increasing as the position in the stream increases, otherwise it is probably possible to create an adversarial input that breaks the binary search.

Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.

In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching.

That may be a good future optimization, but I would start with the simplest option possible. But we need some kind of forward search anyway, right? If binary search tells you an offset in the stream, but that offset is not the start of a frame header, then we need to search ahead (or backwards) until there is some frame, otherwise the binary search can’t continue.

Um, it is my fault that I didn't explain clearly, but this is a necessary step for binary seeking, not an optimization. Simply put, binary search consisting of repetition of the following recursive step:

I've already implemented the similar algorithm for the lewton crate here, so take a look at it if you are interested in.

FlacSamples::seek seeks the reader to the desired sample.

That could be a second step, I would start with searching for the frame; I’m not sure how useful seeking in FlacSamples is, because it is more of a toy API anyway. It is very convenient for just opening a file and getting the audio data, but anything that cares about performance, or wants to do some more advanced things (like seeking), would likely read the frames manually anyway.

At least, I want seekable FlacSamples for easy adoption to my project :P

ruuda commented 3 years ago

So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?

Yes, that would be a good first step either way.

Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?

I intend for metadata3 to be merged into master, but it is currently blocked on performance regressions caused by dropping the custom buffered reader and switching it out for BufRead. I expect it can be resolved, and basing seeking off of metadata3 would be easier, but if you prefer to base it off of master, I can rebase metadata3 later; I’ll leave it up to you.

Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?

I have no plans to implement it, you can go ahead :) This would be a welcome addition in a separate pull request either way.

Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.

If there are still adversarial cases, it should be possible to find them with a fuzzer. There are already a few fuzzers, maybe we can add one where the fuzz input is first a 32-bit offset to seek to, and the remainder of the input is the flac file. The implementation of the fuzzer would be to just do the seek. If there is an infinite loop, libFuzzer will report it as a hang. But a good first step would be to implement seek indeed, we can add the fuzzer later.

Simply put, binary search consisting of repetition of the following recursive step

Yes, I think we are on the same page here :+1:

At least, I want seekable FlacSamples for easy adoption to my project :P

Out of curiosity, can you tell a bit more about your use case?

TonalidadeHidrica commented 3 years ago

So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?

Yes, that would be a good first step either way.

Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?

I intend for metadata3 to be merged into master, but it is currently blocked on performance regressions caused by dropping the custom buffered reader and switching it out for BufRead. I expect it can be resolved, and basing seeking off of metadata3 would be easier, but if you prefer to base it off of master, I can rebase metadata3 later; I’ll leave it up to you.

Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?

I have no plans to implement it, you can go ahead :) This would be a welcome addition in a separate pull request either way.

Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.

If there are still adversarial cases, it should be possible to find them with a fuzzer. There are already a few fuzzers, maybe we can add one where the fuzz input is first a 32-bit offset to seek to, and the remainder of the input is the flac file. The implementation of the fuzzer would be to just do the seek. If there is an infinite loop, libFuzzer will report it as a hang. But a good first step would be to implement seek indeed, we can add the fuzzer later.

Simply put, binary search consisting of repetition of the following recursive step

Yes, I think we are on the same page here 👍

At least, I want seekable FlacSamples for easy adoption to my project :P

Out of curiosity, can you tell a bit more about your use case?

Well, FlacSamples is an easy iterator that iterates over samples, so it is easy to adopt to iterator-based audio signal processor like rodio. If we don't have FlacSamples::seek, we have to use FrameReader and then skip appropriate samples to achieve precise seek to a specific sample, so implementing it will reduce the effort for user, I think.

joshhansen commented 1 year ago

I find myself looking for seek functionality in a flac decoder - did you ever make any progress on this, @TonalidadeHidrica ?

TonalidadeHidrica commented 1 year ago

@joshhansen Unfortunately I didn't, and if you would try implement this, I'd be very happy!