Closed tehmaze closed 3 years ago
Thank you, but as go-mp3 output is always 2-channels regardless of source, is NumberOfChannels
useful in your use case?
+1 for this method. It can be useful when conversion into non-interleaved data structure is needed.
I'm not sure I understand what you meant, but the request is to have non-inteleaved decoding result for monaural mp3 source rather than NumberOfChannels
?
Well, I'll try to explain. go-mp3
serves really well here because it streams PCM data and it's most of the time interleaved. However, some audio APIs require non-interleaved data layout. Like, for example VST expects data in a form of [numChannels][numFrames]float64. To do such conversion it's needed to know how many channels is within interleaved array.
Talking about mp3 specifically, I think there could be only mono or stereo data, so probably it would just make sense to check if source data is stereo/mono with some enum flag.
Hope this helps to understand the need. Thanks.
To do such conversion it's needed to know how many channels is within interleaved array.
As I said at https://github.com/hajimehoshi/go-mp3/pull/20#issuecomment-355460459, the number of channels of the decoded stream is always 2, regardless of whether the source is mono or stereo.
That's ok, but how can I distinguish if those two channels are decoded mono file? Does it have duplicated data for both channels?
Does it have duplicated data for both channels?
Yes. I might be missing something, but my understanding is that we don't have to care whether the number of source is 1 or 2, as go-mp3 duplicates data when the source is mono.
That's what I've mentioned in comment, for some API's it's needed to have data aligned in a different way.
Here is what go-mp3
offers:
Stereo: [LR][LR][LR]
Mono: [MM][MM][MM]
This is what VST will expect:
Stereo: [LLL][RRR]
Mono: [MMM]
I don't see an easy way of achieving such conversion without knowing number of channels of the source.
Also, PDMP3 allows to obtain that info.
I was wondering if always treating the decoded data as stereo would work.
Well, it will work in most of scenarios, but I think it would be usefult to give some more control.
Hmm, to be honest, I've not been convinced the necessity of the number of channels. I'd like to revisit this when we find an actual use case.
But this VST use case is the actual one. I'm workning on DSP pipeline project and would be realy happy to implement mp3 component backed with your decoder, but without knowing number of channels I cannot do described conversion.
Could you elaborate why always treating the decoded stream as stereo doesn't work?
Ok, first of all mono is not stereo. There always will be cases where people just want to process sound data and not play it. For this scenario I wish to treat mono as mono and not stereo.
Mp3 could be only mono or stereo, but PCM can have any number of channels. If I want to use your decoder to handle mono data only, I will not have other choice, but only to check buffer for the duplicated data to identify mono stream.
I'm not sure why you decided to duplicate data in the first place, because to me it seems illogical in a sense of performance and clarity of API.
Mp3 could be only mono or stereo, but PCM can have any number of channels. If I want to use your decoder to handle mono data only, I will not have other choice, but only to check buffer for the duplicated data to identify mono stream.
If you already know the source is mono, you can reprocess the decoded stream. The problem is the case when you don't know, but I'm still thinking always treating the stream as stereo should work. In this case, perhaps do you want to change the number of channels of PCM dynamically?
I'm not sure why you decided to duplicate data in the first place, because to me it seems illogical in a sense of performance and clarity of API.
It's because we don't have to distinguish the decoded stream is stereo or mono when playing. You are right, removing the distinction between stereo and mono might not be responsibility of go-mp3, but it is too late. We can't change this behavior for backward compatibility anyway. As we don't change this behavior, I'm not sure it is worth introducing NumberOfChannels.
If you already know the source is mono, you can reprocess the decoded stream. The problem is the case when you don't know, but I'm still thinking always treating the stream as stereo should work.
This is the exact situation I'm talking about. I want to treat mono as mono when I have it.
OK, to be honest, I'm not fully convinced, but I understood that there is a case when the distinction between mono and stereo is important on audio processing.
However, I think introducing only NumberOfChannels
would be confusing since the decoded stream is still stereo. Let me think the API design. Probably I'll introduce a new kind of NewDecoder
to return mono/stereo stream.
Glad to hear that.
I'm not sure which direction you'll decide to go, but please don't make a separate decoders for mono/stereo data.
I don't like adding NumberOfChannels()
which can return 1 for mono files, when the output of go-mp3
is in stereo. The name is bound to cause confusion. Why not make the API clear by adding:
// while go-mp3 output is hardcoded to stereo, in rare cases you might be interested
// to know if the original source was mono or stereo (MP3 only supports mono/stereo)
func OriginalNumberOfChannels() int
I do understand the wish for this API, and I :+1: for having it
In my opinion the status quo is exactly right. This library has minimal API surface, which is a great feat everybody should strive for. A fixed format of 2 bytes per sample and 2 channels is great as it is what you need almost all of the time. It handles mono sounds well by duplicating the data, it does not support 5.1 or 7.1 which is not the target of this library. If you need this kind of control you go for a more complex, harder to use but also more flexible library. That is not what this library is targeted at.
If I wanted to know whether the mp3 I am decoding is really mono, I would either pre-parse the mp3 in a separate step and check the header, or decode the whole thing using this library and compare every 2 bytes with their neighboring 2 bytes. This is already possible.
Complicating this library for a rare use case that can already be accomplished is not a good tradeoff in my opinion. I would argue that you rarely need to differentiate between mono and stereo and thus it can be a bit more work to get it done. That is a tradeoff that I like.
For these reasons I would vote to close this issue/pull request and make it a point to explicitly not implement it.
To the proponents of this: @tehmaze you can easily wrap a Decoder from this library to return a Config that has 2 set for the channel count. I do not see any problem at all for using this package in the azul3d engine. @dudk As said above, either use stereo sounds everywhere (the most simple thing to do with this library) or find out beforehand whether your Decoder contains mono data and in that case just wrap the io.ReadSeeker in a thing that returns only every other 2 bytes.
Yeah, I agree that I'd like to keep the API as minimal as possible.
Let me close this PR as there have not been updates in the discussion.
To be honest, I'm not sure how exposing a meta information about decoded mp3 file through a decoder type bloats API🙂
IMHO the library hides this information from the user and doesn't provide any convenient way to get it. So having such method would be complementary.
My pipelined mp3 source is not really used, so it's not crucial for me at this moment.
@dudk Of course I see your point, having this information is useful for specific use cases. But what I want to argue is that this library is built on the premise that you do not need any such configuration. The whole point seems to be the minimal way to do things. As a consumer I need not worry about the difference between mono and stereo. Neither do I need to worry about different sample sizes, WAV files can have 8 bits, 16 bits, 32 bits, signed/unsinged, integer or floating point samples. This is also fixed here. The only thing left to consider is the sample rate. I guess this was not fixed because implementing a conversion for that would make the code much more complicated and the problem can quite easily be done in a separate layer on top of the given io.ReadSeeker.
If we made the mono information available, other things might follow, like original precision. Why not tags?
Also it is just plain aesthetically unpleasing to me to have a function OriginalNumberOfChannels or the like. It just looks and feels wrong to me. Why do we (the average user) really care? As I said above, every step to a larger API should be carefully considered and cost and benefit must be weighed. It is already possible to find out if the mp3 is really mono so my opinion stands as long as you convince me of the opposite :-)
Just my two cents: if API simplicity is so much concern, go-mp3 could expose a lower-level, separate, package for rarer use cases like these (and perhaps internally use that lower-level package to implement the simpler interface).
I don't think adding OriginalNumberOfChannels() int
brings too much maintenance burden. The point of "slippery slope" though stands - of that I don't have an informed opinion.
Having this information would allow this package to be used in conjunction with the
azul3d.org/engine/audio
package; as this is required information for any Decoder.References: