thebigmunch / audio-metadata

A library for reading and, in the future, writing audio metadata. https://audio-metadata.readthedocs.io/
https://forum.thebigmunch.me
MIT License
57 stars 11 forks source link

Handle XING headers with 0 num_frames #11

Closed droopanu closed 4 years ago

droopanu commented 4 years ago

I'm getting the following error while trying to upload a folder:

[2019-10-14 12:13:36] Logging in to Music Manager [2019-10-14 12:13:37] Logging in to Mobile Client [2019-10-14 12:13:38] Loading local songs [2019-10-14 12:13:39] Comparing hashes Traceback (most recent call last): File "/usr/local/bin/gms", line 10, in sys.exit(run()) File "/usr/local/lib/python3.7/dist-packages/google_music_scripts/cli.py", line 920, in run DISPATCHcommand File "/usr/local/lib/python3.7/dist-packages/google_music_scripts/commands.py", line 364, in do_upload if generate_client_id(song) not in google_client_ids: File "/usr/local/lib/python3.7/dist-packages/google_music_proto/musicmanager/utils.py", line 21, in generate_client_id song = audio_metadata.load(song) File "/usr/local/lib/python3.7/dist-packages/audio_metadata/api.py", line 119, in load return parser_cls.load(fileobj) File "/usr/local/lib/python3.7/dist-packages/audio_metadata/formats/mp3.py", line 536, in load self.streaminfo = MP3StreamInfo.load(self._obj) File "/usr/local/lib/python3.7/dist-packages/audio_metadata/formats/mp3.py", line 492, in load bitrate = ((audio_size - frames[0]._size) 8 frames[0].sample_rate) / num_samples ZeroDivisionError: division by zero

Most probably one of the mp3 files is broken, but there is no way to figure out which one.

Would it be possible to add a debugging option to show the file that is being processed so that way I can see what mp3 is broken :)

Cheers, Liviu

thebigmunch commented 4 years ago

I have a test script here that will try to load all the detected audio files in the directory it's run in and print out any it failed to load. If you could post or email any file(s) it finds, that'd be helpful.

droopanu commented 4 years ago

Awesome! It worked, I found the faulty file.

Watch out as it's a big one (300MB).

https://drive.google.com/file/d/1mZbzaXHTnP9wGRd_iQ6Q7pql_qFSNipq/view?usp=sharing

Cheers, Liviu

On Mon, Oct 14, 2019 at 1:04 PM thebigmunch notifications@github.com wrote:

I have a test script here https://gist.github.com/thebigmunch/c80560c789ce72d46db301e15c265082 that will try to load all the detected audio files in the directory it's run in and print out any it failed to load. If you could post or email any file(s) it finds, that'd be helpful.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/thebigmunch/google-music-proto/issues/8?email_source=notifications&email_token=ABZXJMUDQUDVDTC4B3EEV3TQOOZSPA5CNFSM4JAJBVU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBDDOWI#issuecomment-541472601, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABZXJMVHS3CXV4MURNXKPYLQOOZSPANCNFSM4JAJBVUQ .

thebigmunch commented 4 years ago

So, FYI, this file seems to have an invalid XING header that says there's 0 MPEG frames/data in the file, which is obviously not true. This file shouldn't even have a XING header as it is CBR and not encoded with LAME. This in turn screws up the bitrate calculation by giving audio-metadata an incorrect number of samples.

Probably should raise an exception on this somewhere so it fails predictably and gracefully with a useful message.

thebigmunch commented 4 years ago

PS This particular file is actually over Google Music's file size limit of 300 MiB as well, so it couldn't be uploaded anyway : P

droopanu commented 4 years ago

did not know there is a 300MB limit. Well ... good to know :)

On Mon, Oct 14, 2019 at 2:46 PM thebigmunch notifications@github.com wrote:

PS This particular file is actually over Google Music's file size limit of 300 MiB as well, so it couldn't be uploaded anyway : P

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/thebigmunch/audio-metadata/issues/11?email_source=notifications&email_token=ABZXJMR4YLBJVIB5QECYHMLQOPFPHA5CNFSM4JAJRV5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBDFTKA#issuecomment-541481384, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABZXJMRXXOFTHEUIAEH2F7DQOPFPHANCNFSM4JAJRV5A .

droopanu commented 4 years ago

BTW, thank you very much for your quick reply :D

I can see 2 feature requests coming from this:

  1. Would be good if the uploader would indeed raise an exception for the divide by 0 failure, print the file name and skip it.
  2. Display an error when the file is too big and can't be uploaded, including the filename, and skip to the next file.

On Mon, Oct 14, 2019 at 3:30 PM Liviu Sas droopanu@gmail.com wrote:

did not know there is a 300MB limit. Well ... good to know :)

On Mon, Oct 14, 2019 at 2:46 PM thebigmunch notifications@github.com wrote:

PS This particular file is actually over Google Music's file size limit of 300 MiB as well, so it couldn't be uploaded anyway : P

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/thebigmunch/audio-metadata/issues/11?email_source=notifications&email_token=ABZXJMR4YLBJVIB5QECYHMLQOPFPHA5CNFSM4JAJRV5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBDFTKA#issuecomment-541481384, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABZXJMRXXOFTHEUIAEH2F7DQOPFPHANCNFSM4JAJRV5A .

thebigmunch commented 4 years ago

For the latter, you can increase the verbosity to see the progress messages that will tell you that the large file wasn't uploaded because of the limit. Either way, it won't be uploaded and move right along. You just hadn't reached that point yet.

Edit: I might, however, change the failure messages to be displayed at a lower verbosity level than success messages so there is an option to see only failure messages. Or even attach the progress messages to a separate CLI option.

thebigmunch commented 4 years ago

I've added handling of MP3 with XING headers with 0 num_frames. audio-metadata will now load these files, including the one in this issue.

For the specific file in this issue, it technically is a loadable MP3 but seems like its incorrect. The only MPEG frame that isn't a CBR of 128Kbps is the XING frame at 64Kbps. A CBR MP3 using XING is supposed to use the Info identifier rather than Xing, this file did not. This actually confuses a number of programs to its actual stream info due to their implementations/guesswork.

mutagen assumes VBR due to the XING header not using the Info string and having the quality indicator field set. Since the XING header doesn't have the number of frames set, mutagen just takes the bitrate from the XING frame, which is the only one that is not 128Kbps and set the duration (length) to 0 seconds.

>>> import mutagen
>>> m = mutagen.File('ractive_music_without_subtitles_2019-09-07.mp3')
>>> vars(m.info)
{
    'bitrate': 64000,
    'bitrate_mode': <BitrateMode.VBR: 2>,
    'channels': 2,
    'encoder_settings': '',
    'frame_offset': 303,
    'layer': 3,
    'length': 0.0,
    'mode': 0,
    'padding': False,
    'protected': False,
    'sample_rate': 44100,
    'sketchy': False,
    'version': 1,
}

foobar2000 figures out that the file should be CBR (though I'm not sure how), but it also just grabs the bitrate from the XING frame. It then uses the bitrate to calculate the duration. Since the bitrate of the XING frame is half the bitrate of every other frame, it detects the duration as double of what it actually is.

audio-metadata now sets the bitrate mode to UNKNOWN as well as calculating the correct bitrate and duration for this file at the expense of having to count all the MPEG frames in the file. This operation took 10-12 seconds on this 300 MiB file with over 750,000 frames on my desktop. But, now it at least correctly loads it unlike mutagen. And MP3 files of a more average size obviously take much less time.