thebigmunch / audio-metadata

A library for reading and, in the future, writing audio metadata. https://audio-metadata.readthedocs.io/
https://forum.thebigmunch.me
MIT License
53 stars 11 forks source link

Handle invalid (null terminated) Vorbis comments #10

Closed replicant0wnz closed 4 years ago

replicant0wnz commented 4 years ago

Got a new one :-D

Traceback (most recent call last):
File "/usr/local/bin/gms", line 11, in <module>
sys.exit(run())
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/cli.py", line 968, in run
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/commands.py", line 465, in do_upload
delete_on_success=args.delete_on_success
File "/usr/local/lib/python3.6/dist-packages/google_music_scripts/core.py", line 334, in upload_songs
no_sample=no_sample
File "/usr/local/lib/python3.6/dist-packages/google_music/clients/musicmanager.py", line 245, in upload
track_info = mm_calls.Metadata.get_track_info(song)
File "/usr/local/lib/python3.6/dist-packages/google_music_proto/musicmanager/calls.py", line 297, in get_track_info
track.track_number = int(track_split[0])
ValueError: invalid literal for int() with base 10: '1\x00'

I added a print statement at line 259 on google_music_proto/musicmanager/calls.py to get the offending file. Here's the FLAC header for it (kinda big due to embedded image):

http://dallaslamers.org/bad_flac.txt

thebigmunch commented 4 years ago

If I could get the file itself, that'd be best. Besides just being easier to check on, the flac tools actually support reading some files that violate the spec in different ways. So, I can't be sure of the actual binary layout.

replicant0wnz commented 4 years ago

Here ya go! Just lemme know after ya grab it so I can whack it ..

http://dallaslamers.org/flac/

thebigmunch commented 4 years ago

Got it. Thanks.

thebigmunch commented 4 years ago

So, yeah, almost all the Vorbis comments in that file are invalid according to the spec. Most of them are terminated with a null byte when it is clearly spelled out in the spec not to null terminate them.

2019-08-08_17-45-21

Not sure what wrote those, but it's doing it wrong. This would've failed with mutagen, too, as it doesn't strip the null bytes from them either. Dunno what they end up like on Google Music if uploaded with their Music Manager, but it's possible you'd see some metadata issues then, too.

I've chosen to stick to the specs for things, but this does let me know I should add some validation and raise an exception in this case. Gonna move this issue to the audio-metadata repo for that reason.

thebigmunch commented 4 years ago

PS Thanks for finding me more files that break things ^_^

People truly don't understand how much we programmers want/need them.

thebigmunch commented 4 years ago

Ok, so, just for future reference, I tested with Google's Music Manager for shits and giggles. It strips null bytes somewhere along the way. Not sure I'd do that since audio-metadata would be throwing exceptions in this particular case in the future, but it's nice to know these quirks.

thebigmunch commented 4 years ago

I don't think there's actually anything I can do about this one but pass it through. I'll probably just add a try/except to google-music-proto where the exception occurs. If I end up trying to add a number field class for vorbis comments, I could handle this particular case there.

replicant0wnz commented 4 years ago

Think I'm just gonna run my entire collection through a tag "fixer" or just write my own. Thanks for taking a look!

replicant0wnz commented 4 years ago

Quick hack:

https://github.com/replicant0wnz/strip_null