h2non / filetype.py

Small, dependency-free, fast Python package to infer binary file types checking the magic numbers signature
https://h2non.github.io/filetype.py
MIT License
661 stars 113 forks

FLAC files with ID3 tags are misdetected #121

Open stordoff opened 2 years ago

stordoff commented 2 years ago

Certain FLAC files (older ones, as I understand it) may have ID3 metadata tags instead of, or in addition to, Vorbis tags. These files are reported to have a MIME type of 'audio/mpeg' instead of the expected 'audio/x-flac'. They are essentially standard FLAC files with an ID3 block prepended to them (roughly ID3<metadata>fLaC<FLAC file>), which seems to be causing this confusion.

There is a sample file available in this issue which demonstrates this.

I'm not sure there is an easy or immediately obvious solution to this, since the fLaC marker does not appear until more than 2000 bytes into the sample file[1], and could presumably sit even further in depending on the contents of the ID3 tags. If you do have the full file contents available, though, something like if bytearray([0x66, 0x4C, 0x61, 0x43]) in file_byte_array: #return audio/x-flac appears to work.

[1] Quickly tested with:

file = '01 - Lookee Here-trimmed.flac'
with open(file, 'rb') as fp:
  byte_array = bytearray(fp.read())
  print(byte_array.find(bytearray([0x66, 0x4C, 0x61, 0x43])))

which prints 2048.
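For anyone without the sample file, the layout described above can be imitated with a synthetic buffer. This is only a sketch: the `synchsafe` helper, the ID3v2.3 version bytes, and the zero-filled tag body are assumptions chosen so that the marker lands at the same offset, not a copy of the real file.

```python
def synchsafe(n: int) -> bytes:
    """Encode n as a 4-byte synchsafe integer (7 bits per byte), as ID3v2 does."""
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))

# Hypothetical stand-in for the sample file: a 10-byte ID3v2.3 header,
# 2038 bytes of zero padding as the tag body, then the FLAC stream marker.
body_size = 2038
header = b"ID3" + bytes([3, 0, 0]) + synchsafe(body_size)
sample = header + bytes(body_size) + b"fLaC"

# A matcher that only inspects the first bytes sees an ID3 tag...
print(sample.startswith(b"ID3"))  # True
# ...while the FLAC marker sits right after the tag.
print(sample.find(b"fLaC"))       # 2048
```

A check that only looks at the start of the buffer therefore cannot tell this apart from an MP3 file with an ID3 tag.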

CatKasha commented 1 year ago

It's easy to read the size of the ID3 tag, but an ID3 tag can be up to 256 MB and filetype.py (currently v1.2.0) only reads the first 8 KB. I don't know whether filetype.py can read more or seek through the file.

The only solution I can come up with is to move the FLAC matcher above MP3 and check whether the ID3 tag is smaller than the buffer that was read. If it is, check for the FLAC header right after the ID3 tag; if not, return None.

The same check should be applied to MP3 too, to be honest.

Here is code showing how to skip the ID3 tag:

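# buf: the first bytes of the file (bytes or bytearray), starting with b"ID3"
id3_flags = buf[5]
footer_present = bool(id3_flags & 0x10)  # bit 4 of the flags byte: footer present
# The size field is a synchsafe integer: 4 bytes, 7 significant bits each.
id3_tag_size = 0
for i in buf[6:10]:
    id3_tag_size = (id3_tag_size << 7) | (i & 0x7F)
id3_tag_size += 10  # the size field does not include the 10-byte header
if footer_present:
    id3_tag_size += 10  # optional 10-byte footer

# file: the open binary file object that buf was read from
file.seek(id3_tag_size, 0)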
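Putting the pieces of this thread together, a buffer-only version of the check might look like the sketch below. It is not filetype.py's actual matcher code: the names `id3v2_tag_length` and `guess_audio_mime` are hypothetical, and the MP3 branch assumes that an MPEG audio frame begins with an 11-bit sync word (0xFF followed by a byte whose top three bits are set).

```python
from typing import Optional

FLAC_MAGIC = b"fLaC"

def id3v2_tag_length(buf: bytes) -> int:
    """Return the total ID3v2 tag length in bytes, or 0 if buf has no valid tag."""
    if len(buf) < 10 or buf[:3] != b"ID3":
        return 0
    size = 0
    for byte in buf[6:10]:
        if byte & 0x80:  # synchsafe bytes never have the high bit set
            return 0
        size = (size << 7) | byte
    length = 10 + size  # the size field excludes the 10-byte header
    if buf[5] & 0x10:  # footer-present flag adds a 10-byte footer
        length += 10
    return length

def guess_audio_mime(buf: bytes) -> Optional[str]:
    """Classify buf as FLAC or MP3, looking past a leading ID3v2 tag if present."""
    tag_len = id3v2_tag_length(buf)
    if tag_len == 0:
        return "audio/x-flac" if buf.startswith(FLAC_MAGIC) else None
    if tag_len + len(FLAC_MAGIC) > len(buf):
        return None  # the tag runs past the buffer, so the type is unknown
    if buf[tag_len:tag_len + len(FLAC_MAGIC)] == FLAC_MAGIC:
        return "audio/x-flac"
    # Assumption: treat an MPEG frame sync right after the tag as MP3.
    if buf[tag_len] == 0xFF and (buf[tag_len + 1] & 0xE0) == 0xE0:
        return "audio/mpeg"
    return None
```

Returning None when the tag is larger than the buffer follows the suggestion above: with only the first 8 KB available, a long tag leaves no way to tell the two formats apart without reading or seeking further into the file.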