h2non / filetype.py

Small, dependency-free, fast Python package to infer binary file types checking the magic numbers signature
https://h2non.github.io/filetype.py
MIT License

File does not evaluate mp4 properly. #47

Open cbitterfield opened 4 years ago

cbitterfield commented 4 years ago

filetype returns None for a file with this signature:

>>> import filetype
>>> kind = filetype.utils.get_signature_bytes('HDVWM419.mp4')
>>> print(kind)
bytearray(b"\x00\x00\x00 ftypisom\x00\x00\x02\x00isomiso2avc1mp41\x00\x00\x00\x08free\x8b_\x04\xf6mdat\x00\x01\'xe\xb8\x04_\xdb\xb3`+@\x85a)>\x10\x88\xebv{\x1ec(\x96(\xc9W^\x8a\\\xa7\xaaB$\xfaz\xb1\x8c&\xca\t\xa9\x04\x95\xb7\x87P\xf2\xaew~,\x8f\xa2\xda>\xe6\xe4\n$&\x9dm\x19\xeb4\xbd0\x00\xc6\x91\xf0\xb0\x85\x0f\xab<\x04\xf5\xe00\xaa\rm\xdc\xa6<a\x08\xcf\x8c\\\x0f\x18)\xdd\xc7\x8e\n\xd6\xd7\xd7\x05\x0fdPj\x15\x1f\xc5H\xd4\x98\x0cx\xce\xb9\xa7\xa8\t\xea\x8d\xe1\xb7\xe2F\x8fQoD\xadKT{\xc9D\xcapZ\xb8\xa2\xeez\xbd\xab\x9e7\x9a\xf7G\xbe/\xbdQ>P\xf6\xa3f\xdc\x17\xfb\xcb\x9c\x9a\x14\x06\xd4J\xb2\xe2\x15\x05\xda\xc5oL\x0b\xbd!\xb7>-\xe2\xb6\xda\x8bi\xab\x8c\xe3\xc1\xa7\x82c\x83\x93\x17$\xd9\xa8zM\xe4@Q\xab\\\xc5\xb4<\x04")

file HDVWM419.mp4
HDVWM419.mp4: ISO Media, MP4 Base Media v1 [IS0 14496-12:2003]

    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, GBR), 1920x1080 [SAR 1:1 DAR 16:9], 12168 kb/s, 30 fps, 30 tbr, 30 tbn, 60

I guess the issue is using the correct magic file or metadata evaluation. I looked at your source code, but I'm not sure how to get it to see this as an MP4.

Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 2016-09-19T13:43:30.000000Z
    encoder         : Lavf51.12.1

Videos with the following metadata do match:
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: mp42mp41

dosas commented 3 years ago

@cbitterfield I think this one is resolved?

Using your bytestring from above:

buf = bytearray(b"\x00\x00\x00 ftypisom\x00\x00\x02\x00isomiso2avc1mp41\x00\x00\x00\x08free\x8b_\x04\xf6mdat\x00\x01\'xe\xb8\x04_\xdb\xb3`+@\x85a)>\x10\x88\xebv{\x1ec(\x96(\xc9W^\x8a\\\xa7\xaaB$\xfaz\xb1\x8c&\xca\t\xa9\x04\x95\xb7\x87P\xf2\xaew~,\x8f\xa2\xda>\xe6\xe4\n$&\x9dm\x19\xeb4\xbd0\x00\xc6\x91\xf0\xb0\x85\x0f\xab<\x04\xf5\xe00\xaa\rm\xdc\xa6<a\x08\xcf\x8c\\\x0f\x18)\xdd\xc7\x8e\n\xd6\xd7\xd7\x05\x0fdPj\x15\x1f\xc5H\xd4\x98\x0cx\xce\xb9\xa7\xa8\t\xea\x8d\xe1\xb7\xe2F\x8fQoD\xadKT{\xc9D\xcapZ\xb8\xa2\xeez\xbd\xab\x9e7\x9a\xf7G\xbe/\xbdQ>P\xf6\xa3f\xdc\x17\xfb\xcb\x9c\x9a\x14\x06\xd4J\xb2\xe2\x15\x05\xda\xc5oL\x0b\xbd!\xb7>-\xe2\xb6\xda\x8bi\xab\x8c\xe3\xc1\xa7\x82c\x83\x93\x17$\xd9\xa8zM\xe4@Q\xab\\\xc5\xb4<\x04")

from filetype import guess
guess(buf).mime
'video/mp4'
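
For anyone comparing the two files from the original report, it can help to see which brands a file actually declares. Below is a minimal sketch that reads the leading `ftyp` box by hand, using the ISO base media layout (4-byte big-endian size, `ftyp`, major brand, minor version, then 4-byte compatible brands). `parse_ftyp` is a hypothetical helper written for this thread, not part of filetype, and it is not meant to describe how filetype itself does its matching.

```python
import struct


def parse_ftyp(buf):
    """Return (major_brand, minor_version, compatible_brands), or None.

    Hypothetical helper: assumes the ftyp box is the first box in the buffer
    and uses an ordinary 32-bit size (the special sizes 0 and 1 are rejected).
    """
    if len(buf) < 16 or bytes(buf[4:8]) != b"ftyp":
        return None
    (size,) = struct.unpack(">I", bytes(buf[0:4]))
    if size < 16:
        return None
    # Do not read past the buffer if only the first few bytes were loaded.
    end = min(size, len(buf))
    major = bytes(buf[8:12]).decode("ascii", "replace")
    (minor,) = struct.unpack(">I", bytes(buf[12:16]))
    compatible = [
        bytes(buf[i:i + 4]).decode("ascii", "replace")
        for i in range(16, end - 3, 4)
    ]
    return major, minor, compatible


# First bytes of the signature posted in the original report.
header = b"\x00\x00\x00 ftypisom\x00\x00\x02\x00isomiso2avc1mp41\x00\x00\x00\x08free"
print(parse_ftyp(header))
# ('isom', 512, ['isom', 'iso2', 'avc1', 'mp41'])
```

The result lines up with the ffprobe metadata above (major_brand isom, minor_version 512), so the file in the report differs from the matching ones only in the brand it declares, mp42 versus isom.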