hoover / snoop

Use magic to find types of documents without extensions #35

Closed gabriel-v closed 7 years ago

gabriel-v commented 7 years ago

libmagic, that is.

Might be worth doing: match MAGIC_DESCRIPTION_TYPES with regexes, in case the descriptions libmagic returns change (for example, because it finds more metadata).
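A minimal sketch of the regex idea, assuming the descriptions are matched by prefix. The name `MAGIC_DESCRIPTION_PATTERNS`, the helper `type_from_description`, the type labels, and the sample description strings are all hypothetical; the real MAGIC_DESCRIPTION_TYPES in snoop may be structured differently.

```python
import re

# Hypothetical pattern table: each entry pairs a compiled regex with a
# document type label. Anchoring on the start of the description means
# extra trailing metadata does not break the match.
MAGIC_DESCRIPTION_PATTERNS = [
    (re.compile(r"^PDF document"), "pdf"),
    (re.compile(r"^Microsoft Word"), "doc"),
    (re.compile(r"^(RFC 822 mail|SMTP mail)"), "email"),
]


def type_from_description(description):
    """Return the first type whose pattern matches, or None."""
    for pattern, doc_type in MAGIC_DESCRIPTION_PATTERNS:
        if pattern.search(description):
            return doc_type
    return None
```

With an exact-string lookup, a description that gains a suffix such as a page count would stop matching; with the prefix patterns above, both the short and the longer form map to the same type.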

mgax commented 7 years ago

Since we're going to the trouble of calling libmagic, can we save the content type for the document? It would be good to have Document.content_type as the single source of truth.
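One way this could look, as a sketch only: the `Document` dataclass below is a stand-in for snoop's actual model, and `libmagic_mime` and `store_content_type` are hypothetical helpers. It assumes the python-magic bindings, whose `magic.from_buffer(data, mime=True)` returns a MIME type string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    # Stand-in for the real model; only the fields needed here.
    path: str
    content_type: Optional[str] = None


def libmagic_mime(data: bytes) -> str:
    # Imported lazily so the rest of the sketch works without python-magic.
    import magic
    return magic.from_buffer(data, mime=True)


def store_content_type(
    doc: Document,
    data: bytes,
    detect: Callable[[bytes], str] = libmagic_mime,
) -> str:
    """Detect the content type once and keep it on the document."""
    if doc.content_type is None:
        doc.content_type = detect(data)
    return doc.content_type
```

Later code would then read `doc.content_type` instead of calling libmagic again, which is what makes the field the single source of truth.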