loris-imageserver / loris

Loris IIIF Image Server

Image format identification #95

Open jpstroop opened 10 years ago

jpstroop commented 10 years ago

See this comment. It's true, and this is a fairly typical convention for image processing libraries (as I wrote here when IIIF dropped support for content negotiation, a.k.a. conneg), and I don't feel that bad about it.

However, with the current resolvers, if your local file is named with just some opaque number or string and no extension, you're hosed. One option would be to use something like ahupp/python-magic (basically a wrapper for the Unix file utility). So if a JP2 landed on the file system and it was named "761390", you could do the Python equivalent of:

$ file --mime-type 761390
761390: image/jp2

with

>>> import magic
>>> magic.from_file("local_cache/761390", mime=True)
'image/jp2'

The MIME type could then be mapped to a format, e.g. jp2 or whatever is appropriate.
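A minimal sketch of that mapping step, assuming the MIME string is whatever `magic.from_file(..., mime=True)` returned. The `MIME_TO_FORMAT` table and `format_from_mime` are hypothetical names for illustration, and the format strings are guesses, not necessarily what Loris uses internally:

```python
# Hypothetical lookup table: MIME type -> short format name.
# The format strings are illustrative assumptions, not Loris's own.
MIME_TO_FORMAT = {
    "image/jp2": "jp2",
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/tiff": "tif",
    "image/gif": "gif",
}


def format_from_mime(mime_type):
    """Return the short format name for a MIME type, or raise ValueError."""
    try:
        return MIME_TO_FORMAT[mime_type]
    except KeyError:
        raise ValueError("unsupported image MIME type: %r" % (mime_type,))
```

Raising on an unknown type (rather than guessing) keeps a mislabeled or non-image file from being handed to the wrong decoder.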

I don't think I'd want to run that for every request (though maybe it would be OK), so at least an in-memory hash (OrderedDict?) mapping IDs to formats would probably be a good idea.
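Something like the following could serve as that in-memory map. It is only a sketch: `FormatCache` is a hypothetical name, `detect` stands in for whatever identification call ends up being used, and the size cap with least-recently-used eviction is an assumption added here so the dict cannot grow without bound:

```python
from collections import OrderedDict


class FormatCache(object):
    """Remember the detected format per identifier, evicting the least recently used."""

    def __init__(self, detect, max_size=1024):
        self._detect = detect          # callable: identifier -> format string
        self._max_size = max_size      # assumed cap; tune to taste
        self._formats = OrderedDict()  # oldest entry first

    def get(self, ident):
        if ident in self._formats:
            # Cache hit: mark as most recently used.
            self._formats.move_to_end(ident)
            return self._formats[ident]
        # Cache miss: run the (expensive) detection once and remember it.
        fmt = self._detect(ident)
        self._formats[ident] = fmt
        if len(self._formats) > self._max_size:
            self._formats.popitem(last=False)  # drop the oldest entry
        return fmt
```

With this in place the detection only runs on the first request for an identifier (or again after it has been evicted), not on every request.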

Not saying I'll add this functionality any time soon, but it's an option, and needed to be logged.

jpstroop commented 10 years ago

(In reference to https://github.com/pulibrary/loris/issues/98#issuecomment-49982002)

@eocarragain Thanks! Ruven and I had a brief discussion about this last winter--nice to see he has a solution. His approach is probably a lot more efficient and makes a lot of sense: using a big library like I proposed in #95 is probably overkill when we could just as easily sniff at the first few bytes of the file--we only have a handful of signatures to worry about, not thousands!
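For the record, sniffing the first few bytes might look roughly like this. It is a sketch, not Ruven's implementation: the function names and the returned format strings are made up for illustration, and the signature list covers only a handful of common formats:

```python
# Leading-byte signatures ("magic numbers") for a handful of image formats.
SIGNATURES = [
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jp2"),  # JPEG 2000 signature box
    (b"\xff\xd8\xff", "jpg"),                    # JPEG start-of-image marker
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tif"),                         # TIFF, little-endian
    (b"MM\x00*", "tif"),                         # TIFF, big-endian
]

# We never need to read more than the longest signature.
MAX_SIGNATURE_LENGTH = max(len(sig) for sig, _ in SIGNATURES)


def sniff_bytes(head):
    """Return a format name for the given leading bytes, or None if unknown."""
    for signature, fmt in SIGNATURES:
        if head.startswith(signature):
            return fmt
    return None


def sniff_format(path):
    """Read only the first few bytes of the file at `path` and identify it."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(MAX_SIGNATURE_LENGTH))
```

So `sniff_format("local_cache/761390")` would read 12 bytes and return a format name without needing an extension or an external library.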

Another approach, if we did want the nice tidy API that ahupp/python-magic provides, would be to supply a shorter magic file, which is an option:

https://github.com/ahupp/python-magic/blob/master/magic.py#L37-L53

and also limit how much we read from the file:

https://github.com/ahupp/python-magic/blob/master/magic.py#L13-L15

Just to log it, there's some info about how to do that here: http://stackoverflow.com/a/7236262/714478
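A rough sketch of the "limit how much we read" part, independent of which detector is used. `detect_from_head` and the 1024-byte default are assumptions for illustration; `detect` is any callable that takes a bytes buffer and returns a MIME type:

```python
def detect_from_head(path, detect, limit=1024):
    """Pass only the first `limit` bytes of the file at `path` to `detect`.

    `detect` is a caller-supplied callable (bytes -> MIME type string);
    the default limit is an arbitrary assumption, not a measured value.
    """
    with open(path, "rb") as f:
        head = f.read(limit)
    return detect(head)
```

With python-magic installed, `detect` could be something like `lambda buf: magic.from_buffer(buf, mime=True)`, so the library only ever sees the head of the file.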

This is starting to feel more compelling....