markevans / dragonfly

A Ruby gem for on-the-fly processing - suitable for image uploading in Rails, Sinatra and much more!
http://markevans.github.io/dragonfly
MIT License
2.12k stars 244 forks source link

Incorrect Mime type for remote url without file extension #386

Open speedmax opened 9 years ago

speedmax commented 9 years ago

file = Dragonfly.app.fetch_url("https://ununsplash.imgix.net/photo-1415302199888-384f752645d0?dpr=0.25&fit=crop&fm=jpg&h=700&q=75&w=1050")

File mime type returns octet stream instead of images/jpeg.

> file.mime_type
 => "application/octet-stream"

While there is correct content type set in HTTP response header

curl -i -XHEAD "https://unsplash.imgix.net/photo-1414005987108-a6d06de8769f?dpr=0.25&fit=crop&fm=jpg&q=75&w=1050"

HTTP/1.1 200 OK
Date: Mon, 09 Feb 2015 02:35:19 GMT
Server: imgix-fe
Last-Modified: Wed, 22 Oct 2014 19:26:28 GMT
X-Imgix-FramesIn: 1
Cache-Control: max-age=315360000
Content-Length: 18147
Accept-Ranges: bytes
Age: 1214
Content-Type: image/jpeg
X-Imgix-Request-ID: 1d6d4a7338bd34a04fa87167a3be751f98a421ca
lksv commented 8 years ago

I ran into the same problem.

Unfortunately it seems to me, that Dragonfly do not contain mime-type at all, it returns mime type according file extension (see https://github.com/markevans/dragonfly/blob/master/lib/dragonfly/content.rb#L81-L82 and https://github.com/markevans/dragonfly/blob/master/lib/dragonfly/app.rb#L209-L211)

I guess that the simplest way is to set extension according to mimetype in FetchUrl#apply. Unfortunately should not be always desired to change filename (file extension).

Second solution could be to store mime type in meta data, unfortunately it is bigger change...

any suggestions?

ShogunPanda commented 8 years ago

There is also a third way, which worked for me using small monkey patching. Basically when the extension is nil, I re-run the Analyzer and return the format as the file extension.

Here's the patch. @markevans Do you think you can add it to the real code?

module Dragonfly
  module HasFilename
    def ext
      if name
        File.extname(name)[/\.(.*)/, 1] if name
      else
        ImageMagick::Analysers::ImageProperties.new.call(self)["format"].gsub("jpeg", "jpg")
      end
    end
  end
end

Hope this helps! Paolo

markevans commented 8 years ago

hi - actually it used to work a bit like that, using the file command from unix (though not in the "ext" method - that specifically returns the file extension)

I took it out because it actually complicated things somewhat - using the file extension may not be very sophisticated, but it does make things simple.

Unfortunately should not be always desired to change filename (file extension).

Actually why not? It would ensure consistency. Can you think of a case when it would not be desired? Normally with fetch_url there's no "file" so to speak of anyway

ShogunPanda commented 8 years ago

Yup, but the problem is that it's not really reliable. Especially if your storage doesn't support extensions or strip them out by any chance.

markevans commented 8 years ago

the storage always supports extensions, as the ext comes from the metadata (meta["name"]) which has nothing to do with the uid (the file/s3 datastore just use the filename in the uid for convenience)

thinking about it, it would be easy for Content#mime_type https://github.com/markevans/dragonfly/blob/master/lib/dragonfly/content.rb#L81-L82 first to check for meta["mime_type"] then fall back to the file extension if nil, as @lksv mentioned (this is pretty much what it used to do actually - I can't remember exactly why but I removed it for simplicity)

I'd rather not be doing any expensive operations to find it out though, as used to happen - it was more trouble than it was worth, and was slow

I think the bottom line should be that anywhere the mime-type information is already known (e.g. from headers in FetchUrl, or headers when getting from S3), then that information should not be lost (which as mentioned above it sometimes currently is), and therefore stored in meta["mime_type"]. The ext shouldn't be changed without the user knowing, and therefore for example if the ext/mime_type from fetching from another url are already inconsistent, then that's not Dragonfly's problem (and probably won't matter anyway)

In other words, I think I favour the approach @lksv mentioned:

benpickles commented 8 years ago

Ideally the file type should be determined once on write rather than on every read. I store it on an *_ext database column (which used to be "magic" - https://github.com/markevans/dragonfly/issues/189).

ShogunPanda commented 8 years ago

@markevans I see. So we need to manually store the mime upon creation or are you going to modify the code?

lksv commented 8 years ago

@markevans Thanks to deeply explanation and sorry for so late answer.

Unfortunately should not be always desired to change filename (file extension).

Actually why not? It would ensure consistency. Can you think of a case when it would not be desired? Normally with fetch_url there's no "file" so to speak of anyway

Currently (very simplified) I am building something like cache storage of particular documents on web. So I need the name (and whole other metadata) would not be the changed.

thinking about it, it would be easy for Content#mime_type https://github.com/markevans/dragonfly/blob/master/lib/dragonfly/content.rb#L81-L82 first to check for meta["mime_type"] then fall back to the file extension if nil, as @lksv mentioned (this is pretty much what it used to do actually - I can't remember exactly why but I removed it for simplicity)

As I understand it correctly. It is very simple to fix it (at least of setting content by url e.g. mymodel.image_url = 'http://....'), The only few lines are needed to chage:

  1. https://github.com/markevans/dragonfly/blob/master/lib/dragonfly/job/fetch_url.rb#L78 to job.content.update(data, 'name' => "file.#{ext}", 'mime_type' => mime_type)
  2. https://github.com/markevans/dragonfly/blob/master/lib/dragonfly/job/fetch_url.rb#L47 to: a. get_following_redirects has to return mime_type as well b. update the mime_type on the line above.

Unfortunately I still do not understand the whole workflow of dragonfly. Could you please @markevans confirm my theory of "fixing" it. At least I need it fix it in my fork. ...I will definitely put pull request but I do not understand if you regarded it important ...as you said, you already removed some of this change.

markevans commented 8 years ago

@ShogunPanda yeh I'll change the code unless someone already sends a pull request @lksv that looks about right to me

johnnagro commented 7 years ago

any updates?

a little hacky, but I'm working-around this issue (using sinatra) like this:

    img = Dragonfly.app.fetch_url(img_url)
    thumb = img.thumb(size)
    resp = thumb.to_response(env)
    # https://github.com/markevans/dragonfly/issues/386
    if resp[1]["Content-Type"] == "application/octet-stream"
      resp[1]["Content-Type"] = "image/#{thumb.format}"
    end
    resp
markevans commented 7 years ago

apologies for the delay on this - I'll try to look at this (including @lksv 's pull request) this week

mid9commander commented 7 years ago

hi Mark, is there any chance you can look at this?

markevans commented 7 years ago

i absolutely will but i'm away right now. but it is on my to-do list. thanks for your patience

xnagpa commented 6 years ago

Aaaaaaand I'm here bumping into the same issue

markevans commented 6 years ago

actually this issue should have been closed as it was fixed here: https://github.com/markevans/dragonfly/pull/438 and https://github.com/markevans/dragonfly/commit/e5ca848aef74c5106d328ff06ee51dcafb8825d2 What are you bumping into exactly? And what version are you using?