thoughtbot / cocaine

A small library for doing (command) lines.
https://robots.thoughtbot.com
Other
785 stars 55 forks source link

Can't pass UTF-8 file names from paperclip #34

Closed kenn closed 11 years ago

kenn commented 11 years ago

Because the base string literals are US-ASCII and trying to interpolate UTF-8 strings into it?

Let me know if you need more information.

ArgumentError: invalid byte sequence in US-ASCII
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:121:in `gsub'
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:121:in `block in interpolate'
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:120:in `each'
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:120:in `inject'
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:120:in `interpolate'
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:66:in `command'
[GEM_ROOT]/gems/cocaine-0.4.2/lib/cocaine/command_line.rb:74:in `run'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/helpers.rb:31:in `run'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/processor.rb:39:in `convert'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/thumbnail.rb:77:in `make'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/processor.rb:33:in `make'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:410:in `block in post_process_style'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:409:in `each'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:409:in `inject'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:409:in `post_process_style'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:402:in `block in post_process_styles'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:401:in `each'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:401:in `post_process_styles'
[GEM_ROOT]/gems/paperclip-3.3.0/lib/paperclip/attachment.rb:394:in `block (2 levels) in post_process'
jyurek commented 11 years ago

Actually, yeah, I do have questions. What version of Ruby is this, and do you see this in your app or in the test suite for paperclip?

kenn commented 11 years ago

The error was reported via Airbrake on production. And there, we use ruby-1.9.3-p0 [i686] on Debian 6 32-bit.

jyurek commented 11 years ago

Hmm, ok. Can you paste your paperclip attachment definition and the name of the file you're trying to upload?

kenn commented 11 years ago

I can only see "#<ActionDispatch::Http::UploadedFile:0xc562d70>" on Airbrake and can't check what was the file name for that. The paperclip definition is pretty straightforward -

has_attached_file :icon, { :styles => { :s36 => "36x36>", :s72 => "72x72>", :s144 => "144x144>" }, :default_style => :s72 }.merge(IMAGE_OPTIONS)

where IMAGE_OPTIONS is mostly storage options.

BTW the browser was Chrome on Windows 7 - my wild guess is that this guy used Japanese version (or other localized ver) of Windows, where the file name was uploaded as Shift-JIS and you get mojibake on the server side?

At this point I'm not sure if cocaine is the right place to report the problem - feel free to close. :)

jyurek commented 11 years ago

I asked for the definition to see what your path and url looked like. I should have been more specific, sorry. I assume they're just listed as strings without any specific encoding here. In that case I'll have to try to come up with a failing testcase so it can be fixed.

kenn commented 11 years ago

For production, path and url are as follows - :base_class is a custom interpolation.

:url => ":s3_domain_url",
:path => ":rails_root/public/system/:attachment/:base_class/:id/:style.:extension"

So yes, it's a string literal which encoding should inherit the source encoding, which is by default US-ASCII.

On a side note, the good news is that starting Ruby 2.0 the default source encoding will be UTF-8.

http://bugs.ruby-lang.org/issues/6679

robworley commented 11 years ago

Just encountered this issue in production with a file named "Façade.jpg".

cfabianski commented 11 years ago

Will you accept a PR with only an addition of the #coding: UTF8 comment ?

jyurek commented 11 years ago

@kenn, can you try applying @cfabianski's patch and reporting back if it fixes your problem?

cfabianski commented 11 years ago

I'm trying to fix the build with jruby 1.7.0 as travis is not happy with this change

jyurek commented 11 years ago

And I ask this because, sadly, I'm unable to reproduce this using the https://github.com/thoughtbot/paperclip_demo repo. I uploaded a "Façade.jpg" without issue.

cfabianski commented 11 years ago

@jyurek what's your ENV ? (LC_LANG and so on) When you get into those troubles, it becomes hard

cfabianski commented 11 years ago

BTW, I need more time, master isn't compatible with jRuby 1.7. It has been tested only with jRuby 1.6.8

jyurek commented 11 years ago

Ah, I have "LANG=en_US.UTF-8" in there. Maybe that has something to do with it. Thanks, I'll check this out later.

cfabianski commented 11 years ago

On my side I have LANG=fr_FR.UTF-8

cfabianski commented 11 years ago

@jyurek if you have an idea about why the build os broken on jRuby 1.7.0, please let me know :-)

jyurek commented 11 years ago

I'll check it out tonight and let you know what I find.

kenn commented 11 years ago

I can't reproduce the problem on my Mac, probably because Macs hold filenames as UTF-8-MAC and browsers send them as UTF-8.

My guess is that the client that caused the error was a localized Windows client with ISO-8859-1 (or whatever) encoding, which sends filenames in their system encoding.

If so, I don't think magic comments in the source files will fix the problem.

ActionDispatch::Http::UploadedFile has this method for original_filename:

def encode_filename(filename)
  # Encode the filename in the utf8 encoding, unless it is nil or we're in 1.8
  if "ruby".encoding_aware? && filename
    filename.force_encoding("UTF-8").encode!
  else
    filename
  end
end

If my reasoning is correct, force_encoding in this method is the root cause of the problem.

cfabianski commented 11 years ago

@kenn I'm able to reproduce the issue with the current master of cocaine. ArgumentError: invalid byte sequence in US-ASCII If I change for my fork and the dedicated branch it does work

I just uploaded a filename with accented characters like this Image_à-çédric.jpg

Can you try again?

[EDIT] ruby 1.9.3-p0 with rvm on Mac OS X 10.7.5

cfabianski commented 11 years ago

BTW, the magic comment should be added by default because UTF-8 is more a standard than US-ASCII don't you think?

cfabianski commented 11 years ago

I have a server running with my own fork and I can confirm that my fix fixed all bugs pointing out this issue. Unfortunately, as usual with encoding issue, it's pretty hard to reproduce this in a test because you have to put the magic comment on top of your spec file to accent accented characters :-/

jyurek commented 11 years ago

Ok I've pulled in @cfabianski's PR. @robworley, can you try this out and let me know if you have further problems?

robworley commented 11 years ago

@cfabianski's patch is working for me. Thanks all.

jyurek commented 11 years ago

Great, thanks. I'll close this for now. Please let us know if this pops up again.

ujh commented 11 years ago

Would it be possible to get a new release of cocaine? I'm facing this bug and I'd rather use an official release than a git revision.

cfabianski commented 11 years ago

@ujh for now, you can do this gem "cocaine", '0.4.2', :git => "git://github.com/cfabianski/cocaine.git", :branch => '34-utf-8-encoding'

ujh commented 11 years ago

@cfabianski Thanks. I'm doing something along these lines for now.