larsch / ocra

One-Click Ruby Application Builder
http://ocra.rubyforge.org/

Load file with special characters in path #157

Open imbrish opened 4 years ago

imbrish commented 4 years ago

Versions:

The default Windows temp directory is C:\Users\<user>\AppData\Local\Temp.

It is a known issue that Ruby does not deal well with environment variables containing Unicode characters, see https://bugs.ruby-lang.org/issues/12650.

This has caused me a lot of headaches when using Tempfile. When the user name contains Unicode characters, the temp path gets mangled and reading temp files fails.

As ocra relies on loading files extracted to the temp directory, compiled executables crash under these circumstances.

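In my own scripts I deal with this by redefining the temp path at runtime:

require 'tempfile'

# Dir.tmpdir comes from the 'tmpdir' library, which 'tempfile' loads.
# If the temp path contains non-ASCII characters, point it at an
# ASCII-only directory instead (C:/Temp must already exist).
if Dir.tmpdir =~ /[^\x00-\x7F]/
  class Dir
    def self.tmpdir
      "C:/Temp"
    end
  end
end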
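A slightly more defensive variant of the same idea could look like the sketch below. It is not ocra code and rests on assumptions: it checks the raw bytes via String#ascii_only? on a binary copy (so a path whose bytes do not match its encoding tag cannot raise during the check), and it creates the fallback directory, since C:/Temp may not exist. The FALLBACK_TMPDIR constant and the non_ascii_path? helper are hypothetical names.

```ruby
require "tmpdir"    # provides Dir.tmpdir
require "fileutils" # provides FileUtils.mkdir_p

# Hypothetical ASCII-only fallback location; adjust as needed.
FALLBACK_TMPDIR = "C:/Temp"

# True if +path+ contains any byte outside the ASCII range.
# Checking a binary (.b) copy avoids encoding errors on strings whose
# bytes do not match the encoding they are tagged with.
def non_ascii_path?(path)
  !path.b.ascii_only?
end

if non_ascii_path?(Dir.tmpdir)
  # Create the fallback first so Tempfile can actually write there.
  FileUtils.mkdir_p(FALLBACK_TMPDIR)

  # Redefine Dir.tmpdir for the rest of the process, as in the snippet above.
  def Dir.tmpdir
    FALLBACK_TMPDIR
  end
end
```

Creating the directory up front trades a small side effect for not failing later inside Tempfile when the fallback is missing.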

For now, a workaround that makes ocra executables work is to invoke them with:

env TMP=C:/Temp ./script.exe
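The same override can also be applied from Ruby, for example in a test harness, by passing an environment hash to Kernel#system. This is a sketch: run_with_tmp is a hypothetical helper, C:/Temp is again an assumed location, and setting TEMP as well as TMP is a cautious choice rather than something established in this thread.

```ruby
# Hypothetical helper: run +cmd+ (an argv array) with TMP and TEMP pointing
# at +tmpdir+, like `env TMP=C:/Temp ./script.exe`.
# Returns true, false or nil exactly as Kernel#system does.
def run_with_tmp(cmd, tmpdir = "C:/Temp")
  system({ "TMP" => tmpdir, "TEMP" => tmpdir }, *cmd)
end

# Usage on Windows (script.exe being the ocra-built executable):
#   run_with_tmp(["./script.exe"])
```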

However, things get trickier when running the exe from Windows Explorer.

Would you be interested in fixing this issue?

larsch commented 4 years ago

I'm unable to reproduce this on Windows 10 1909 with Ruby 2.6.5. My Windows installation appears to set the TEMP environment variable using the 8.3 short name of the user directory (Børge becomes BRGE~1):

TEMP=C:\Users\BRGE~1\AppData\Local\Temp

This script works fine:

require "tempfile"
Tempfile.create("tempfiletest") do |tmpfile|
  p tmpfile.path
end

imbrish commented 4 years ago

My username is Paweł (the last character is U+0142).

Here is the output of p ENV["TEMP"]:

"C:\\Users\\Pawe\xC5\x82\\AppData\\Local\\Temp"

And here is the output of your script:

"C:/Users/Pawe\xC5\x82/AppData/Local/Temp/tempfiletest20200317-22132-lvvr3r"

It remains the same if I run the compiled executable.

However if I compile the following script:

require "pastel"

puts Pastel.new.red('Unicorns!')

Compilation works fine, but running the executable gives the following error:

Traceback (most recent call last):
        2: from C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/src/temp.rb:1:in `<main>'
        1: from C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require'
C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require': cannot load such file -- pastel (LoadError)
        3: from C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/src/temp.rb:1:in `<main>'
        2: from C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:156:in `require'
        1: from C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in `rescue in require'
C:/Users/Paweł/AppData/Local/Temp/ocr890D.tmp/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in `require': No such file or directory -- C:/Users/PaweĹ‚/AppData/Local/Temp/ocr890D.tmp/lib/ruby/gems/2.7.0/gems/pastel-0.7.3/lib/pastel.rb (LoadError)

Unless I run it with env TMP=C:/Temp ./temp.exe, in which case it works fine:

Unicorns!

So, to be precise, the problem occurs when the temp path contains a special character and a gem is required.
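The garbled PaweĹ‚ in the last LoadError above is consistent with the UTF-8 bytes of ł (C5 82) being decoded as Windows-1250, presumably the ANSI code page on this Polish system. The sketch below only reproduces that mis-decoding of the bytes; it makes no claim about where in ocra it happens.

```ruby
# In UTF-8, "ł" (U+0142) is the two bytes C5 82.
user_name = "Paweł"
p user_name.bytes.last(2).map { |b| format("%02X", b) } # => ["C5", "82"]

# Relabel the very same bytes as Windows-1250 and convert them to UTF-8
# for display: C5 becomes "Ĺ" and 82 becomes "‚".
garbled = user_name.dup.force_encoding("Windows-1250").encode("UTF-8")
puts garbled # prints PaweĹ‚
```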

I haven't dug into ocra to find out why this happens, but I can offer some more details about the Dir.tmpdir fix. Nowadays, using tempfiles within Ruby is mostly fine, provided the default internal and external encodings are set to UTF-8. The problems occur when working with ENV variables, in system calls to shell commands with Unicode arguments, and probably in other places too, as the example above shows.

I spent long hours trying to find an elegant solution (or, counting all the earlier encoding headaches, probably days) and only got frustrated. If you nevertheless want to dig into it yourself, please let me know; maybe I can assist you a bit :)
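One way to cope with ENV values like the TEMP string above, whose bytes look like UTF-8 while Ruby apparently labels them with another encoding, is sketched below. utf8_path is a hypothetical helper, not part of Ruby or ocra, and its "valid UTF-8 wins" rule is a heuristic: bytes in another code page that happen to form valid UTF-8 would be misread.

```ruby
# Hypothetical helper: return +str+ as a UTF-8 string.
# If the raw bytes already form valid UTF-8 (as in the ENV["TEMP"] value
# above, which holds C5 82 for "ł"), just relabel them; otherwise transcode
# from whatever encoding the string is tagged with.
def utf8_path(str)
  return nil if str.nil?

  relabelled = str.dup.force_encoding(Encoding::UTF_8)
  return relabelled if relabelled.valid_encoding?

  # May raise Encoding::UndefinedConversionError, e.g. for binary strings.
  str.encode(Encoding::UTF_8)
end

# Usage: temp = utf8_path(ENV["TEMP"])
```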

imbrish commented 4 years ago

The simplest way to reproduce this issue is to create two files:

# specials.rb
require_relative "ø.rb"

# ø.rb
puts "ok"
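Before compiling, it may be worth checking that the two files run fine with plain Ruby, which would point at the packaged executable as the culprit. The sketch below writes both files into a fresh temporary directory and runs specials.rb in a child process with the current interpreter; run_specials_repro is a hypothetical helper name.

```ruby
require "tmpdir"   # Dir.mktmpdir
require "open3"    # Open3.capture2e
require "rbconfig" # RbConfig.ruby: path of the running interpreter

# Hypothetical helper: write the two repro files into a fresh temporary
# directory, run specials.rb with the current Ruby, and return its
# combined stdout and stderr.
def run_specials_repro
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "specials.rb"), %(require_relative "ø.rb"\n))
    File.write(File.join(dir, "ø.rb"), %(puts "ok"\n))

    output, _status = Open3.capture2e(RbConfig.ruby, File.join(dir, "specials.rb"))
    output
  end
end

puts run_specials_repro # prints ok when the files are run directly
```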

Then compile using:

ocra --debug --debug-extract specials.rb

Running the executable will then fail even if the TEMP path contains only ANSI characters:

# ./specials-debug.exe
Traceback (most recent call last):
        1: from C:/.../Special characters/ocr79F9.tmp/src/specials.rb:1:in `<main>'
C:/.../Special characters/ocr79F9.tmp/src/specials.rb:1:in `require_relative': cannot load such file -- C:/.../Special characters/ocr79F9.tmp/src/ø.rb (LoadError)

This happens even though the logs indicate that the correct file was created:

CreateFile(C:\...\Special characters\ocr18A7.tmp\src\ø.rb, 11)

Inspecting the unpacked directory shows that the name is in fact garbled:

C:\...\Special characters\ocr79F9.tmp\src\ø.rb

I can imagine that working with Unicode from C is not easy. I don't have experience with this myself, but I've found this article: https://docs.microsoft.com/en-us/cpp/text/unicode-programming-summary. The article is about C++, but some googling suggests it applies to C as well. Maybe it will be of some help.
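On Windows, the wide-character ("W") file APIs such as CreateFileW take UTF-16LE paths, while the narrow ("A") variants interpret bytes in the active ANSI code page; in C, a UTF-8 string would typically be converted with MultiByteToWideChar(CP_UTF8, ...). The Ruby sketch below only illustrates the difference in bytes for the file name ø.rb; it is not ocra's C code, and Windows-1252 is an assumed code page chosen for the example.

```ruby
file_name = "ø.rb" # UTF-8 source literal: ø is U+00F8, bytes C3 B8

# Bytes a wide-character API such as CreateFileW expects (UTF-16LE).
wide = file_name.encode("UTF-16LE")
p wide.bytes.map { |b| format("%02X", b) }
# => ["F8", "00", "2E", "00", "72", "00", "62", "00"]

# The same UTF-8 bytes read as Windows-1252, the kind of name a narrow
# ("A") API call could end up creating.
as_ansi = file_name.dup.force_encoding("Windows-1252").encode("UTF-8")
puts as_ansi # prints Ã¸.rb
```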