oneclick / rubyinstaller2

MSYS2 based RubyInstaller for Windows
https://rubyinstaller.org
BSD 3-Clause "New" or "Revised" License

Ruby require fails when the path has special characters #265

Open mohits opened 2 years ago

mohits commented 2 years ago

What problems are you experiencing?

If the path contains special characters and you run a Ruby script that calls require_relative on a file under that path, the file fails to load. It's almost certainly related to encoding on the Windows console.

It failed for me with:

This ticket is based on an issue on rails at https://github.com/rails/rails/issues/29087

Steps to reproduce

Create a folder called Test Ø and in it have 2 files:

1.rb

# encoding: UTF-8
require_relative "2.rb"

puts 'success'

2.rb

puts 'in the file'

You should see an error like this:

$ ruby 1.rb
1.rb:2:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Test ?/2.rb (LoadError)
        from 1.rb:2:in `<main>'
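The steps above can also be scripted. This is only a sketch: build_repro is a hypothetical helper name, and the script writes into a temporary directory unless a root folder is passed as the first argument. The folder name is written with a \u escape so the script itself stays ASCII-only.

```ruby
require "fileutils"
require "tmpdir"

# Create the "Test Ø" folder with 1.rb and 2.rb inside +root+ and
# return the path of the new folder.
def build_repro(root, folder_name = "Test \u00D8") # "Test Ø"
  dir = File.join(root, folder_name)
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "1.rb"), <<~SCRIPT)
    # encoding: UTF-8
    require_relative "2.rb"

    puts 'success'
  SCRIPT
  File.write(File.join(dir, "2.rb"), "puts 'in the file'\n")
  dir
end

root = ARGV[0] || Dir.mktmpdir
dir = build_repro(root)
puts "Created #{Dir.children(dir).sort.join(', ')} in #{dir}"
```

After running it, cd into the created folder and run ruby 1.rb under the console code page you want to test.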

What's the output from ridk version?


ruby:
  path: C:/Ruby30-x64
  version: 3.0.3
  platform: x64-mingw32
ruby_installer:
  package_version: 3.0.3-1
  git_commit: 981867a
msys2:
  path: C:\Ruby30-x64\msys64
cc: gcc (Rev2, Built by MSYS2 project) 11.2.0
sh: GNU bash, version 5.1.8(1)-release (x86_64-pc-msys)
os: Microsoft Windows [Version 10.0.19044.1586]

fxn commented 2 years ago

I have run this script in that directory:

p __dir__.encoding
p Dir.pwd.encoding
puts
p __ENCODING__
p ''.encoding
p Encoding.default_external
p Encoding.default_internal

and the output is

#<Encoding:IBM437>
#<Encoding:Windows-1252>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

This is Windows 10 running in Parallels Desktop.

I suspect that just shows my ignorance of how file encoding works in Windows, and of what a Ruby program can assume when reading file/directory names.
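For reference, Ruby exposes the encodings it has picked up from the environment under a few special names that Encoding.find accepts. A minimal sketch that collects them next to the two path values from the script above (encoding_report is a hypothetical helper name; which value each string ends up tagged with on Windows is exactly what is in question here):

```ruby
# Collect the encodings Ruby derived from the environment, plus the
# encodings attached to two path strings.
def encoding_report
  {
    "locale"     => Encoding.find("locale"),     # from the console/locale settings
    "external"   => Encoding.find("external"),   # same as Encoding.default_external
    "internal"   => Encoding.find("internal"),   # nil unless default_internal is set
    "filesystem" => Encoding.find("filesystem"), # what Ruby assumes for file names
    "__dir__"    => __dir__&.encoding,
    "Dir.pwd"    => Dir.pwd.encoding
  }
end

encoding_report.each do |name, enc|
  puts format("%-10s %s", name, enc.inspect)
end
```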

mohits commented 2 years ago

Thanks for posting this here @fxn - I opened the issue here so that we can get closer to finding the correct place to fix this :) since the issue clearly has to do with Ruby + Windows, and not Rails.

This is what I get with codepage 65001 (UTF-8)

#<Encoding:UTF-8>
#<Encoding:UTF-8>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

and with codepage 437

#<Encoding:IBM437>
#<Encoding:UTF-8>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

Can you please check your codepage by doing chcp on the command line?

fxn commented 2 years ago

@mohits it says 437.

fxn commented 2 years ago

If I execute chcp 65001, the output is:

#<Encoding:UTF-8>
#<Encoding:Windows-1252>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

Almost!

mohits commented 2 years ago

Hi @fxn - Yes, I think 437 (OEM - United States) is the most common on English Windows. I think we need someone with a better understanding of locales on Windows to look at this issue.

Unsurprisingly, my simple test works on JRuby: it successfully requires the file. Also, when run with JRuby, your code produces the same output as under chcp 65001, even on a console that is CP-437.

$ jruby xfn.rb
#<Encoding:UTF-8>
#<Encoding:Windows-1252>

#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil

fxn commented 2 years ago

Today, trying more carefully, I could not reproduce it.

The file system on my machine uses Windows-1252. I created a directory called à using the file explorer to make sure the encoding is honored. Inside that directory I created this test file and a dummy bar.rb:

puts Encoding.find('filesystem')
p Dir.pwd.bytes[-1]
require_relative "bar"

This works, and the output is

Windows-1252
224

If you check the codes in Windows-1252, you'll see 224 is, indeed, à.
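That byte value can be checked from Ruby itself. A minimal sketch (decode_byte and valid_utf8? are hypothetical helper names); it also shows that the same byte on its own is not valid UTF-8, which is one way to tell the two encodings apart in a path:

```ruby
# Decode one byte under a single-byte code page and return the character as UTF-8.
def decode_byte(byte, code_page)
  [byte].pack("C").force_encoding(code_page).encode("UTF-8")
end

# True when the given bytes form a valid UTF-8 sequence.
def valid_utf8?(bytes)
  bytes.pack("C*").force_encoding("UTF-8").valid_encoding?
end

char = decode_byte(224, "Windows-1252")
puts format("U+%04X", char.ord) # U+00E0, LATIN SMALL LETTER A WITH GRAVE
p valid_utf8?([224])            # false: a lone 0xE0 is not valid UTF-8
p valid_utf8?([195, 160])       # true: the UTF-8 encoding of the same character
```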

@mohits Can you reproduce using these steps? Maybe the directory was created with UTF-8 bytes for a non-UTF-8 file system?

fxn commented 2 years ago

However, ø belongs to Windows-1252 (code 248) and the same script prints the expected byte, but fails to perform the require_relative.

This is interesting, because both à and ø are non-ASCII, so I would expect them to succeed or fail in the same way.

@mohits what happens on your machine with à?

mohits commented 2 years ago

hi @fxn - I am a bit confused now with the results I am seeing but I have progress to report (kind of..)

[1] I created this path:

$ cd D:\projects\blog\_posts-trials\rails\Test-à

[2] I ran your code:

$ chcp
Active code page: 437

$ ruby 1.rb
UTF-8
160

On my system, it shows the encoding as UTF-8. I ran chcp 1252 and ran the same code, and got the same result. This is where it gets interesting: I went to the folder with Test Ø in the name and ran the code again (still with CP-1252), and it ran successfully.

UTF-8
152

[3] I forced it to change to CP-437 again by doing chcp 437 and it failed but I got this:

UTF-8
152
1.rb:4:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Rails Server Test ?/2.rb (LoadError)
        from 1.rb:4:in `<main>'

It read the character properly (as 152) but failed on the require_relative.

[4] On the other hand, with CP-437, I ran it in the path with Test-à and it worked.

$ ruby 1.rb
UTF-8
160

So, to summarise:

I found this online: http://zuga.net/articles/text-ascii-vs-cp-1252-vs-cp-437/ that compares the code pages side by side.

CP-1252 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for graphical applications under Windows. CP-437 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for console applications under Windows.

In that comparison, CP-1252 has à at 224 and ø at 248 (Ø is at 216). CP-437 has à at 133 but does not have Ø or ø at all.
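The same comparison can be made with Ruby's own transcoding tables. A minimal sketch (bytes_in is a hypothetical helper name; IBM437 is Ruby's name for CP-437). It only shows which characters each code page can represent; whether that is what makes require_relative fail is still a guess.

```ruby
# Return the bytes of +char+ in +code_page+, or nil when the code page
# has no mapping for that character.
def bytes_in(char, code_page)
  char.encode(code_page).bytes
rescue Encoding::UndefinedConversionError
  nil
end

["\u00E0", "\u00D8", "\u00F8"].each do |char| # à, Ø, ø
  %w[Windows-1252 IBM437].each do |code_page|
    puts format("U+%04X %-12s %s", char.ord, code_page, bytes_in(char, code_page).inspect)
  end
end
# U+00E0 Windows-1252 [224]
# U+00E0 IBM437       [133]
# U+00D8 Windows-1252 [216]
# U+00D8 IBM437       nil
# U+00F8 Windows-1252 [248]
# U+00F8 IBM437       nil
```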

fxn commented 2 years ago

@mohits Which Ruby version is that?

I discovered by testing related things in Zeitwerk that in Ruby 3.0 the file system encoding is assumed (unsure if the verb is correct) to be UTF-8. This issue in Redmine seems relevant.

mohits commented 2 years ago

@fxn - my bad. I should have included the ruby version: 3.0.3.

More information then:

Ruby       | Folder | Code page | Filesystem encoding | Last byte of Dir.pwd | Result
Ruby 3.0.3 | Test Ø | CP-437    | UTF-8               | 152                  | Fails to require
Ruby 2.7.4 | Test Ø | CP-437    | Windows-1252        | 216                  | Fails to require
Ruby 2.6.8 | Test Ø | CP-437    | Windows-1252        | 216                  | Fails to require

Ruby 3.0.3 | Test Ø | CP-1252   | UTF-8               | 152                  | require_relative works
Ruby 2.7.4 | Test Ø | CP-1252   | Windows-1252        | 216                  | require_relative works
Ruby 2.6.8 | Test Ø | CP-1252   | Windows-1252        | 216                  | require_relative works

Ruby 3.0.3 | Test-à | CP-437    | UTF-8               | 160                  | require_relative works
Ruby 2.7.4 | Test-à | CP-437    | Windows-1252        | 224                  | require_relative works
Ruby 2.6.8 | Test-à | CP-437    | Windows-1252        | 224                  | require_relative works

Ruby 3.0.3 | Test-à | CP-1252   | UTF-8               | 160                  | require_relative works
Ruby 2.7.4 | Test-à | CP-1252   | Windows-1252        | 224                  | require_relative works
Ruby 2.6.8 | Test-à | CP-1252   | Windows-1252        | 224                  | require_relative works
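The byte values reported above are consistent with the last byte of each folder name under the reported filesystem encoding. A small sketch to check that (last_byte is a hypothetical helper name):

```ruby
# Last byte of +name+ once it is encoded in +encoding+.
def last_byte(name, encoding)
  name.encode(encoding).bytes.last
end

["Test \u00D8", "Test-\u00E0"].each do |name| # "Test Ø", "Test-à"
  %w[UTF-8 Windows-1252].each do |encoding|
    puts format("%-7s %-12s %d", name, encoding, last_byte(name, encoding))
  end
end
# Test Ø  UTF-8        152
# Test Ø  Windows-1252 216
# Test-à  UTF-8        160
# Test-à  Windows-1252 224
```

So Ruby 3.0.3 reports the trailing byte of the UTF-8 form, while 2.6.8 and 2.7.4 report the Windows-1252 byte.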

Yes, the issue on Redmine does seem relevant and might explain the result we see for the character code and encoding... but it appears that require_relative uses some other encoding for the file path/name?