Open mohits opened 2 years ago
I have run this script in that directory:
p __dir__.encoding
p Dir.pwd.encoding
puts
p __ENCODING__
p ''.encoding
p Encoding.default_external
p Encoding.default_internal
and the output is
#<Encoding:IBM437>
#<Encoding:Windows-1252>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
This is Windows 10 running in Parallels Desktop.
I suspect this just shows my ignorance of how file encoding works in Windows, and of what a Ruby program can assume when reading file and directory names.
Thanks for posting this here @fxn - I opened the issue here so that we can get closer to finding the correct place to fix this :) since the issue clearly has to do with Ruby + Windows, and not Rails.
This is what I get with codepage 65001 (UTF-8)
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
and with codepage 437
#<Encoding:IBM437>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
Can you please check your codepage by running chcp on the command line?
@mohits it says 437.
If I execute chcp 65001, the output is:
#<Encoding:UTF-8>
#<Encoding:Windows-1252>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
Almost!
Hi @fxn - Yes, I think 437 (OEM - United States) is the most common on English Windows. I think we need someone with a better understanding of locales on Windows to look at this issue.
Unsurprisingly, my simple test works on JRuby: it successfully requires the file. Also, when your script is run with JRuby, its output matches the chcp 65001 output, even on a console that is set to CP-437.
$ jruby xfn.rb
#<Encoding:UTF-8>
#<Encoding:Windows-1252>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
nil
Today I could not reproduce, trying more carefully.
The file system on my machine is in Windows-1252. I created a directory called à using the file explorer to make sure the encoding is honored. Inside that directory I created this test file and a dummy bar.rb:
puts Encoding.find('filesystem')
p Dir.pwd.bytes[-1]
require_relative "bar"
This works, and the output is
Windows-1252
224
If you check the codes in Windows-1252, you'll see 224 is, indeed, à.
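The byte values mentioned here can be checked from Ruby itself. This is a minimal sketch, assuming a Ruby build that ships the Windows-1252 transcoder (standard builds do); the variable names are illustrative only, and the characters are written as Unicode escapes so the snippet does not depend on the source file's encoding.

```ruby
# Encode the two characters discussed in this thread as Windows-1252
# and inspect the resulting single bytes.
a_grave = "\u00E0".encode("Windows-1252") # à
o_slash = "\u00F8".encode("Windows-1252") # ø

p a_grave.bytes # => [224]
p o_slash.bytes # => [248]
```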
@mohits Can you reproduce using these steps? Maybe the directory was created with UTF-8 bytes for a non-UTF-8 file system?
However, ø belongs to Windows-1252 (code 248) and the same script prints the expected byte, but fails to perform the require_relative.
This is interesting, because both à and ø are non-ASCII, so I would expect them to succeed or fail in the same way.
@mohits what happens on your machine with à?
hi @fxn - I am a bit confused now with the results I am seeing but I have progress to report (kind of..)
[1] I created this path:
$ cd D:\projects\blog_posts-trials\rails\Test-à
[2] I ran your code:
$ chcp
Active code page: 437
$ ruby 1.rb
UTF-8
160
On my system, it shows the encoding as UTF-8. I ran chcp 1252 and ran the same code, and got the same result. This is where it gets interesting. I went to the folder with Test Ø in the name, and ran the code again (still with CP-1252) and it ran successfully.
UTF-8
152
[3] I forced it to change to CP-437 again by running chcp 437 and it failed, but I got this:
UTF-8
152
1.rb:4:in `require_relative': cannot load such file -- D:/projects/blog/_posts-trials/rails/Rails Server Test ?/2.rb (LoadError)
from 1.rb:4:in `<main>'
It read the character properly (as 152) but failed on the require_relative.
[4] On the other hand, with CP-437, I ran it in the path with Test-à and it worked.
$ ruby 1.rb
UTF-8
160
So, to summarise:
I found this online: http://zuga.net/articles/text-ascii-vs-cp-1252-vs-cp-437/ that compares the code pages side by side.
CP-1252 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for graphical applications under Windows. CP-437 is an 8-bit character encoding based on ASCII (identical up to code point 127). This is the default codepage for console applications under Windows.
In this comparison, CP-1252 has the two characters à and ø at 224 and 248 respectively. CP-437 has à at 133 but does not have Ø (or ø) at all.
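The code page comparison can also be reproduced without a lookup table. The following is a rough sketch, assuming Ruby's built-in transcoders for Windows-1252 and IBM437 (Ruby's name for CP-437) are available; the helper name `bytes_in` is hypothetical and not part of any library.

```ruby
# Return the bytes of `char` in the named encoding, or nil when the
# code page has no representation for that character.
def bytes_in(char, encoding_name)
  char.encode(encoding_name).bytes
rescue Encoding::UndefinedConversionError
  nil
end

chars = { "a-grave" => "\u00E0", "capital O-slash" => "\u00D8" }
code_pages = ["Windows-1252", "IBM437"]

chars.each do |label, char|
  code_pages.each do |code_page|
    bytes = bytes_in(char, code_page)
    shown = bytes ? bytes.inspect : "not representable"
    puts "#{label} in #{code_page}: #{shown}"
  end
end
```

If the transcoders behave as assumed, Ø comes back as not representable in IBM437, which would fit the observation that only the Test Ø directory fails under code page 437.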
@mohits Which Ruby version is that?
I discovered by testing related things in Zeitwerk that in Ruby 3.0 the file system encoding is assumed (unsure if the verb is correct) to be UTF-8. This issue in Redmine seems relevant.
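To compare Ruby versions on the same machine, something like the following could be run under each interpreter. It is only a sketch: it prints what Ruby reports, and the `report` hash is an illustrative name, not something used elsewhere in this thread.

```ruby
# Print the Ruby version alongside the encodings Ruby reports for the
# file system, the locale, and the current directory string.
report = {
  "ruby"       => RUBY_VERSION,
  "filesystem" => Encoding.find("filesystem"),
  "locale"     => Encoding.find("locale"),
  "Dir.pwd"    => Dir.pwd.encoding
}

report.each { |label, value| puts "#{label}: #{value}" }
```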
@fxn - my bad. I should have included the ruby version: 3.0.3.
More information then:
Ruby version | Directory | Code page | Encoding.find('filesystem') | Dir.pwd.bytes[-1] | Result
Ruby 3.0.3 | Test Ø | CP-437 | UTF-8 | 152 | Fails to require
Ruby 2.7.4 | Test Ø | CP-437 | Windows-1252 | 216 | Fails to require
Ruby 2.6.8 | Test Ø | CP-437 | Windows-1252 | 216 | Fails to require
Ruby 3.0.3 | Test Ø | CP-1252 | UTF-8 | 152 | require_relative works
Ruby 2.7.4 | Test Ø | CP-1252 | Windows-1252 | 216 | require_relative works
Ruby 2.6.8 | Test Ø | CP-1252 | Windows-1252 | 216 | require_relative works
Ruby 3.0.3 | Test-à | CP-437 | UTF-8 | 160 | require_relative works
Ruby 2.7.4 | Test-à | CP-437 | Windows-1252 | 224 | require_relative works
Ruby 2.6.8 | Test-à | CP-437 | Windows-1252 | 224 | require_relative works
Ruby 3.0.3 | Test-à | CP-1252 | UTF-8 | 160 | require_relative works
Ruby 2.7.4 | Test-à | CP-1252 | Windows-1252 | 224 | require_relative works
Ruby 2.6.8 | Test-à | CP-1252 | Windows-1252 | 224 | require_relative works
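The byte values reported for each Ruby version are consistent with the last byte of the directory name in the reported encoding. A small sketch that mimics the Dir.pwd.bytes[-1] check on plain strings, assuming the two directory names end in à and Ø as described (the `last_byte` lambda is illustrative only):

```ruby
# Last byte of each directory name when encoded as UTF-8 versus
# Windows-1252, mirroring the Dir.pwd.bytes[-1] check in the test script.
names = ["Test-\u00E0", "Test \u00D8"]
last_byte = ->(name, encoding_name) { name.encode(encoding_name).bytes[-1] }

names.each do |name|
  utf8   = last_byte.call(name, "UTF-8")
  cp1252 = last_byte.call(name, "Windows-1252")
  puts "#{name}: UTF-8 => #{utf8}, Windows-1252 => #{cp1252}"
end
# Test-à: UTF-8 => 160, Windows-1252 => 224
# Test Ø: UTF-8 => 152, Windows-1252 => 216
```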
Yes, the issue on Redmine does seem relevant and might explain the result we see for the character code and encoding... but it appears that require_relative uses some other encoding for the file path/name?
What problems are you experiencing?
If the path has special characters in it and you try to run a Ruby script that does a require_relative on that path, it fails to load the file. It's almost certainly something to do with encoding on the Windows console.
It failed for me with:
Active code page: 437
and Active code page: 65001
This ticket is based on an issue on rails at https://github.com/rails/rails/issues/29087
Steps to reproduce
Create a folder called Test Ø and in it have 2 files: 1.rb and 2.rb
You should see an error like this:
What's the output from ridk version?
ruby:
  path: C:/Ruby30-x64
  version: 3.0.3
  platform: x64-mingw32
ruby_installer:
  package_version: 3.0.3-1
  git_commit: 981867a
msys2:
  path: C:\Ruby30-x64\msys64
cc: gcc (Rev2, Built by MSYS2 project) 11.2.0
sh: GNU bash, version 5.1.8(1)-release (x86_64-pc-msys)
os: Microsoft Windows [Version 10.0.19044.1586]