Closed ian-mcdowell closed 6 years ago
That is a very good question. I'll look into it, but most Unix commands expect a C-string as input. There might not be a way to keep UTF8 characters through the entire pipeline.
At some point, the arguments have to be (char argc, char **argv)
. I do not know of a way to do UTF8 -> char -> UTF8 while keeping UTF8 information. I'll have to learn.
-[NSString cStringUsingEncoding:NSUTF8StringEncoding]
?
Sorry, strike my previous answer. ios_system()
operates on char*
, because system()
operates on char*
. There is no operation on the encoding inside ios_system()
. The conversion was done in CommandExecutor
, where there is the line ios_system(command.utf8CString)
which is equivalent to [command cStringUsingEncoding:NSUTF8StringEncoding]
(but shorter).
We need to track the result of this command.utf8CString
conversion, but my guess is that it removed the UTF-8 content to make it compatible with C-like strings. This conversion cannot be reversed.
Take a look at parseArgument
. You’re converting the arguments to ASCII.
If you take a UTF-8 C string (char *)
, then call +[NSString initWithCString:string encoding:NSASCIIStringEncoding]
to convert it to an NSString, you have lost the UTF-8 characters. Even if you later later call -[NSString cStringWithEncoding:NSUTF8StringEncoding]
, they will be gone.
All that should be necessary to fix this (I've verified locally) is to change all occurrences of NSASCIIStringEncoding
to NSUTF8StringEncoding
in ios_system.m
However, if we want to support other encodings than UTF-8 in the future, more work would be needed.
You're absolutely right. It works now. Gonna make a short PR for that. The effects are... interesting. Not all commands work, but most of them do.
Looks like ls
fails, that's it? Overall, looks like a nice improvement to what we currently have.
Investigating ls
. Looks like the relevant code is at file_cmds_ios/ls/util.c
line 113. iswprint(wc)
fails, so it prints a ?
Removing that iswprint
check allows emoji to display, but I'm sure there's a reason that check is there.
According to the doc, it should reply true if the character is printable, according to the defined locale: http://en.cppreference.com/w/c/string/wide/iswprint
Do we need to define LANG
, LC_ALL
, etc in the environment?
Defining LC_ALL
to en_US.UTF-8
: doesn't work.
Calling setlocale(LC_ALL, "en_US.UTF-8")
in iOS_system.m
: doesn't work.
Calling setlocale(LC_ALL, "en_US.UTF-8")
in ls.c
: doesn't work.
It does work on my mac.
What does setlocale(LC_ALL, NULL)
return? That should get you the current locale that C is using.
It says "C", which explains the behaviour of iswprint
but is inconsistent with the LC_ALL
.
Because setlocale
doesn't like en_US.UTF-8
.
Update: found a similar question on stackoverflow: https://stackoverflow.com/questions/32594377/setlocale-works-in-ios-simulator-but-fails-on-device
setlocale
depends on access to /usr/share/locale
, which is not possible in the sandbox. I'm going to deactivate all iswprint
calls.
Fixed with 6c22c28. I guess I can close this issue now?
OpenTerm passes UTF-8 commands to ios_system, and expects UTF-8 output from commands. However, argument parsing & command parsing converts the string to ASCII, which causes data loss.
Try:
echo 😀
in OpenTerm to see results.