rschupp / PAR-Packer

(perl) Generate stand-alone executables, perl scripts and PAR files https://metacpan.org/pod/PAR::Packer

Fixing the Unicode issues on Windows by patching runperl.c #32

Closed plk closed 3 years ago

plk commented 3 years ago

As per this long-standing slew of issues on Windows:

https://www.nu42.com/2017/02/perl-unicode-windows-trilogy-one.html

and the fix derived from this here:

https://github.com/circulosmeos/Perl-with-Unicode-for-Windows

it's possible to patch Windows perl quite easily to fix the Unicode file-system and command-line argument problems. I've tested it with 64-bit Strawberry 5.32 and it is a magic bullet for all sorts of horrible Unicode issues on Windows. It's the only way to really fix it, apparently. However, when I pack my app with a patched perl.exe, the resulting PAR exe is still broken in the usual ways, presumably because a pp-packed executable doesn't actually run perl.exe, which is where the fix lives. Is there something we can do about this? Without modifying perl.exe in this minor way on Windows, we can't solve the problem.

rschupp commented 3 years ago

@plk You must adapt the patch for runperl.c to myldr/main.c which is the main program for the custom Perl interpreter used by a packed executable.

plk commented 3 years ago

I assume boot.c is also involved here? As I understand it, the root of this is that we have to pass in wchar_t * instead of char * for argv, otherwise any wide chars are already mangled before perl ever sees them. After changing main.c to accept wchar_t *, boot.exe dies for me, and looking there, it looks like this is what actually passes the real argv into main.c?
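To make the wchar_t * idea concrete, here is a minimal sketch of the conversion step, assuming the wide argv arrives as UTF-16 code units (as it does on Windows). The function name `utf16_to_utf8` is hypothetical and is not taken from runperl.c or PAR-Packer. On Windows itself one would more likely call `WideCharToMultiByte` with `CP_UTF8`; the hand-written version below only shows what that conversion does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a NUL-terminated UTF-16 string to a newly
 * malloc'd, NUL-terminated UTF-8 string. Returns NULL if allocation fails.
 * Unpaired surrogates are replaced with U+FFFD. The caller frees the result. */
char *utf16_to_utf8(const uint16_t *src)
{
    assert(src != NULL);

    size_t len = 0;
    while (src[len] != 0)
        len++;

    /* One UTF-16 code unit never needs more than 3 UTF-8 bytes
     * (a surrogate pair is 2 units and becomes 4 bytes). */
    unsigned char *out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: src[i + 1] is at worst the terminator. */
            uint16_t low = src[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(low - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* stray low surrogate */
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

Applied to each element of a wide argv, this would yield a char ** whose bytes are UTF-8 rather than whatever the active code page managed to represent.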

rschupp commented 3 years ago

main.c is a variant of RunPerl(): it adds some elements to argv (as fakeargv), passes that to perl_parse() and then starts the returned interpreter with perl_run().
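The fakeargv step described above can be sketched roughly as follows. This is an illustration only: `build_fakeargv` and the choice of extra elements are hypothetical, and the real main.c decides for itself what it adds and where.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build a new NULL-terminated argument vector that
 * holds argv[0], then the `extra` elements, then the rest of the original
 * arguments. Only the array is allocated; the strings themselves are
 * shared with the inputs. Returns NULL if allocation fails. */
char **build_fakeargv(int argc, char **argv,
                      int nextra, char **extra, int *out_argc)
{
    assert(argc >= 1 && argv != NULL && nextra >= 0 && out_argc != NULL);

    char **fake = malloc((size_t)(argc + nextra + 1) * sizeof *fake);
    if (fake == NULL)
        return NULL;

    int n = 0;
    fake[n++] = argv[0];
    for (int i = 0; i < nextra; i++)
        fake[n++] = extra[i];
    for (int i = 1; i < argc; i++)
        fake[n++] = argv[i];
    fake[n] = NULL;

    *out_argc = n;
    return fake;

    /* In an embedding program the result would then be handed to
     * perl_parse(my_perl, xs_init, fake_argc, fake, env), followed by
     * perl_run(my_perl), as documented in perlembed. */
}
```

Any switch to wchar_t * has to happen before this point, or be followed by a conversion back to char *, because perl_parse() takes a char ** argv.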

boot.c munges argv (like ShellQuote), then passes it on (via spawn) to the executable compiled from main.c.

If you replace argv with a wchar_t* version, you must make sure that the munging gets adapted to work with that.
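For readers unfamiliar with the kind of munging meant here, the following is a sketch of Windows-style argument quoting for the char * case. It follows the backslash and double-quote rules that the Microsoft C runtime applies when splitting a command line. It is not PAR-Packer's actual code; `quote_arg` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: quote one argument so that the Microsoft C runtime
 * parses it back as a single argv element. `out` must have room for
 * 2 * strlen(arg) + 3 bytes. Returns the length written, excluding the NUL. */
size_t quote_arg(const char *arg, char *out)
{
    assert(arg != NULL && out != NULL);
    size_t o = 0;

    /* Non-empty arguments without whitespace or quotes pass through as-is. */
    if (arg[0] != '\0' && strpbrk(arg, " \t\n\v\"") == NULL) {
        o = strlen(arg);
        memcpy(out, arg, o + 1);
        return o;
    }

    out[o++] = '"';
    for (const char *p = arg; ; p++) {
        size_t backslashes = 0;
        while (*p == '\\') {
            p++;
            backslashes++;
        }

        if (*p == '\0') {
            /* Trailing backslashes are doubled so that they do not
             * escape the closing quote. */
            for (size_t k = 0; k < 2 * backslashes; k++)
                out[o++] = '\\';
            break;
        } else if (*p == '"') {
            /* Backslashes before a quote are doubled, plus one more
             * to escape the quote itself. */
            for (size_t k = 0; k < 2 * backslashes + 1; k++)
                out[o++] = '\\';
            out[o++] = '"';
        } else {
            /* Backslashes not followed by a quote are literal. */
            for (size_t k = 0; k < backslashes; k++)
                out[o++] = '\\';
            out[o++] = *p;
        }
    }
    out[o++] = '"';
    out[o] = '\0';
    return o;
}
```

A wchar_t * version would need the same logic over wide characters (for example `wcspbrk` and `wcslen` in place of `strpbrk` and `strlen`), which is the adaptation referred to above.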

plk commented 3 years ago

So, to be clear, the custom perl in main.c is always called via boot.c? I am trying to trace where the first point of contact is for command-line options to a pp-packed .exe. If I have a pp executable X and I do X -option filename, what code gets those command-line things first?

rschupp commented 3 years ago

> If I have a pp executable X and I do X -option filename, what code gets those command-line things first?

boot.c

main.c might also get called directly as parl, but not in the pp scenario.

plk commented 3 years ago

And run_with_inc.pl is only used at install time? The issue I think I'm having is that the install-time stuff adds things to argv which aren't there under normal command-line use, and the UTF-16 -> UTF-8 decode mangles the install-time boot.c calls and currently stops installation.

rschupp commented 3 years ago

Sorry, you're on your own here. IMHO it's futile to try to make pp-packed executables do stuff that perl itself doesn't do. This isn't a recent issue - unless Perl upstream moves in this direction, I'm not interested in even thinking about it.

shawnlaffan commented 3 years ago

@plk - have a look at Win32::LongPath. I've been using it for a while, including in pp packed executables.

You can wrap the calls inside subs with OS checks to avoid issues on non-Windows systems.

In case it is useful, the code I use is at: https://github.com/shawnlaffan/biodiverse/blob/5fbf48b54a7a5446d22943fc16a04ab8c65ba98b/lib/Biodiverse/Common.pm#L1711-L1818

Regards, Shawn.

plk commented 3 years ago

I have been using Win32::Unicode::File and that works for file access but the issue I'm having currently is the command-line options when these contain things like Cyrillic etc. I don't think there is a way to make these work without patching the main() of perl as things are already broken by the time any code sees it. I have heard that Windows 10 1803+ has a general "force everything to UTF-8" regional setting which will solve it but I have yet to try this.

shawnlaffan commented 3 years ago

I just tested a PAR packed exe using a file called años.bps on the command line and it opened without issue. It failed for one using Cyrillic, though (ГДЄЅЗИѲ.bps), so Win32::LongPath is not a solution in this case.

plk commented 3 years ago

I tried the Windows 1803+ setting and it does indeed work, fixing all the issues I had - it basically forces most Windows file system APIs to UTF-8, and as long as you decode @ARGV as UTF-8, it all seems fine.