Open avih opened 1 year ago
I'm not aware of any case which relies on being able to change the buffer after putenv
.
ash
changes environ
in evalcommand()
when running a nofork applet. The existing value of environ
is saved and restored after the applet completes.
Thanks.
I ended up (ab)using the ANSI environ
to store the UTF8 values, same as I did 3 years ago here https://github.com/rmyorston/busybox-w32/issues/17#issuecomment-681927971 so basically the native unicode thing uses the ANSI environ
normally as far as the system goes, just that the encoding of non-ascii values is utf8 instead of ACP (and the mingw getenv/putenv/etc wrappers remain as is).
So any trick busybox or busybox-w32 do (like with the empty values, or the clearenv implementation), should work out of the box also with the native unicode mode.
ash
changesenviron
inevalcommand()
As far as I can tell, when spawn is used - it takes the process env - not the CRT environ
, but the process env is hopefully up to date because putenv
updates both the crt environ
(and possibly also _wenviron
) and also the process env (e.g. using SetEnvironmentVariable[W]
).
But if you modify environ
to point to something else, how does the process env gets updated to the new values?
Still on the subject of environ
.
I see that there were some commits related to UCRT. As far as I can tell, the current state is that we can't spawn with UCRT with non-null env
, or else it will crash?
And so, if we need to spawn with a custom env
array, it basically clears the current env and sets it from this custom array, and then calls spawn with a NULL
env so that it would be taken from the environment (which is now a copy of the custom array).
Is that correct?
Originally I meant to handle NULL env to spawn by simply replacing it with environ
(which is then converted to a wide array before calling _wspawnve
), so that would presumably crash with UCRT? (it's not a problem to update the process env if it's null, it's just less code to replace NULL with environ
, even if possibly slower to convert all of environ to a wide array).
Can you reproduce the crash issue with non-NULL env? Any pointers how to build busybox-w32 with UCRT?
Nofork applets run in the same process as their parent shell. There's no spawn in this case.
Passing a non-NULL env to spawn with UCRT causes any subsequent spawn in the child process to fail. If the child process doesn't spawn a grandchild there's no problem. The workaround is to arrange things so that the initial spawn always has a NULL env.
I haven't reproduced this recently. When I worked on the problem I used the UCRT variant of MSYS2 on Windows.
Passing a non-NULL env to spawn with UCRT causes any subsequent spawn in the child process to fail. If the child process doesn't spawn a grandchild there's no problem. The workaround is to arrange things so that the initial spawn always has a NULL env.
Thanks.
So how does one trigger the crash scenario as far as a user test case goes?
Also, I might want to put all the UTF8 APIs implementations in a new C file (maybe mingw-utf8.c
or some such), possibly with a matching header.
How should I go about it as far as makefiles/other-requirements go? I tried grepping mingw.c
and mingw.o
(to check how mingw.c
is compiled depending on CONFIG_PLATFORM_MINGW32) but couldn't find how this happens.
I see it's in win32/Kbuild
. I used ag
(the silver searcher) and apparently it skipped some files...
Do I need to mention the .h
file someplace so that it rebuilds the .c
file when the header changes?
So how does one trigger the crash scenario as far as a user test case goes?
You'd need to:
sed -i 's/_UCRT/_WHATEVER/' shell/ash.c
time
as the second.Do I need to mention the .h file someplace
That shouldn't be necessary.
Thanks.
I still didn't get to try UCRT, but I'll update here once I do.
Regardless, before you make the next release, can I please review the release notes about unicode?
Draft release notes
Thanks. Generally looks good. Few minor comments:
It isn't necessary to adjust the console code page: this is handled transparently.
That might sound as if we're changing the console CP. Assuming we don't want to imply this, maybe remove "this is handled transparently", or maybe reword as something like "Unicode output and interactive input work with any console code page."
It has been necessary to apply workarounds for deficiencies in Microsoft's current support for UTF-8. There may be undiscovered problems and the workarounds themselves may have issues.
I believe this refers only to UTF8_INPUT and/or UTF8_OUTPUT? (I don't recall other workarounds)
If yes, then it might be useful to use something like "... deficiencies in Microsoft's current support for UTF-8 in interactive use and printouts to the console"
Or some other wording to limit the statement to console IO, as this doesn't refer, as far as I can tell, to the ability to work with files with unicode names, like with tar
or cat
.
The Windows Terminal works better than the Windows console.
Maybe add "especially when it comes to rendering unicode". I don't think there's difference in editing as long as the glyphs are supported.
As an additional note, I found out that other terminals also work better than the windows console, like vscode builtin terminal (amazing combining chars rendering), or hyper or wez (that's not an endorsement), and possibly more.
All the mentioned ones seem to handle combining chars better than the windows terminal, tested with this - check "The Three Kingdoms" alignment thingy (the whole page pastes just fine into a here-document in an interactive sh prompt ;) ).
However, all of those also don't have mouse-wheel events for console apps which support it (like less
for windows), while such apps do work at the windows terminal/console.
But I agree that mentioning only the windows terminal is enough.
Editing (cursor movement, delete, etc) can be buggy, especially with double-width or combining characters.
A reminder that while it doesn't fix all busybox editing issues, I do have a version with wcwidth
updated to latest Unicode (15). We already pushed a tiny update only for emojis, but the full update has much more. I can arrange quickly a PR with the full update if you want, which will be this, but it will use it unconditionally with UTF8_MANIFEST (and disable the existing table). Or just feel free to take this commit, edit it as you see fit, and push it yourself.
I don't know how much it affects practical use cases, but I believe it would affect that rocketship prompt which was mentioned in another issue, because that rocket glyph is double width, but current busybox[-w32] doesn't know that, so if it appears as part of PS1 at the last line of the prompt, busybox will get confused with the line wrap (of the user input which follows). The full update to Unicode 15 does include it as double width.
(mingw64-gcc 12.2.1-8.fc38; mingw64-crt 10.0.0-4.fc38; glob; Unicode)
As I think it's feasible to get the native unicode working, maybe change it to Unicode-manifest
or Unicode-Win10
so that the difference becomes visible (I'm not suggesting again to also change the download file name).
Regardless of Unicode, this looks like another good release, and good notes. Thanks again for maintaining it.
I originally wrote
It isn't necessary to adjust the console code page: this is handled automatically.
but thought that might imply the code page was changed automatically. So I said 'transparently' instead.
End users probably don't care too much about the exact nature of Microsoft's deficiencies so I prefer to leave them unspecified.
I'll have a look at the wcwidth
commit later.
The release certainly isn't going to be FRP-5179. Testing has uncovered problems on Wine which will require at least one workaround.
I'll have a look at the
wcwidth
commit later.
Sure. Let me know if you need anything.
The release certainly isn't going to be FRP-5179. Testing has uncovered problems on Wine which will require at least one workaround.
Hmm... I presume that's in your private test setup?
However, all of those also don't have mouse-wheel events for console apps which support it (like
less
for windows), while such apps do work at the windows terminal/console.
First of all, I was wrong. Mouse events do work in wezterm, but not the others (vscode, hyper).
FWIW, this seems to be an issue of older (win10, but not win11) conpty.dll
, which apparently doesn't do mouse events well.
wezterm bundles a newer conpty.dll
, so mouse wheel works.
I've decided not to take the wcwidth
commit for now. I always seem to have cause to regret last-minute changes made just before a release.
I'm not impressed with the latest version of Wine in Fedora. There are a number of regressions. I've worked around the most serious; the rest can wait. It's not like Wine is an important platform for busybox-w32.
Hmm... I presume that's in your private test setup?
Not private: testsuite/runtest
in busybox-w32 works on Windows as well as Linux. There are more failures on Windows, but that's only to be expected.
I'll restart my testing with FRP-5181 and see how that goes.
I've decided not to take the
wcwidth
commit for now
Sure. But if the unicode build gains traction, then I think it would be good to get this or something similar in. The upstream wcwidth
is 15 years out of date, and the number of meme double-width codepoints/icons has grown considerably since then - both in official support and in various meme use cases, like very fancy PS1.
I'm not suggesting to push it upstream because TBH I don't think we should deal with the "compression" of the data at upstream wcwidth whenever a new Unicode version is released (I think Unicode 16 should be released in few weeks).
I don't think there are existing scripts to automate it from the Unicode data to the busybox "compression", so taking it from someplace which does automate it sounds a lot better to me, especially where we might have slightly less concerns with the balance of program size vs features.
Not private:
testsuite/runtest
in busybox-w32 works on Windows as well as Linux. There are more failures on Windows, but that's only to be expected.
Do the linux parts refer to testing busybox-w32 on linux using wine? or to testing the native linux busybox?
Anyway, back to the subject of this issue: native unicode support.
I've noticed that in some win32 files, libbb.h
is included before other standard headers, for instance at fsync.c
or glob.c
or poll.c
or even mingw.c
.
Now, libbb.h
includes mingw.h
, and mingw.h
has some mappings of default names, such as fopen
to mingw_fopen
, etc.
So the point is that I think libbb.h
should be included last, so that it doesn't overrides system names before their respective system header is included.
I don't think I can identify a reason it's included first. Is there such reason?
If this is expected to work because libbb.h
already includes these headers, and so they'll no-op the second #include
, then I think it's kind of meh IMO, but would still be good to know.
I'm asking this because the utf8 API would need similar global mapping which would probably apply via mingw.h
(like CreateFileA
to createFile_U
), but if windows.h
is included after libbb.h
, then this poses an issue (and same for other names which would require such mappings).
(of course, names which are already mapped globally currently, like mingw_fopen
, would not have global mapping to the utf8 variants. These name would map to the utf8 variants at the C file where the current mingw/winansi wrapper is implemented).
Do the linux parts refer to testing busybox-w32 on linux using wine? or to testing the native linux busybox?
The latter. It's possible to build and test a native Linux binary using the busybox-w32 source. The test output doesn't exactly match that obtained using the upstream source. That's because some Windows tests are included when they really shouldn't be. I'll fix that.
libbb.h
tends to be included first because that's how it's always been done:
fa37e9da4b (Nguyễn Thái Ngọc Duy 2010-04-14 02:12:01 +0200 5) #include "libbb.h"
fa37e9da4b (Nguyễn Thái Ngọc Duy 2010-04-14 02:12:01 +0200 6) #include <windows.h>
1ee308c75f (Ron Yorston 2021-12-22 08:22:12 +0000 7) #include "lazyload.h"
fa37e9da4b (Nguyễn Thái Ngọc Duy 2010-04-14 02:12:01 +0200 8) #undef PACKED
fa37e9da4b (Nguyễn Thái Ngọc Duy 2010-04-14 02:12:01 +0200 9)
I've just been following that existing practice.
I've just been following that existing practice.
Would you be against moving libbb.h
it to the end?
For instance, mapping fprintf
to mingw_printf
would be disastarous if this happens before fprintf
is declared...
Anyway, next subject.
I was trying to evaluate which/how-many APIs would need to be mapped from ansi to UTF8. Searching APIs at the code did not seem productive, so I wrote a script which produces this list of the ansi symbols used by busybox.exe
.
I'm guessing a good bunch of the ansi APIs don't need special handling, e.g. many IO APIs, like fprintf
, already work fine with UTF8 too (i.e. no need to add a UTF8 wrapper) , and similarly for string APIs like strlen
(which has an equivalent wcslen
), or we probably don't need iswlower
, etc.
Once we have native UTF8 working, we can add a white list of allowed ansi symbols, and then use the script occasionally to check if we need to add more UTF8 wrappers etc.
So this is largely an FYI, but I'd appreciate if you could glance over it and see if anything stands out as wrong.
For reference, here's the current script: w32sym.zip
I'm guessing a good bunch of the ansi APIs don't need special handling...
Yeah, I think I overshoot a bit with the matching wide symbols.
These are correct matching symbols (e.g. strlen
, wcslen
) but they're only different because they operate on different type.
Compared, for instance, to [_w]fopen
or CreateFile{A,W}
, which have different reach - the ANSI ones are more limited compared to the wide ones, and these are the ones we mainly care about.
And here's the revised script. By default it now produces the short list (limited ANSI APIs), but it can also produce the long list with the general ansi APIs: w32sym.v2.zip
But I'm guessing it can't detect symbols which are loaded dynamically? Not sure...
Would you be against moving libbb.h it to the end?
Not if it's strictly necessary.
But I'm guessing it can't detect symbols which are loaded dynamically? Not sure...
I guess not. They'll appear as strings in the binary not as symbols. Easy enough to find in the source, though.
- Have the shell spawn an application that spawns some other application. In the original issue the first application was GNU make and I used busybox-w32
time
as the second.
Hmm.. I can't reproduce this.
I've built busybox.exe with UCRT on windows, using winlibs, i386, adding to the path a dir with busybox and the output of busybox.exe --install .
minus ar
, plus xxd
, pkg-config
and make
from the w64devkit bin dir (it probably doesn't need make
because winlibs has it, but it does seem to need a non-busybox xxd).
I got many warnings where LL_FMT and OFF_FMT are used - these seem to be ignored (it thinks "...%"LL_FMT"d"
is %d
, and I'm not sure why, but I know that it definitely goes through the #define LL_FMT "I64"
line), but it did build, and it does depend/linked with ucrtbase.dll
.
Example warning:
editors/awk.c:925:44: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long long int' [-Wformat=]
925 | snprintf(g_buf, MAXVARFMT, "%"LL_FMT"d", (long long)n);
| ^~~ ~~~~~~~~~~~~
| |
| long long int
editors/awk.c:925:54: note: format string is defined here
925 | snprintf(g_buf, MAXVARFMT, "%"LL_FMT"d", (long long)n);
| ~~~~~~~~~^
| |
| int
| %"LL_FMT"lld
I then used a (wide) spawn variant which replaces NULL env with environ, so that spawn is never called with a NULL env, and does get called with environ env.
I couldn't quite figure out how to setup the make/time example, so I tried these, and all seem to work:
./busybox sh -c './busybox sh -c busybox'
./busybox sh -c 'sh -c "sh -c busybox"'
./busybox sh -c 'sh -c "./busybox printf xxx | ./busybox grep xxx"'
It would really help to have some one-liner test case to reproduce it...
Anyway, for now, I'll keep the current UCRT code which uses NULL env, and I'll handle exporting the UTF8 env to the system env, because that would also typically be much quicker to convert only he unicode values instead of converting the whole UTF8 environ to wide. It's not a big function so that should be fine.
But I would still like to be able to reproduce the UCRT crash when passing environ env, and then confirm that the NULL env fixes it.
I tried these, and all seem to work:
./busybox sh -c './busybox sh -c busybox'
./busybox sh -c 'sh -c "sh -c busybox"'
./busybox sh -c 'sh -c "./busybox printf xxx | ./busybox grep xxx"'
I tried also with the latest winlibs x86-64, plain non-any-utf8 build, with only this diff to the source:
diff --git a/shell/ash.c b/shell/ash.c
index 5a5c947e8..ec11e8f56 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -9126,7 +9126,7 @@ tryexec(IF_FEATURE_SH_STANDALONE(int applet_no,) const char *cmd, char **argv, c
# else
if (APPLET_IS_NOEXEC(applet_no)) {
# endif
-#if ENABLE_PLATFORM_MINGW32 && !defined(_UCRT)
+#if 1 || ENABLE_PLATFORM_MINGW32 && !defined(_UCRT)
/* If building for UCRT move this up into shellexec() to
* work around a bug. */
clearenv();
@@ -9203,7 +9203,7 @@ static void shellexec(char *prog, char **argv, const char *path, int idx)
int applet_no = -1; /* used only by FEATURE_SH_STANDALONE */
envp = listvars(VEXPORT, VUNSET, /*strlist:*/ NULL, /*end:*/ NULL);
-#if ENABLE_PLATFORM_MINGW32 && defined(_UCRT)
+#if 0 && ENABLE_PLATFORM_MINGW32 && defined(_UCRT)
/* Avoid UCRT bug by updating parent's environment and passing a
* NULL environment pointer to execve(). */
clearenv();
And tried also this case:
./busybox sh -c 'X=x ./busybox sh -c "Z=z ./busybox"'
Which also works fine as far as I can tell..
As for the warning, I think I previously missed also this line, which is in line with my statement that it's defined as I64
:
include/libbb.h:293:17: note: format string is defined here
293 | #define LL_FMT "I64"
| ^
So I'm guessing in winlibs it doesn't like I64
? which is weird. It's defined like this:
#if ENABLE_PLATFORM_MINGW32 && \
(!defined(__USE_MINGW_ANSI_STDIO) || !__USE_MINGW_ANSI_STDIO)
#define LL_FMT "I64"
#else
#define LL_FMT "ll"
#endif
So maybe __USE_MINGW_ANSI_STDIO
is ignored/doesn't-exist with UCRT? this might make sense, because the mingw stdio exists to work around msvcrt issues, which maybe don't exist with UCRT, and so this define is gone?
I'm also unable to reproduce the problem. Maybe Microsoft have fixed UCRT.
OK, I'll leave it up to you to decide what to do with the UCRT workaround.
Meanwhile, I'll just support the standard behavior with (utf8) spawn: allow and "forward" both NULL and non-NULL env, and if it's NULL then ensure the system wide env is up to date (instead of the smaller and slower hack of replacing NULL with environ and then converting all of it to wide).
As for the UCRT I64 warnings, I have a patch and will send a PR soon.
I'm Just confirming first it compiles without warnings with/without UCRT for i686/x86-64.
OK, this rabbit hole was a bit deeper than expected.
It bothered me that I was unable to reproduce the problem with UCRT. The original issue (#234) was related to an interaction between GNU make and busybox-w32 when they were both compiled with UCRT.
I was able to reproduce the problem using GNU make from back then but wanted to do it solely with current busybox-w32 compiled with the current MinGW-w64 UCRT toolchain.
The issue in GNU make is that it passes a non-default environment to CreateProcess()
. In busybox-w32 CreateProcess()
is used in the implementation of popen(3)
, so I hacked that to have a non-default environment:
diff --git a/win32/popen.c b/win32/popen.c
index 2208aa6bb..de2f2b1fb 100644
--- a/win32/popen.c
+++ b/win32/popen.c
@@ -190,6 +190,7 @@ static int mingw_popen_internal(pipe_data *p, const char *cmd,
int success;
int fd = -1;
int ip, ic, flags;
+ char env[] = "HELLO=1\0";
if ( cmd == NULL || *cmd == '\0' || mode == NULL ) {
return -1;
@@ -251,7 +252,7 @@ static int mingw_popen_internal(pipe_data *p, const char *cmd,
NULL, /* primary thread security attributes */
TRUE, /* handles are inherited */
0, /* creation flags */
- NULL, /* use parent's environment */
+ env,
NULL, /* use parent's current directory */
&siStartInfo, /* STARTUPINFO pointer */
&p->piProcInfo); /* receives PROCESS_INFORMATION */
Then I modified the shell to force it to spawn all applets and reverted the UCRT workaround in the shell:
diff --git a/shell/ash.c b/shell/ash.c
index 2ea87a049..0f1862424 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -9103,7 +9103,7 @@ tryexec(IF_FEATURE_SH_STANDALONE(int applet_no,) const char *cmd, char **argv, c
{
#if ENABLE_FEATURE_SH_STANDALONE
if (applet_no >= 0) {
-# if ENABLE_PLATFORM_MINGW32
+# if ENABLE_PLATFORM_MINGW32 && 0
/* Treat all applets as NOEXEC, including the shell itself if
* this is a FS_SHELLEXEC shell. */
struct forkshell *fs = (struct forkshell *)sticky_mem_start;
@@ -9124,7 +9124,7 @@ tryexec(IF_FEATURE_SH_STANDALONE(int applet_no,) const char *cmd, char **argv, c
# else
if (APPLET_IS_NOEXEC(applet_no)) {
# endif
-#if ENABLE_PLATFORM_MINGW32 && !defined(_UCRT)
+#if ENABLE_PLATFORM_MINGW32 && !defined(_WHATEVER)
/* If building for UCRT move this up into shellexec() to
* work around a bug. */
clearenv();
@@ -9201,7 +9201,7 @@ static void shellexec(char *prog, char **argv, const char *path, int idx)
int applet_no = -1; /* used only by FEATURE_SH_STANDALONE */
envp = listvars(VEXPORT, VUNSET, /*strlist:*/ NULL, /*end:*/ NULL);
-#if ENABLE_PLATFORM_MINGW32 && defined(_UCRT)
+#if ENABLE_PLATFORM_MINGW32 && defined(_WHATEVER)
/* Avoid UCRT bug by updating parent's environment and passing a
* NULL environment pointer to execve(). */
clearenv();
To make the call to popen(3)
I used this Makefile:
sum != sum Makefile
target:
@echo sum is $(sum)
The shell macro assignment (!=
) uses popen(3)
. Using a UCRT-compiled version of busybox-w32 make
from current git master gives:
~ $ make
sum is 22355 1
The UCRT version without the workaround gives:
~ $ make
sum is
So, the workaround is still necessary. Without it GNU make still triggers the problem previously reported.
Using a UCRT-compiled version of busybox-w32
make
It's good to know the workaround is still required, and apparently in _wspawnve
as well, judging by the links below.
Reference:
Here are some more thoughts (i'm not abandoning the native utf8, but still).
Mingw-w64 recently changed their default to ucrt (at git master, probably for v12, if there wouldn't be too much backlash), and while w64devkit doesn't currently have plans to switch to ucrt, I'm guessing the defaults would eventually trickle down to most mingw setups.
ucrt applications can run on XP. It does require installing the ucrt runtime (the is/was an official runtime for XP), but then it works, even when compiling with llvm 17 (with some XP compiler and linker options - if the mingw setup defaults to win 7 or later).
Using a utf8 locale does not actually require the utf8 manifest. It can also be set using setlocale(".UTF8")
. Similar to the manifest, this would only succeed on win10 1803+ (the manifest is 1903+ - not the same version), but unlike the manifest, it also runs on XP, and if setlocale failed then no problem - it remains ANSI.
Another difference with runtime setlocale
is that argv
and environ
are already ANSI - unlike with the manifest, so they need to become utf8. Also, apparently, CP_ACP
and GetACP()
are as if setlocale was never called (but of course, the current locale - which is affected by setlocale - can be determined at runtime).
And, also unlike the manifest, this only works with ucrt. With msvcrt on win 10/11 (and older versions) the setlocale call fails, which is why I mentioned the mingw default change and that ucrt can run on XP as well.
Here's an example of using the runtime setlocale approach: https://github.com/gwsw/less/commit/ad440b6cea6a8ac2c5c26011fe4fe710791b459f
It's relatively little code. More than the manifest (which basically requires no code changes), but not too much. I think it can trivially work in busybox-w32 as well.
A manifest can still be added, which would bypass the initial conversion of argv and environ, but I'm guessing that the goal of this is to support XP up to latest with the same binary, and the manifest breaks XP.
And, we might need to reconsider our position about the same binary being unicode on one system and ansi on another.
Thoughts?
Using a utf8 locale does not actually require the utf8 manifest. It can also be set using
setlocale(".UTF8")
. Similar to the manifest
So, I have a local patch to use this method as FEATURE_UTF8_UCRT, but it seems that it doesn't work fully, or I couldn't make it work, as I expected.
TL;DR: fopen
seems to work and does open UTF-8 names correctly, and the docs mention that _mkdir
and _getcwd
do become UTF-8, and (with this busybox build) the spawn
* family does invoke apps with correct unicode args (i.e. seemingly argv is converted correctly to UTF8 by new code in this patch, and spawn
uses this UTF8 without further conversion to wide chars), but as far as I can tell Find{First,Next}FileA
seem to return question marks instead of non-ASCII codepoints.
So in busybox sh, cat unicode-path
, cd unicode-path
, pwd
, non-applet.exe unicode-arg
seem to work OK, but ls
doesn't list unicode names correctly.
I've reproduced the fopen
and FindFirstFileA
cases in a test program, where with setlocale
only the former becomes UTF8, while with the UTF8 manifest both seem to become UTF8.
See also here https://github.com/gwsw/less/pull/438
I don't know whether that's a bug of the UTF8 locale or a designed behavior - I don't think there's docs for this locale other than the linked page, and I don't quite get where the line goes between the manifest and the setlocale methods (other than argv, environ, etc).
and I don't quite get where the line goes between the manifest and the setlocale methods (other than argv, environ, etc).
Or maybe I do. I'm guessing the setlocale
method only applies to msvcrt/ucrt API, like fopen
, but indeed not like FindFirstFileA
.
Basically, roughly the libc API, which I guess kind of makes sense, but is not very helpful if one needs total UTF8 API, including the ANSI win32 APIs...
@rmyorston @avih Wouldn't it be possible to allow to user to enable/disable the interactive UTF-8 editing at runtime using a special BB_* var?
Please stop it. The answer is no. Just replace the binary instead.
So, I'm looking into adding native unicode support.
Basically similar to what the utf8 manifest does, but manually (adding
U
variant of win32 APIs which use utf8 prototypes and then internally calls theW
APIs - instead of letting the manifest change theA
APIs into utf8).So I created this issue to ask related questions instead of opening an issue on each question.
First question: I might need to maintain a utf8 version of
environ
independent of the systemenviron
(with its own *env API). Does busybox-w32 expectputenv
to add the string to the environment, so that e.g. this prints1
?With POSIX *env API (and on linux) it prints 1, but on windows, as far as I can tell
_putenv
makes a copy ofxenv
, so the user's string doesn't actually become part of the environment, and it prints 0.Also, with POSIX, it's valid to change
environ
to point to something else (user-provided array, for instance).I see that
mingw_putenv
does a trick to set an empty value, and it's indeed aware that the buffer gets copied (henceenviron
is iterated to find the copy to truncate).Do you know if these use cases (change the buffer after
putenv
, or changeenviron
) happen with upstream busybox? If yes, how does it work on windows?(
setenv
does make a copy, and so it can be more efficient to access using something less linear thanenviron
).