Closed phidebian closed 2 years ago
While reformatting your reproducer, I was able to identify the cause of this core dump and reproduced it on my Ubuntu 22.04 x86_64 and macOS x86_64 boxes as well:
$ ksh --version
version sh (AT&T Research) 93u+m/1.1.0-alpha+d84067a0/MOD 2022-09-28
# core dumps in interactive shell or not. so no need for -i. manually tested interactively as well.
# if variable a is set with value and discipline function is terminated via newline = core dump
$ ksh +E -uc $'a=A; function a.get { : $z; }\necho $a'
ksh[2]: a.get: line 1: z: parameter not set
Segmentation fault (core dumped)
$
# if discipline function is terminated using semi-colon then avoids core dump.
$ ksh +E -uc $'a=A; function a.get { : $z; }; echo $a'
ksh: a.get: line 1: z: parameter not set
$
# when variable is not set avoids core dump.
$ ksh +E -uc $'a=; function a.get { : $z; }\necho $a'
ksh[2]: a.get: line 1: z: parameter not set
$
Thanx @hyenias for pushing the streamline a bit further.
I got hard to reproduce the core dump in the variable.sh test, I started by mimicing other test and did something like got=(...) (no core dump)
got=$(ksh +E -uc $'a=A; function a.get { : $z; }\necho $a')
ksh[2]: a.get: line 1: z: parameter not set
While interactive got (core dump)
ksh +E -uc $'a=A; function a.get { : $z; }\necho $a'
ksh[2]: a.get: line 1: z: parameter not set
Segmentation fault (core dumped)
So at some point I introduced -i and left it (and wrongly thought interactive was a contributing factor). Only when I constructed the test without got=(...) construct I got the core dump, and then didn't proceeding with removing the -i from the picture.
$SHELL +E -uc $'a=A; function a.get { : $z; }\necho $a' ||
{ err_exit "$errmsg" "(expected success, got core dump)"
}
PS: I am github illiterate, so I am a beginner regarding the process here, I first submitted and issue (#545), then I submitted a qa-545 branch with the qa only change, but discovered that it somewhat create #546, and the same with fix-545 that produced #547, if doing this way is wrong, let me know (gently) where I should read further process technic, May be I should not had to open an issue and produce PR directly?
PS2 : I assume that my PR are 'examples' and that someone else 'who know better' will take care of it and may be tune them to reflect your streamline. If I should 'fix' my PR myself to include your new streamline up to a point it become acceptable 'as is' by the maintainer, I can do so.
Cheers,
@phidebian First off, thanks for submitting this issue and bringing this memory fault to light, much appreciated. I completely understand the hurdles submitting PRs as I still consider myself a novice to it all. That said, I have been following along and helping out @McDutchie ksh93u+m efforts when I can.
Normally, one opens up an issue and then a discussion takes place before submitting a PR to resolve the agreed upon course of action. Fact finding, narrowing the scope, root cause, possible fixes by including some small diff(s), discovery, and etc. are all parts of the issue phase. @McDutchie and @JohnoKing typically are quite adept at including lengthy code snippets into an issue's comments. Pull Requests (PRs) are git's primary mechanism when working the code base. So, I completely understand when you submitted your potential fixes via a PR instead of performing a diff or patch code snippet then inserting them into this issue as a comment.
Based on past experience, I would recommend consolidating your two PRs into just one as they are 100% related. Having two PRs for parts of the same thing is overly complex. Once you have consolidate your new code into one PR just close out the other as redundant.
For your #546 PR contents that adds in your new regression test for this issue, you are creating a multiline error message for your test results. This is non-standard. Typically, when you run bin/shtests
each test outputs a single error line for a test. As you have already done, just examine the other previous tests in src/cmd/ksh93/tests/variables.sh
or even other test files to help align your code with ksh's coding style. That is @McDutchie rule of thumb here which is to try to match the coding style of the current file including the use of tabs and spaces and syntax. Additionally, your error message still indicates interactive use.
For PR #547, I do not like "quick and dirty fix". That is my view only. As you stated, it does not address the root cause but simply places a bandaid over one symptom of the problem. Even if this fix avoids this one memory fault, more than likely the shell's state has been corrupted and could mask other issues.
@phidebian, hopefully I have been gentle as all of my comments are meant to you in the best of intentions. I would take the time and clean up your PRs as it will be good practice for you to sharpen your git skills. Please note that since you have found a long standing bug in the ksh code base, it could be a long time before it is properly addressed. FYI: When I create active variables via discipline functions (varname.get, varname.set, varname.unset); I do not set the variable with a value.
@hyenias, Ha ok I got the picture, I will submit a new PR combo as soon as refactored.
Regarding the double PR, since I am new to this, I thought that may be, one would like to test the QA alone, i.e confirm that a faulty occurrence (illegal NULL ptr in arglist lastarg in $_ setting) is detected, not only tin the provided test case, but in all other place where the bug could take place. So in short prove the test is OK, don't produce false positive or negative. Actually as you noted, it is not OK due to multi-line thing :-)
Regarding the test itself, my first attempt at it was a brute cut/paste of the above test, yet got=(...) didn't worked (i.e it hide the pb), so I went the splitted shell line way, and then later splitted the output printout because it look easier to read to me, but now I understand the principle, may be there is a dependency somewhere on a 1 liner output, will fix that.
Regarding the fix (quick'n'dirty), I proposed it and named it that way, instead of "Claiming this is the fix", because in my debugging session, I saw that the extra NULL was due to the longjmp fault return in shargbuild path. I spotted that fixing it could be potentially haze.ish, and then thought that may be the simple de-nullification could be enough, if validated by expert eyes, because this code is located in the $ setting return path only, and destabilizing shargbuild for the sake of $ in corner case was 'may be' marginal. I thought that avoid core dump while investigating more could be an option...
I can't promise I will investigate further, but as time permit I will look a bit deeper in this to find the real root cause.
Regarding gentle feedback :) you did great. I don't feel comfy in crowd, even less in loudly crowd, and similarly I run away from flaming public post. So at this point this is a success !!!
Thanx
No need to start a new PR, please just add commits to the existing one. They'll all get squashed into one at merge time.
I will try to find time to look at this soon.
For future reference, I like commits (and PRs) to be self-contained -- to have all the changes in them for a discrete fix or feature, so that the state between commits is as consistent as possible. So please also add regression tests, documentation updates, etc. in the same commit as the changes that they belong to.
Sorry I can mostly work on the WE for this.
I submitted a new PR from phidebian bug-545 branch. qa-545 and fix-545 should be ignored (I will see later ow to remove them)
It implement @hyenias streamlined test (mono line error message, shorter more pertinent test) It propose a fix to be reviewed, didn't broke the QA suite here, but again I am a starving programmer, so not many arch to test it (x86_64/ubuntu LTS 22.04).
I saw that some talk about CI test, what that is? and how should I trig them ?
This occurs only in interactive shell with nounset. EDIT: As @hyenias pointed out, this is not interactive dependent.
It occurs on debian/ubuntu x64 latest LTS, I can't test other combo.
The bugs is there since the epoch, well on github.
=============================================================== Example of occurence on debian (out of the box)
=============================================================== Same occurs on ubuntu
===============================================================
Change SHELL to your build env target to get it in debug mode.
Cheers.