Closed ideafarm closed 9 years ago
I'm now set up to debug this. I've stepped into the code that reads the array that stores color info to write it in readable form in the configuration file. It is this array that is corrupted when config is saved and then loaded. I've reproduced the cycling. The next step is to inspect the writing of the file from this array and the subsequent loading of this array from the file.
I can see that there is a second area of the config file that is being updated incorrectly during a save/load cycle; the file changes when no config changes have been made using the user interface.
All of the above testing involves opening the debugger and immediately saving the configuration and then loading it, without doing anything else. I don't change the default radix. Making a change is not necessary. The error behavior is reproduced merely with a single save/load cycle. Doing multiple save/load cycles to write a set of config files allows you to see how the values in the files cycle. (Load all of the files into an editor with multiple buffers, and then cycle the display quickly, and the problem is obvious.)
Am still actively working on this and might have a fix tomorrow.
Found a bug! In the picture, the bug is the array offset used in the two lines that I have just stepped over. This code assumes that "attr" always has a value that equals the offset into the array. This is true for the elements at offsets 0 through 12. But "attr" for offset 13 is WND_FIRST_UNUSED, which apparently evaluates to 14 since that is the element that is clobbered. For offsets at or above 13, "attr" evaluates to offset+1.
This is why repeated save/load of settings results in clobbering of the foreground and background colors for the elements at offsets 14 through 17. The colors for the element at offset 18 is not clobbered for some reason. I think that the array extends to offset 19 but during my study I neglected to take notes on what that element contains, so do not know whether it is clobbered.
PLAN (COMMENTS WELCOME):
IMO, fixing a bug should be approached as a separate project. This bug illustrates why. One approach would be to change the code to ensure that "attr" always evaluates to the correct offset. One way to do that would be to eliminate the use of "attr" and just use the offset to identify the attribute. That is probably the right way to fix this bug, but would require deep familiarity with, and understanding of, the code.
Another approach would be to just define a function that maps the value of "attr" to the correct offset. That's probably what I am going to do, because it is not likely to introduce a regression bug. Use of "attr" to index into the array is clearly wrong, and fixing these two lines to avoid doing that will eliminate the problem behavior.
One general rule that I follow that applies here is that once I've eliminated the problem behavior, I am NOT going to stop there. Follow through is very important. In this case, follow through means to scan all of the source code to identify ALL of the code that uses "attr" to index into these arrays, FOR ALL classes (not just the current class, which is "all").
So that's my plan. I am still actively working on this. Once I've eliminated the problem behavior, I will ask Jiri to make me a contributor so that I can offer the code change to him. Then I will continue to "follow through", which will probably result in more code changes, which I will offer separately.
Comments welcome.
Disclaimer: I'm not familiar with the debugger code base.
Your proposed fix sounds like a band-aid to me. There may be a reason why the code is structured the way it is. For example it may be intentional for WND_FIRST_UNUSED to have the value that it does and for attr to access the array element that it does. The real problem might be elsewhere (or not... see my disclaimer above). I recommend trying to get a "deep familiarity" with the code before making corrections that might really just be fixing the symptom rather than the problem.
Regarding contribution: the usual way is to create pull requests containing your proposed fixes. Those pull requests can be reviewed and discussed here before they are merged. Do you have your own fork of the Open Watcom code base? I believe that is a necessary prerequisite to making pull requests.
I agree with Peter, real source of problem should be fixed, not mask this problem by some hack. I looked a little bit to code and I think the value WND_FIRST_UNUSED is first value which can be used by application (based on aui project) for its own purposes, values bellow WND_FIRST_UNUSED are used by aui project internaly. If you trace to gui project this value is same as GUI_FIRST_UNUSED. This value has similar meaning as WND_FIRST_UNUSED but relate to number of attributes internaly used by gui project.
Anyway I don't think you are talking about this what you are refering is windows class index processing, not attribute index processing. It looks like that different number of internal attributes are used for GUI and UI (character mode) with different default values. It could be a problem. It is only my quick look on code and it can be wrong, because "deep familiarity" with the code is necessary.
About two hours ago, someone stole my backpack, which contained my laptop, backup drive, and other valuables. This is going to take me out of action for about 2 months; it's a total disaster. I live out of a backpack and have very few possessions. I do my development work (on IPDOS (tm) and on OW) on a cloud server, but without a laptop I cannot log into it. (Rackspace does provide a way to log into it using a web browser, but that interface is too slow and cumbersome to use for any nontrivial task.)
My two cents on fixing this: The bug appears to exist on 1.9, so it is an old bug. Watcom 10.6 was rock solid, so this bug was introduced after the code became open source. IMO, it is brain dead to use an enum to index into an array; whoever did this doesn't comprehend programming in C. Arrays should be indexed into only using variables whose values are set locally in a way that guarantees that the index will never go out of bounds.
The first thing that I would look into, if you want to do a real fix, is why the table storing this information contains an "attr" element at all.
However, I do not agree with Jiri's pejorative characterization as a "hack". IMO, this defect should be fixed in two stages. First, a workaround should be done that is easy and SAFE. All code that uses "attr" to index into these arrays should be edited to pass "attr" to a mapping function that returns the correct offset to use. This workaround is easy, it is guaranteed to not introduce another bug, and its effects are local and easy to understand. It is possible that the instability of the debugger will be totally fixed by this simple workaround. The workround should include a detailed comment that indicates what behavior it is intended to fix, and should also include suggestions for alternative approaches to making a permanent fix, which might well involve a redesign of how this configuration information is stored. (E.g. eliminating the "attr" element of each row and using the offset to identify the attribute being referenced.)
Doing the safe workaround will yield valuable information. One possibility is that WDW becomes rock solid again. Another possibility is that it remains unstable. I recommend that SAFE and EASY workarounds be used which do not require deep understanding and do not require any redesign. This is the approach that I would vote for. See how far it will take us toward the goal of rock solid behavior. Redesign and "deep understanding" changes should only be done as a last resort, given the maturity of this product, the lack of manpower, and the absence of adequate quality control and project management.
I don't think the use of enums for index is bad thing, it is used very frequently in OW without problem. But incorrect use of enums is bad. I personally prefer transparent solution even if it is more work. We are now developing release which could be prepared for future as opposite of OW1.9 which have production quality and accept only minimal change. We are in different position, we are in beta status and I prefer rework things because it will save our work in future and now we can save time by minimal testing after each change. Main testing phase and bug fixes phase start on the end of all reworks. Now have no sense to talk about safe changes and about phase for fixing something. Things can be broken for some time if final solution will be what we need. Unfortunately bugs is bad thing, but now is acceptable but if they are known then should be fixed. Core tools must be always in good state because if it is broken OW build doesn't work. As soon as we will be in production state then it will be necessary to take this into account and be much more strict. But now it is useless and complicated approach. Anyway debugger is work in progress because porting to 64-bit is not finished yet. Especially colors stuff have a few bugs, see also http://sourceforge.net/p/openwatcom/tickets/4/.
The issues of proper use of enums and proper discipline when coding array indexing are issues of art. It is your canvas, Jiri; you are the artist. I am only another artist, giving my assessment of the code.
But do not think that this is just "colors stuff". This bug causes memory to be trashed and makes the debugger both unstable and almost unusable. It is almost unusable because you can't change the default window positions or any of the options. It is extremely irritating to me to have to constantly change individual watch expressions from the decimal default to hex. I cannot change the default to hex because that causes the configuration to be saved. That in turn causes memory to be trashed and the debugger to become unstable each and every time the saved configuration is loaded, i.e. every time the debugger is launched.
I have purchased a replacement laptop and am now getting my development environment set up again. It is my intention to "adopt" wdw.exe in the sense that I will be the/an unofficial bug fixer for the debugger. Initially, my focus will be on fixing the bugs that drive me crazy. But my intention is to become someone who is able to respond quickly and effectively to anyone who reports a bug in the debugger. It will take me at least a few weeks to rearrange my life to allocate time to this consistently, but that is my intent. This will probably be the first bug that I fix. My plan is to take a crack at studying the code so that I can provide a fix along the lines that Jiri advocates.
Any contribution is welcome.
I have a trivial typo fix to readme.txt that I want to use as my first contribution. I will assume that you want to use the "fork and pull" model in which I make my own personal fork of your repository and just send you pull requests.
The fork and pull model is always preferable. One common mistake people make, though, is to issue a pull request from their master branch. After you fork, you should create a branch, call it "fix-wdw-readme," correct the typo, and issue a pull request from "fix-wdw-readme."
When its time for your next submission, your master branch will be clean and (after pulling from this main repository), identical to the master here. When you branch for your next enhancement or bug fix, you'll be starting from a point that is in sync with the main repo's master branch.
So I would call it, "fork, branch, and pull" model.
If you want, have a look at my fork for the piles of feature branches from which I've issued pull requests (I often don't delete the branches when the pull request is accepted).
I fixed the problem with save/load configuration in OW 2.0 debugger. Now it is reliable, you get same .dbg file after a few save/load steps and OW debugger is realy setup correctly from saved .dbg file Anyway these bugs (not only one) in OW debugger is there from begining (OW 1.0) and is not specific for OW 2.0. If you are interested then you can build new debugger from latest sources and check.
Thank you, Jiri. I am still preoccupied with legal problems, so don't get to do fun stuff like work on OW with you all. Best wishes and keep up the good and exciting work on OW.
OK, no problem.
fixed in git repository
How: (1) Clone current repository. (2) Build. (3) Release. (4) Copy the contents of "rel" directory to install it. (5) Build IPDOS (tm). (6) Use WDW to step into a program. (7) In "Options...", change the default radix from 10 to 16. (8) Click "Save Setup..."
Computer: Rackspace virtual server running up-to-date Windows 2012 R2.
The above actions cause the color scheme to change from black on white to gray on black.
My investigation of this behavior about a year ago persuaded me that the bug involves indexing into an array that is involved with storing or saving the setup. (As I recall, I was able to get the color scheme to cycle.)
I am now actively working on this bug, but will need some time to become familiar with the code since this is my first attempt to work with OW code. Comments or suggestions are welcome.