wxWidgets / wxWidgets

Cross-Platform C++ GUI Library
https://www.wxwidgets.org/
6.17k stars 1.77k forks source link

wxStyledTextCtrl pastes garbage characters in 3.1.0 when configured with --disable-unicode #17517

Open wxtrac opened 8 years ago

wxtrac commented 8 years ago

Issue migrated from trac ticket # 17517

component: wxStyledText | priority: normal | resolution: port to stable

2016-05-01 23:44:29: david_costanzo (David Costanzo) created the issue


I upgraded "fmslogo" (an ANSI-character MSW program compiled with i686-w64-mingw32 from cygwin) from wxWidgets 2.8.12 to 3.1.0 and ran into a bug where pasting more than 16 characters of text to my wxStyledTextCtrl subclass would paste as garbage.

When I copy and paste anything longer than 16 characters, it pasted as garbage. Furthermore, when I was doing a lot of pasting, my program eventually crashed, suggesting some kind of heap corruption.

For example, if I copied and pasted this:

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

it would paste as:

ÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝÝ

I was not able to reproduce this in the sample application stctest.exe because I was not able to get pasting to work at all--probably something wrong with my build environment.

I believe I have tracked down the problem to a bug to ScintillaWX::Paste in ScintillaWx.cpp. In particular, with this line:

   wxWX2MBbuf buf = (wxWX2MBbuf)wx2stc(evt.GetString());

In stepping through in gdb, my compiler created a temporary wxString as the return value from evt.GetString(). wx2stc() returned the correct pointer to the object's internal data. Then the temporary object was deallocated, and the memory to which "buf" pointed was deallocated along with it. So when the subsequent block of code executes, buf is unallocated.

I have patched my personal version of 3.1.0 with this code:

    // Hold the updated string in a local variable to prevent "buf"
    // from being deallocated before we're done with it.
    const wxString updatedString = evt.GetString();
    wxWX2MBbuf buf = (wxWX2MBbuf)wx2stc(updatedString);

This fixed the problem for me. Since I have to compiled my own wxWidgets anyway, there is no urgency to fix this for me.

By the way, in my program, I overrride global delete to fill the memory before freeing it. I meant to only do this in debug builds, but I have just discovered a bug in my build so that it's always used. Therefore, it's possible that the impact of using recently freed memory is more severe in my program than in others. Perhaps the default libc heap zeros the memory and that's why it looked like pasting didn't work in the sample.

wxtrac commented 8 years ago

2016-05-01 23:44:57: david_costanzo (David Costanzo) uploaded file paste-bug.PNG (4.7 KiB)

Screenshot of the bug paste-bug.PNG

wxtrac commented 8 years ago

2016-05-01 23:46:06: david_costanzo (David Costanzo) uploaded file wxwidgets-3.1.0.patch (0.7 KiB)

A patch which works for me

wxtrac commented 8 years ago

2016-05-02 03:25:08: paulclinger (Paul K) commented


Why does the description say "sometimes"? Is it not reproducible every time? I have an editor using wxwidgets 3.1.x (using wxSTC component) and have never seen anything like this (and it's under fairly heavy usage by many people, who are copying strings much longer than 16 characters). I don't doubt that it's happening in your case, but maybe it's somehow caused by the changes you made to the global delete function? Is there a specific sequence of steps leading to the strange result you see?

wxtrac commented 8 years ago

2016-05-02 07:24:19: david_costanzo (David Costanzo) commented


Why does the description say "sometimes"? Is it not reproducible every time?

I used the word "sometimes" because it didn't happen on short strings and because it's a memory bug, which is unlikely to have 100% predictable results unless you're using a debug heap. As you suggest, with my custom memory manager, when a particular string was replaced with garbage, it was 100% reproducible.

Since my understanding of the bug is using memory immediatly after freeing it, it wouldn't surprise me if it mostly works for other applications, as whether or not the buffer is re-used or its memory page is returned to the OS would depend on the particular memory manager. Have you tried running your application under a heap debugger like valgrind, Purify, or App Verifier?

It also might also be that some applications use a referenced counted wxString, so the buffer might not be deallocated when the temporary wxString object is. I don't know the internals of wxString, but perhaps it's significant that my wxString is not Unicode.

Is there a specific sequence of steps leading to the strange result you see?

The steps are just normal text pasting. I select text in a text file in notepad, hit Ctrl+c, give focus to the wxStledTextCtrl-based editor in my program, and hit Ctrl+v. Instead of the text I copied, I see the garbage in the screenshot.

Since there's doubt about this being a bug in wxWidgets, I'll see about trying to reproduce using the stc sample. I had hoped that pointing to the offending line of code would make this unneeded, as current my wxWidgets build has too many custom settings to serve as a repro and my machine is so slow that it takes many hours to rebuild wxWidgets.

wxtrac commented 8 years ago

2016-05-02 07:52:32: paulclinger (Paul K) commented


I don't know the internals of wxString, but perhaps it's significant that my wxString is not Unicode.

This may be significant; all my strings are Unicode strings. There was a relatively recent discussion about copy/paste operations failing on invalid UTF-8 code in Unicode aware configuration (https://groups.google.com/d/msg/wx-dev/61a6yK7JYxw/utzdsLpnCAAJ); see if it is similar to this case.

wxtrac commented 8 years ago

2016-05-02 07:54:09: paulclinger (Paul K) commented


Need to clarify my earlier statement; I do allow pasting UTF-8 invalid strings in my application, but I do handle that case manually by converting the string and using *Raw methods in wxSTC. It's a hackish way, but it works for me on Windows and OSX.

wxtrac commented 8 years ago

2016-05-02 09:24:08: @minoki commented


On ANSI build, wx2stc(str) is equivalent to str.mbc_str(), so the code wxWX2MBbuf buf # wx2stc(str); is equivalent to const char *bufstr.mbc_str();, which obviously go wrong if str is a temporary object.

wxCharBuffer or something should be used to store the return value of wx2stc.

By the way, I strongly recommend using Unicode build of wx because, in my opinion, ANSI build is less used today and therefore more buggy.

wxtrac commented 8 years ago

2016-05-03 05:46:34: david_costanzo (David Costanzo) changed title from wxStyledTextCtrl sometimes pastes garbage characters in 3.1.0 to wxStyledTextCtrl pastes garbage characters in 3.1.0 when configured with --disable-unicode

2016-05-03 05:46:34: david_costanzo (David Costanzo) commented

I have experimentally confirmed the observations above--that the problem is not reproducible with Unicode strings. I have updated the title for clarity and searchability.

Repro:

  1. On MSW using cygwin's i686-w64-mingw compiler, run the following commands to configure wxWidgets
export LDFLAGS='-static-libgcc -static-libstdc++'
export MAKE=gmake
../wxWidgets-3.1.0/configure --enable-stc --with-msw --enable-monolithic\
   --disable-shared --host=i686-w64-mingw32 --disable-unicode
  1. Build wxWidgets
  2. Build samples/stc
  3. Enable App Verifier (or some other heap checker) on stctest.exe
  4. Run stctest.exe
  5. Select some text and Edit -> Copy
  6. Choose Edit -> Paste

What Happens: stctext.exe stops working (crashes on a app verifier fault)

Expected Result: The text you copied is pasted at the location of the cursor, same as when "--disable-unicode" is omitted from the configure line.

By the way, I strongly recommend using Unicode build of wx because, in my opinion, ANSI build is less used today and therefore more buggy.

That's good advice. I would like to use Unicode, but this particular program is an interpreted programming language with an ANSI language engine. Text from the UI goes directly to the engine and there are a lot of naked "char *" variables throughout the code base. So it's better to have an ANSI-only UI that blocks non-ANSI characters than to have a lossy conversion. Converting this engine to Unicode is on the development roadmap but it will take a lot of time and effort. Until then, I'm happy that the latest wxWidgets still supports an ANSI build.

wxtrac commented 8 years ago

2016-05-03 16:11:02: @vadz changed status from new to closed

2016-05-03 16:11:02: @vadz changed resolution from * to fixed*

2016-05-03 16:11:02: @vadz commented

In e41a59ee075efda72b0cc665cda3ee3abc1616c1: Fix Unicode conversions in Scintilla code in ANSI build

Don't just use wxWX2MBbuf which is just a "const char*" in ANSI build and can become a dangling pointer if the string it was constructed from was temporary, as it happens with the string returned by wxStyledTextEvent::GetString().

Closes #17517.

wxtrac commented 8 years ago

2016-05-03 16:14:39: @vadz changed status from closed to portneeded

2016-05-03 16:14:39: @vadz changed resolution from fixed to port to stable

2016-05-03 16:14:39: @vadz commented

This should be backported to 3.0 too if no problems are found with it in 3.1.

wxtrac commented 3 years ago

2021-02-24 00:41:11: @imciner2 commented


Is it worth porting this back to stable at this point, or should this just be closed?

wxtrac commented 3 years ago

2021-02-24 00:53:46: @vadz commented


Cherry pick doesn't apply cleanly and while backporting it manually shouldn't be very hard, it would need to be tested in 3.0 before merging, so I don't plan to do it. If anybody else does, please do, otherwise this will get closed when 3.2 is released.