Closed by m-matsubara 4 months ago
Committed the fix to DoShiftTabKey.
I cannot reproduce the crash when pasting a string containing U+00A0. Could you please open an issue with detailed instructions for reproducing it?
I don't see how replacing #160 with WideChar(#$00A0) helps. Although it is used in some places in the source code, the typecast is redundant. I would probably settle for just #$A0, but then we should enforce it consistently throughout, and I don't think it is worth the trouble.
From the Delphi docs:
A control string is a sequence of one or more **control characters**, each of which consists of the # symbol followed by an unsigned integer constant from 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, and denotes the character corresponding to a specified code value. Each integer is represented internally by 2 bytes in the string.
Still cannot see the reason for change. e.g.
while (P^ >= #1) and ((P^ <= #32) or (P^ = #160))
is more consistent than
while (P^ >= #1) and ((P^ <= #32) or (P^ = #$00A0))
And in any case how is this related to the bug in the title?
thank you.
To reproduce, simply paste the text containing U+00A0. (I wanted to include a text sample, but U+00A0 in the comment seemed to be converted to U+0020.)
The cause of the hang is that the case statement at line 6108 in function TCustomSynEdit.TextWidth(P: PChar; Len: Integer): Integer; is not processed correctly, resulting in an infinite loop.
The type cast to WideChar is certainly unnecessary, but it is written that way in function TCustomSynEdit.IsWordBreakChar(AChar: WideChar): Boolean;, so I followed that style.
I removed the type cast as it is unnecessary.
Note that #$A0 does not solve the problem, and it seems that you need to write #$00A0.
To reproduce, simply paste the text containing U+00A0.
I cannot reproduce this.
After copy paste:
Non breaking spaces are highlighted. Spaces are not.
In the following code, if the character at P^ is U+00A0, the case on line 6108 is not taken and execution falls through to the break on line 6110.
If you change #160 to #$00A0, the case on line 6108 is handled.
6104: while P < PEnd do
6105: begin
6106: case P^ of
6107: #9: Inc(Result, fTabWidth * fCharWidth - Result mod (fTabWidth * fCharWidth));
6108: #32..#126, #160: Inc(Result, FCharWidth);
6109: else
6110: break;
6111: end;
6112: end;
Is this a compiler bug or what?? Do you get this in both 32 bits and 64 bits? Which Delphi version are you using? As I said above I cannot reproduce it here (only tried Win64).
sorry.
The following issue, RSS-391 "String with non-ASCII characters directly attached to a #xx or #$xx literal corrupts the final string", may be the cause.
https://blogs.embarcadero.com/rad-studio-12-1-athens-patch-1-available/
I'll investigate. Please wait a moment.
I tried it with Delphi 12 and Delphi 12.1 patch 1, but neither worked properly.
(Target platform is Windows 64bit)
Very strange. I am also using Delphi 12 with patch 1. (The patch fixes an unrelated issue).
Win32 or Win64?
Any compiler options that may affect this?
Can you run the following console app?
program CharTest;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;
var
  P: PChar;
  S: string;
begin
  S := #160;
  P := PChar(S);
  case P^ of
    #32: WriteLn('Space');
    #160: WriteLn('NB Space');
  end;
  ReadLn;
end.
What do you get?
I haven't had enough coffee yet, but could this have anything to do with {$HIGHCHARUNICODE ON/OFF}?
@MShark67 You are my hero! This was driving me crazy.
I haven't had enough coffee yet, but could this have anything to do with {$HIGHCHARUNICODE ON/OFF}?
This indeed might explain it. The docs say the default value is OFF. Is there a compiler option that affects this?
So in the Japanese ANSI codepage, #160 corresponds to another Unicode character (@m-matsubara could you please confirm this), while in my ANSI codepage and @MShark67's, Ord(WideChar(#160)) = 160, so just by luck it is working OK.
I would suggest the following.
WideChar(#$00B4) will be removed. @m-matsubara @MShark67 Any volunteers for doing this?
Is #160 treated as U+F8F0?
It seems that when #160 is processed with MultiByteToWideChar, it becomes U+F8F0.
program Project1;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;
var
  P: PChar;
  S: string;
begin
  {$HIGHCHARUNICODE OFF}
  S := #160;
  P := PChar(S);
  case P^ of
    #32: WriteLn('Space ' + IntToHex(Ord(P^)));
    #160: WriteLn('NB Space ' + IntToHex(Ord(P^)));
    else WriteLn('else ' + IntToHex(Ord(P^)));
  end;
  {$HIGHCHARUNICODE ON}
  S := #160;
  P := PChar(S);
  case P^ of
    #32: WriteLn('Space ' + IntToHex(Ord(P^)));
    #160: WriteLn('NB Space ' + IntToHex(Ord(P^)));
    else WriteLn('else ' + IntToHex(Ord(P^)));
  end;
  {$HIGHCHARUNICODE OFF}
  S := #$00A0;
  P := PChar(S);
  case P^ of
    #32: WriteLn('Space ' + IntToHex(Ord(P^)));
    #160: WriteLn('NB Space ' + IntToHex(Ord(P^)));
    else WriteLn('else ' + IntToHex(Ord(P^)));
  end;
  {$HIGHCHARUNICODE ON}
  S := #$00A0;
  P := PChar(S);
  case P^ of
    #32: WriteLn('Space ' + IntToHex(Ord(P^)));
    #160: WriteLn('NB Space ' + IntToHex(Ord(P^)));
    else WriteLn('else ' + IntToHex(Ord(P^)));
  end;
  ReadLn;
end.
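The MultiByteToWideChar observation can also be checked in isolation. The following is only a minimal sketch: it assumes that the Japanese ANSI code page in question is 932 and that this code page is installed on the machine. The program name and variables are made up for illustration and are not part of SynEdit.

```pascal
program Cp932Check;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils;
var
  B: AnsiChar;
  W: WideChar;
begin
  // The single byte that a decimal literal such as #160 denotes
  // when it is parsed as an AnsiChar.
  B := AnsiChar($A0);
  W := #0;
  // Assumption: 932 is the ANSI code page on the affected system.
  if MultiByteToWideChar(932, 0, @B, 1, @W, 1) = 1 then
    WriteLn('Byte $A0 in CP932 maps to U+' + IntToHex(Ord(W), 4))
  else
    WriteLn('Conversion failed, error ', GetLastError);
end.
```

If the result is anything other than U+00A0, a case label written as #160 under {$HIGHCHARUNICODE OFF} would not match a real no-break space on that system.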
It seems that when #160 is processed with MultiByteToWideChar, it becomes U+F8F0.
This explains everything.
#160 is an ASCII (or ANSI) character?
#$00A0 is a Unicode character?
#160 is an ASCII (or ANSI) string?
#$00A0 is a Unicode string?
It will be clear if you read the docs.
Thanks.
Understood.
Then #$00A0 seems appropriate.
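As a rough illustration of that conclusion, the sketch below mirrors the shape of the case statement in TextWidth but uses the four-digit hexadecimal literal. IsFixedWidthChar is a hypothetical helper written for this example and is not SynEdit code. It relies on the documented rule that, with {$HIGHCHARUNICODE OFF}, four-digit #$xxxx literals are parsed as WideChar, while decimal and two-digit hexadecimal literals are parsed as AnsiChar.

```pascal
program NbspCase;
{$APPTYPE CONSOLE}
{$HIGHCHARUNICODE OFF} // the documented default
uses
  System.SysUtils;

// Hypothetical helper for illustration only, not part of SynEdit.
function IsFixedWidthChar(C: WideChar): Boolean;
begin
  case C of
    // #$00A0 has four hex digits, so it is parsed as a WideChar
    // and does not go through the system ANSI code page.
    #32..#126, #$00A0: Result := True;
  else
    Result := False;
  end;
end;

begin
  // WideChar($00A0) builds the test character from its code point,
  // so the input itself is independent of the code page too.
  WriteLn('U+00A0 matches: ', IsFixedWidthChar(WideChar($00A0)));
  WriteLn('Tab matches: ', IsFixedWidthChar(#9));
end.
```

Written this way, the no-break space label should match regardless of the system ANSI code page, which is the behaviour the thread is after.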
Created new issue #95. Volunteers to provide a PR are invited.
Fixed a bug where U+00A0 (No-break space) was handled incorrectly, causing a hang when pasting U+00A0. In addition, all parts that are treated as #160 are changed to WideChar(#$00A0).
SynEdit.pas:8066 also corrected the wrong conditional expression. (or → and)