Open GoogleCodeExporter opened 9 years ago
So, as posted on the mailing list, the patch to SDL input to allow unicode is
pretty
small:
int SDLInput::convertKeyCharacter(SDL_Event event):
- int value = 0;
-
- if (keysym.unicode < 255)
- {
- value = (int)keysym.unicode;
- }
+ int value = keysym.unicode;
However, gcn::TextBox and gcn::TextField would also require changes to make
sure the
cursor position isn't in the middle of a multi-byte unicode character (changes
available in TMW source). But maybe this should be some kind of global option
somewhere,
in order not to bother people who are not using unicode.
Rendering unicode strings correctly is up to the font class.
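To make the caret concern concrete, here is a minimal sketch of how a widget
could step the caret by whole characters instead of bytes. It is not the TMW
code; the names isContinuationByte, previousCharacter and nextCharacter are
made up for illustration, and the text is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isContinuationByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Returns the byte offset of the character that starts before `offset`.
inline std::size_t previousCharacter(const std::string& text, std::size_t offset)
{
    if (offset > text.size())
    {
        offset = text.size();
    }
    if (offset == 0)
    {
        return 0;
    }
    --offset;
    // Skip backwards over continuation bytes until a lead byte is reached.
    while (offset > 0 && isContinuationByte(static_cast<unsigned char>(text[offset])))
    {
        --offset;
    }
    return offset;
}

// Returns the byte offset just past the character that starts at `offset`.
inline std::size_t nextCharacter(const std::string& text, std::size_t offset)
{
    if (offset >= text.size())
    {
        return text.size();
    }
    ++offset;
    // Skip forwards over the continuation bytes of the current character.
    while (offset < text.size() && isContinuationByte(static_cast<unsigned char>(text[offset])))
    {
        ++offset;
    }
    return offset;
}
```

A left/right arrow handler would call these instead of decrementing or
incrementing the caret by one, so the caret never lands inside a multi-byte
sequence.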
Original comment by b.lindeijer
on 18 Feb 2008 at 10:54
For this to work correctly, the enum in key.hpp will have to be changed, since
special keys like shift, alt, ctrl have IDs that start at 1000, which overlaps
some
unicode characters.
Original comment by final...@gmail.com
on 18 Feb 2008 at 11:02
Ah true, our SDLInput class uses a copy of that enum with LEFT_ALT starting at
-1000
instead of 1000, to avoid conflicts with higher character codes.
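As a rough illustration of why the negative range avoids the clash: the enum
below is a hypothetical subset, not the real contents of key.hpp, and
isSpecialKey is an invented helper. Only LEFT_ALT = -1000 is taken from the
comment above.

```cpp
namespace sketch
{
    // Special keys count upwards from -1000, so they stay negative and can
    // never collide with a unicode value, which is always >= 0.
    enum
    {
        LEFT_ALT = -1000,
        RIGHT_ALT,
        LEFT_SHIFT,
        RIGHT_SHIFT,
        LEFT_CONTROL,
        RIGHT_CONTROL
    };

    // With this layout a simple sign check separates the two ranges.
    inline bool isSpecialKey(int value)
    {
        return value < 0;
    }
}
```

With the old layout, a key value of 1000 could mean either the first special
key or the character U+03E8; with the negative layout it can only be the
character.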
Original comment by b.lindeijer
on 18 Feb 2008 at 11:13
Hi,
I just finished my UTF-8 solution for guichan.
All widgets that display text without any manipulation (window, button, ...)
may simply use a UTF-8 aware font. TextBox and TextField are not among those
widgets, so I created this package.
This package contains:
- UTF-8 version of TextField (UTF8TextField)
- UTF-8 version of TextBox (UTF8TextBox)
- UTF8StringEditor - helper class for manipulating UTF-8 strings
- SDLUTF8TrueTypeFont - Extended SDLTrueTypeFont (from guichan addons dir)
- key.diff - solves the issue reported by finalman applying b.lindeijer
solution :)
- utf8 template library from http://utfcpp.sourceforge.net/
I didn't test the SDLUTF8TrueTypeFont class. I use another UTF-8 SDL_ttf
solution in my project, but it was too dependent on other stuff, so I simply
modified SDLTrueTypeFont (getStringIndexAt is from my original class, so it
should work).
I made UTF8StringEditor an external class, so more widgets may use it and
because
later, you could write more obscure string editors based on std::string (even
for
encodings like UCS-4 which use fixed 32bit integers for storing single
character).
The screenshot attached shows some international characters displayed with
DejaVuSans. I have no idea what the text means, I just copied some random
characters
from http://dejavu.sourceforge.net/wiki/index.php/Testing, so please don't
blame me
if it is something insulting :)
I hope this code will find itself in guichan in a future release, but until
then, you
can simply use this package (remember to apply the patch to guichan if you
intend to
use unicode characters >= 1000).
I think you may find it useful, if so, I would appreciate some comments.
Original comment by nexat...@gmail.com
on 24 Mar 2008 at 2:55
It seems that you are on to something here. I like the fact that none of the
original widgets need to change, and that you use a specific UTF8 aware widget
instead when it's needed.
I also like the fact that a UTF8 package can be isolated from the rest of
Guichan, perhaps in its own namespace under the gcn namespace.
However, the use of a plain std::string seems to me a bit risky. I mean the
string
itself can't be used as a regular std::string. Perhaps a better approach would
be to
let the UTF8, or unicode, aware widgets work on an abstract string
implementation
that looks like an ordinary string but works with different encodings. If you
want to
change the encoding you don't change the stringeditor (it's not needed), you
simply
pass another instance of the abstract gcn::string to the widgets. The abstract
string
could use a fixed bit length (say 16 bits per character even though some
characters
in fact are made of 8 bits) so random access works properly.
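A small sketch of the fixed-width idea, under a few assumptions: it uses
std::u32string (C++11) with 32 bits per character rather than the 16 bits
suggested above, because 16 bits cannot hold every code point in one unit, and
decodeUtf8 is an invented helper that expects valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Decodes a UTF-8 byte string into one 32-bit unit per character, so that
// result[i] is the i-th character and random access is O(1).
inline std::u32string decodeUtf8(const std::string& text)
{
    std::u32string result;
    std::size_t i = 0;
    while (i < text.size())
    {
        unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t length = 1;
        char32_t codePoint = lead;

        // The lead byte tells how many bytes the sequence has.
        if (lead >= 0xF0)
        {
            length = 4;
            codePoint = lead & 0x07;
        }
        else if (lead >= 0xE0)
        {
            length = 3;
            codePoint = lead & 0x0F;
        }
        else if (lead >= 0xC0)
        {
            length = 2;
            codePoint = lead & 0x1F;
        }

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < length && i + k < text.size(); ++k)
        {
            codePoint = (codePoint << 6)
                | (static_cast<unsigned char>(text[i + k]) & 0x3F);
        }
        result.push_back(codePoint);
        i += length;
    }
    return result;
}
```

The price is memory (four bytes per character) and a conversion step whenever
the text crosses into code that expects UTF-8.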
The core widgets that need to edit text or display text could also be changed
so they
better abstract the way strings are used making it easier to implement unicode
aware
widgets that use a string that's not a std::string.
Original comment by olof.nae...@gmail.com
on 24 Mar 2008 at 11:00
I love UTF-8 because it is ASCII/Latin1 compatible. I used ISO-8859-2 before,
but problems with character tables were constant!
UTF-8 was invented to allow older applications (using 8 bit integers as
character
cell) to use UNICODE characters with minimum to no modification to existing
software.
Since most of the widgets will transparently work with UTF-8, why should
anything be
changed?
From the guichan about page:
"""Guichan is a small, efficient C++ GUI library designed for games"""
Since guichan is made for games, how many text fields may a game have? Only a
few, probably in the high score editor and maybe in options (unless you are
making a Space Empires clone) and of course The Mana World
login/registration/character setup. If someone is using single-byte character
sets, they may use ImageFont with all widgets. If someone needs real UNICODE
support, they use the UTF-8 versions of widgets where required. Performance
costs are minimal for a few rarely used widgets; also remember that SDL_ttf
uses UNICODE internally (I think UCS-2), so when you render a Latin1 string it
is also converted to UNICODE.
There is also a std::wstring class, nice, but same as wchar_t, it's 16bit wide
on
windows and 32bit wide on linux. MinGW, which is very popular under windows,
doesn't
have std::wstring. Also, UTF-8 looks way better in C++ source code and doesn't
need
special handling from C++ compiler.
Such an abstract string would have to use UCS-4, for example, and then be
converted to anything the user wants with iconv or something similar, but how
would the source code look?
Oh, and if I remember correctly, wxWidgets uses UTF-8 internally!
Original comment by nexat...@gmail.com
on 24 Mar 2008 at 11:38
The derivations from the abstract string class can use any type of encoding. It
would
be completely transparent to the widgets. The only thing that needs to know
about the
encoding is the font.
Personally I don't really care about the number of text fields or text areas,
it's
totally irrelevant. The big issue here is that it needs to be easy to use, it
has to
scale well, it should require as little change to Guichan as possible and it
should
not break the usage of plain old ascii std::strings.
I can't tell you what the source code would look like, but it would be an
implementation of a string that's aware of its encoding, much like your
StringEditor. I think going with the Java approach making an UTF-16
implementation
is a good start. That way users who need unicode can use a special UTF-16
string and
unicode aware widgets, and users who don't need unicode can stick with
std::string
and the normal non unicode aware widgets.
If the core widgets are changed so that they always use virtual methods for
accessing
and drawing strings then much of the code can be reused. Unicode aware widgets
could
simply inherit protected from a core widget and reuse most of the core widget's
code,
only substituting the way text is handled.
Original comment by olof.nae...@gmail.com
on 24 Mar 2008 at 12:30
UTF-8 doesn't have any problems with byte endianness since it uses 8-bit code
sequences (I don't really know why UTF-8 has support for a BOM).
On the other hand, UTF-16 is also a variable-length character encoding. A single
character may take one 16bit integer or two 16 bit integers, so it doesn't
solve the
requirement for iterating through the string to get character position from byte
position!
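The variable length of UTF-16 is easy to see in a sketch of its encoding rule.
encodeUtf16 is an invented name, the code needs C++11 for char16_t and
char32_t, and it assumes the input is a valid code point (not a surrogate, not
above U+10FFFF).

```cpp
#include <string>

// Encodes one code point as UTF-16: one unit inside the Basic Multilingual
// Plane, two units (a surrogate pair) above it.
inline std::u16string encodeUtf16(char32_t codePoint)
{
    std::u16string units;
    if (codePoint < 0x10000)
    {
        units.push_back(static_cast<char16_t>(codePoint));
    }
    else
    {
        // Split the 20 remaining bits into a high and a low surrogate.
        codePoint -= 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
    }
    return units;
}
```

So a caret in a UTF-16 string can end up between two surrogates in the same
way a caret in a UTF-8 string can end up between two bytes.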
UTF-8 advantages:
- It's a good standard, all C++ compilers will handle it transparently.
- Most editors can handle UTF-8 source files.
- No special magic is required in source code to create UTF-8 constants.
- SDL_ttf cannot handle UTF-16 strings, only Latin1, UTF-8 and UCS-2.
- Allegro, by default, assumes all strings are in UTF-8.
I understand having a string class like utf8::string would be nice, but I
couldn't find one anywhere, and I am not about to write such a beast (I am not
an expert on the STL; I made doxygen HTML help for the GNU STL implementation
and it's 36MB in size!!!!). Don't you think it is out of the scope of guichan?
I think making TextField and TextBox in the form of templates could handle the
situation.
typedef TTextBox<std::string> TextBox; // for backward compatibility
This way, TextBox would still work for everyone using it today. My UTF8TextBox
will still work and could be used until a brave soul writes utf8::string,
utf16::string, ucs32::string or whatever.
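A bare-bones sketch of what the template idea could look like. Only the name
TTextBox and the typedef come from the line above; the members, the
UTF32TextBox typedef and the use of std::u32string (C++11) are assumptions made
for illustration, and the real TextBox of course has far more in it.

```cpp
#include <string>

// The widget is parameterised on the string type it stores and edits.
template <typename StringType>
class TTextBox
{
public:
    void setText(const StringType& text)
    {
        mText = text;
    }

    const StringType& getText() const
    {
        return mText;
    }

    // Length in storage units of StringType (bytes for std::string).
    typename StringType::size_type getTextLength() const
    {
        return mText.size();
    }

private:
    StringType mText;
};

typedef TTextBox<std::string> TextBox;         // for backward compatibility
typedef TTextBox<std::u32string> UTF32TextBox; // hypothetical fixed-width variant
```

Existing code that names TextBox keeps compiling, while a different string
class can be plugged in later without touching the widget logic.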
However, I still think UTF-8 is the most portable solution to handle
internationalization in the least painful way.
Or maybe it is time to start a new project called Portable Unicode C++ string
that
will implement strings in a way similar to Python or Java strings.
Anyway, what do I have to do with this package so it may go to guichan/addons ?
Original comment by nexat...@gmail.com
on 24 Mar 2008 at 1:14
I think the small changes required for UTF-8 support in TMW and these classes by
Przeme show that there is really no need to abstract away std::string just to
have
UTF-8 support. The only thing you need is some helper functions that allow you
to
modify the string, calculate the length in characters, etc.
I like the implementation by Przeme and look forward to seeing these classes
available in Guichan. I also think it's a good idea to base the UTF-8 support on
UTF8-CPP. I've got two small remarks on his code:
* Style: Please don't use tabs, Guichan uses 4 spaces to indent code.
* Efficiency: Couldn't UTF8StringEditor::insertChar use utf8::append on an empty
std::string using the std::back_inserter (noted in UTF8-CPP docs for
utf8::append), and then use the normal std::string insert method to insert the
new unicode part at the requested index?
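Roughly what that suggestion amounts to, as a self-contained sketch. To avoid
depending on UTF8-CPP here, appendUtf8 is a hand-written stand-in for
utf8::append, and the insertChar signature is a guess rather than the one from
the package. It assumes a valid code point and an offset that is not past the
end of the string (std::string::insert throws std::out_of_range otherwise).

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Stand-in for utf8::append: writes the UTF-8 bytes of codePoint to `out`.
template <typename OutputIterator>
OutputIterator appendUtf8(unsigned long codePoint, OutputIterator out)
{
    if (codePoint < 0x80)
    {
        *out++ = static_cast<char>(codePoint);
    }
    else if (codePoint < 0x800)
    {
        *out++ = static_cast<char>(0xC0 | (codePoint >> 6));
        *out++ = static_cast<char>(0x80 | (codePoint & 0x3F));
    }
    else if (codePoint < 0x10000)
    {
        *out++ = static_cast<char>(0xE0 | (codePoint >> 12));
        *out++ = static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (codePoint & 0x3F));
    }
    else
    {
        *out++ = static_cast<char>(0xF0 | (codePoint >> 18));
        *out++ = static_cast<char>(0x80 | ((codePoint >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (codePoint & 0x3F));
    }
    return out;
}

// Encodes the character into an empty string, then splices it in at the
// given byte offset. Returns the byte offset just past the new character.
inline std::size_t insertChar(std::string& text, std::size_t offset,
                              unsigned long codePoint)
{
    std::string encoded;
    appendUtf8(codePoint, std::back_inserter(encoded));
    text.insert(offset, encoded);
    return offset + encoded.size();
}
```

The return value is a convenient new caret position after typing the
character.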
Original comment by b.lindeijer
on 24 Mar 2008 at 10:58
All the UTF8 aware widgets inherit publicly from a core widget. Isn't it
better with
a protected inheritance revealing only the valid functions or have a public
inheritance overloading all functions that deal with text? If you use the
UTF8TextBox
I don't really see the point in having setCaretColumn available (or other
methods
that deal with non UTF8 strings) as only setCaretColumnUTF8 should be used (and
other
methods that deal with UTF8 strings).
Also you still have some problems adapting to our code conventions. We always
use a
new line for all brackets. We don't have shortened names for methods in
Guichan, we
try to keep our code consistent so it will be easier to use.
Another thing, isn't the parameter byteOffset in all of the StringEditor
functions
more like a characterOffset? Perhaps naming the parameter simply offset will
make
its use clearer.
I think your code could be added as an add on. I'm planning on incorporating
the add
ons into the main source code, but under another namespace so add ons will be
easier
to spot in the future.
Original comment by olof.nae...@gmail.com
on 25 Mar 2008 at 6:09
I use tabs with size set to 4. I will change them to spaces.
b.lindeijer, I'm no STL guru, but I see it's time to start learning. I will try
to
implement insertChar with back_inserter.
"""
All the UTF8 aware widgets inherit publicly from a core widget. Isn't it
better with
a protected inheritance revealing only the valid functions or have a public
inheritance overloading all functions that deal with text? If you use the
UTF8TextBox
I don't really see the point in having setCaretColumn available (or other
methods
that deal with non UTF8 strings) as only setCaretColumnUTF8 should be used (and
other
methods that deal with UTF8 strings).
"""
The UTF8 versions of those methods could be avoided if the caret functions were
virtual.
I will modify the string editor so offset will be always byte offset while
index will
be character index.
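The offset/index convention boils down to two conversions, sketched here as
free functions. The names countCharacters and getOffset are mentioned for the
StringEditor elsewhere in this thread, but these signatures and bodies are
assumptions, and valid UTF-8 input is assumed as well.

```cpp
#include <cstddef>
#include <string>

// Number of characters in the string: every byte that is not a
// continuation byte (10xxxxxx) starts a new character.
inline std::size_t countCharacters(const std::string& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}

// Byte offset of the character at `index`, or text.size() when the index
// is at or past the end of the string.
inline std::size_t getOffset(const std::string& text, std::size_t index)
{
    std::size_t offset = 0;
    while (offset < text.size())
    {
        if ((static_cast<unsigned char>(text[offset]) & 0xC0) != 0x80)
        {
            if (index == 0)
            {
                return offset;
            }
            --index;
        }
        ++offset;
    }
    return text.size();
}
```

Both walk the string from the start, so they are linear in its length, which
is usually fine for the short strings a text field holds.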
I will upload the updated code later today.
And my name is Przemek :)
Original comment by nexat...@gmail.com
on 25 Mar 2008 at 10:07
It shouldn't matter if they are virtual or not, if someone casts an UTF8TextBox
to a
TextBox well then they probably know what they are doing. Of course, making the
functions virtual would make it possible to perform such a cast and use the
TextBox.
The functions aren't virtual as they might be called from a constructor. But
perhaps
they could be made virtual. Anyway, I still think you should go with overloading
methods rather than adding new ones and leaving the old ones that can't be used
lying around.
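For anyone wondering why the constructor matters: in C++ a virtual call made
from a base class constructor runs the base version, because the derived part
of the object does not exist yet. A tiny sketch with invented class names:

```cpp
#include <string>

class BaseWidget
{
public:
    BaseWidget()
    {
        // Virtual dispatch does not reach DerivedWidget here: while the
        // base constructor runs, the object is still only a BaseWidget.
        mCreatedBy = describe();
    }

    virtual ~BaseWidget() {}

    virtual std::string describe() const
    {
        return "base";
    }

    const std::string& createdBy() const
    {
        return mCreatedBy;
    }

private:
    std::string mCreatedBy;
};

class DerivedWidget : public BaseWidget
{
public:
    virtual std::string describe() const
    {
        return "derived";
    }
};
```

A UTF-8 override of a caret setter would therefore be silently skipped if the
core widget called that setter while constructing itself.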
Original comment by olof.nae...@gmail.com
on 25 Mar 2008 at 10:36
Now I see using back_inserter makes more sense and the code is more clear.
I hope everything is ok now.
Since guichan doesn't implement a clipboard nor selections in textbox and
textfield,
leaving original caret manipulation functions may be useful, because someone
may want
to modify the text without UTF-8 knowledge:
void ctrlCPressed() {
    TextField* myTextField = getUTF8TextField();
    int caret = myTextField->getCaretPosition();
    std::string x = myTextField->getText();
    x.insert(caret, clipboard);
    myTextField->setText(x);
    myTextField->setCaretPosition(caret + clipboard.size());
}
Original comment by nexat...@gmail.com
on 25 Mar 2008 at 11:27
> It shouldn't matter if they are virtual or not, if someone casts an
UTF8TextBox to a
TextBox well then they probably know what they are doing.
Actually you don't need to cast an UTF8TextBox to a TextBox, you could simply
assign
it. But everybody should of course know what they are doing.
If somebody instantiates an UTF8TextField instead of a normal TextField, I
think they
should realize that the std::string you get/set _must_ be a valid UTF-8 encoded
string. As such, the suggested implementation of ctrlCPressed is completely
wrong in
my opinion. If somebody wants to write code like that, he shouldn't be using an
UTF8TextField.
So anyway I would prefer the appropriate methods to be made virtual, so that
they can
be overridden with proper UTF-8 behaviour. I really don't want to need to
bother with
functions named like setCaretColumnUTF8 (I hadn't noticed the methods were
called
like this before).
Original comment by b.lindeijer
on 26 Mar 2008 at 8:04
The ctrlCPressed() method is correct. UTF8TextField & UTF8TextBox always return
caret positions at valid places: before a character or after the last
character. So inserting another UTF-8 string at the caret position is valid,
just like moving the caret by the size of the inserted text.
This is an ugly solution, but still, the behaviour will be as expected.
Original comment by nexat...@gmail.com
on 26 Mar 2008 at 8:21
Ah, you're right. And now I see where you're coming from since the code that
sets the
caret position would break if setCaretPosition was an UTF-8 aware method (since
it
gets a byte index and would treat it as a character index).
Now I'm not so sure anymore, maybe we should have both versions available...
Original comment by b.lindeijer
on 26 Mar 2008 at 8:33
I am sure both versions should be available. The question is, which method
should return the character index and which the byte index.
I think the Caret position places the caret at specified character index, so
maybe
instead of getCaretPositionUTF8, there should be getCaretPositionByte(). This
would
require virtual methods in TextField and TextBox (for set/get caretPosition and
caretColumn).
Original comment by nexat...@gmail.com
on 26 Mar 2008 at 9:59
It might be good to keep both methods, the UTF8 string ones and the original
ones,
but I still don't like the public inheritance, it has to be changed if this is
to be
added to Guichan.
I want the UTF8 methods to drop the UTF8 suffix (as a user should be safe in
assuming a UTF8 widget works on UTF8 strings), and I want new methods added that
work like the original ones, with their purpose clearly stated through good
names and good documentation.
Remember, getCaretPosition should only return the position of the caret in the
string. Now a caret's position is determined by the number of characters before
the
caret. The name implies it should have _nothing_ to do with UTF8. We don't want
to
confuse people, like Björn :) A method that takes UTF8 into consideration
should be
added, but with another name, perhaps getCaretPositionByte that explains it
doesn't
return a normal caret position. I think getCaretPositionInUTF8Bytes is even
better.
If someone wants to implement a clipboard like your suggestion then they use the
setCaretPosition method completely wrong as you have to take UTF8 into account,
with
your implementation it's in my opinion just a coincidence it works as the
implementation of setCaretPosition is wrong.
Original comment by olof.nae...@gmail.com
on 26 Mar 2008 at 4:38
I propose to add a virtual getCaretBytePosition to TextField and a virtual
getCaretByteColumn to TextBox (plus the setters). That way, everybody will know
that CaretPosition uses a character index and CaretBytePosition uses a byte
index. Then, my clipboard example will always work the same for Latin, UTF-8,
UTF-16 and UTF-32 as long as the clipboard and the edited text are both in the
same encoding (not counting base64, utf-7 and the like).
An alternative solution could be to expose StringEditor, which already has
getOffset
getOffset
and countCharacters. However it will only work for UTF8TextBox/Field since it
doesn't
have.
I think you can't decide what I18n policy you want to implement in guichan.
For future releases (like v1.0.0), the truth is that using my StringEditor class
(modified to remove ALL raw string manipulations I left in TextBox and TextField
because of possible performance losses) is the best solution. Then, all you need
is to provide 2 string editors in guichan core:
ByteStringEditor/LatinStringEditor/ASCIIStringEditor (or anything else you
prefer) and my UTF8StringEditor. That way, the user will be able to manipulate
any type of strings inside text boxes and in code. This makes everything easier.
That way, there is only one TextBox class and one TextField class. If you think
UTF-32 is better, you write UTF32StringEditor and every text box and text field
may use UTF32 strings (casting from (char*) to (Uint32*) isn't that hard :) )
This is my clipboard example using StringEditor:
void ctrlCPressed() {
    TextField* myTextField = getTextFieldWithSomeStrangeEncoding();
    int caret = myTextField->getCaretPosition();
    StringEditor* editor = myTextField->getStringEditor();
    std::string x = myTextField->getText();
    // Insert into x, the copy that is written back to the field below.
    editor->insert(x, clipboard, caret);
    myTextField->setText(x);
    myTextField->setCaretPosition(caret + editor->getLength(clipboard));
}
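The clipboard example above presumes a StringEditor interface with one
implementation per encoding. A possible shape for it is sketched below. The
class names are borrowed from this comment, but the method signatures, the
bodies and the paste helper are assumptions and not the code from the package;
everything lives in a namespace called sketch to keep that clear.

```cpp
#include <cstddef>
#include <string>

namespace sketch
{
    // Encoding-specific string operations; indices count characters.
    class StringEditor
    {
    public:
        virtual ~StringEditor() {}
        virtual void insert(std::string& text, const std::string& piece,
                            std::size_t index) const = 0;
        virtual std::size_t getLength(const std::string& text) const = 0;
    };

    // One byte per character (ASCII, Latin1, ...).
    class ByteStringEditor : public StringEditor
    {
    public:
        virtual void insert(std::string& text, const std::string& piece,
                            std::size_t index) const
        {
            text.insert(index < text.size() ? index : text.size(), piece);
        }

        virtual std::size_t getLength(const std::string& text) const
        {
            return text.size();
        }
    };

    // Variable-length UTF-8; assumes the text is valid UTF-8.
    class UTF8StringEditor : public StringEditor
    {
    public:
        virtual void insert(std::string& text, const std::string& piece,
                            std::size_t index) const
        {
            // Walk forward `index` characters to find the byte offset.
            std::size_t offset = 0;
            while (offset < text.size() && index > 0)
            {
                ++offset;
                while (offset < text.size() && isContinuation(text[offset]))
                {
                    ++offset;
                }
                --index;
            }
            text.insert(offset, piece);
        }

        virtual std::size_t getLength(const std::string& text) const
        {
            std::size_t count = 0;
            for (std::size_t i = 0; i < text.size(); ++i)
            {
                if (!isContinuation(text[i]))
                {
                    ++count;
                }
            }
            return count;
        }

    private:
        static bool isContinuation(char byte)
        {
            return (static_cast<unsigned char>(byte) & 0xC0) == 0x80;
        }
    };

    // Pastes `clipboard` at the caret and returns the new caret position,
    // both measured in characters of whatever encoding the editor handles.
    inline std::size_t paste(const StringEditor& editor, std::string& text,
                             const std::string& clipboard, std::size_t caret)
    {
        editor.insert(text, clipboard, caret);
        return caret + editor.getLength(clipboard);
    }
}
```

The widget code only ever talks to StringEditor, so swapping the encoding
means handing it a different editor instance.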
Something like string editors is also implemented in PHP. You have multibyte
string functions which operate on byte strings:
$myExoticTextLength = mb_strlen($string, "EUC-JP");
$myPolishTextLength = mb_strlen("ąźćśłółŹŻżĘ", "UTF-8");
I understand this is not the best solution for an operating system GUI, but for
a
game gui, this is much more than most other toolkits provide.
Original comment by nexat...@gmail.com
on 26 Mar 2008 at 5:37
Should I make a patch for TextBox and TextField to support virtual caret
methods?
Original comment by nexat...@gmail.com
on 28 Mar 2008 at 7:33
If you could get the whole thing in one patch, that would be welcome. If you
want
added files to show up in 'svn diff', you'll have to 'svn add' them first.
Original comment by b.lindeijer
on 1 Apr 2008 at 9:29
Please look at Glib::ustring for an implementation of a utf-8
std::string-compatible
class. In essence, from a user's point of view the only major change is that
the
indices become per-character instead of per-byte.
Original comment by douso...@gmail.com
on 15 Jul 2008 at 11:10
Disregarding any further unicode support, the keycodes starting with 1000
inhibit any custom implementation of unicode support.
Please change that to: "LEFT_ALT = -1000" - that's good enough for now.
Original comment by klaus.bl...@web.de
on 21 Mar 2009 at 3:38
Hello. I have never used Guichan but want to add a comment:
Your library looks good, but the fact that it does not support unicode made me
not
test it or use it for my game.
I am just "one" person, but I want to say that it is important to me.
Good luck
Original comment by christia...@gmail.com
on 5 Jun 2010 at 9:46
I feel it's my fault, 2 years ago I was supposed to send a patch...
Original comment by nexat...@gmail.com
on 7 Jun 2010 at 6:14
Hi!
Any chance of getting some kind of solution for this problem anytime soon?
Custom, incompatibly patched guichan forks are creeping up everywhere, which
isn't really a nice situation.
Original comment by siccegge...@gmail.com
on 14 Jun 2011 at 6:29
The current version of Guichan will always be ASCII only.
Original comment by olof.nae...@gmail.com
on 14 Jun 2011 at 7:50
For openSUSE I've decided to update to 0.8.2 and apply the unicode patch, which
I hope will be maintained in the future. Thanks for enabling this feature;
even if we have to diverge from upstream, it's certainly a most welcome one.
Original comment by nmo.marques
on 7 Oct 2011 at 5:41
Original issue reported on code.google.com by
mateusz....@gmail.com
on 15 Sep 2007 at 9:06