jrincayc / ucblogo-code

Berkeley Logo interpreter
https://people.eecs.berkeley.edu/~bh/logo.html
GNU General Public License v3.0

Unicode support #5

Open Alexey-Slyusar opened 4 years ago

Alexey-Slyusar commented 4 years ago

I learned from Dr. Harvey's personal page that UCBLogo is again under active development. Great news, I must say!

Are there plans to add Unicode support?

Anyway, in my experience the UCBLogo interpreter, together with the three excellent volumes of CSLS, is the best, richest, and most comprehensive environment available, both for computer hobbyists interested in the Logo philosophy of education and for independent learners. Many thanks to @brianharvey for the great effort made to provide everyone with such an excellent environment, and to @jrincayc for taking over further development!

Alexey-Slyusar commented 4 years ago

BTW, I've built the wxWidgets version of UCBLogo from this repo on Ubuntu 18.04 without any problems. But with my Unicode locale I can interact with the interpreter only in English: when I switch the keyboard layout to my native language, the interpreter doesn't show the text I type.

That is why I usually install UCBLogo from the Ubuntu 5.5-3 deb package. I run UCBLogo inside Emacs with a properly tuned MULE so that I can use the interpreter in my native language. It works fine for me, but I think it is too cumbersome a setup for complete beginners.

jrincayc commented 4 years ago

I think Unicode support would be useful. I would recommend doing it as UTF-8 rather than wide chars or similar. Patches/pull requests to do this would be welcome. (It will probably be a while before I get to it.)
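
To make the UTF-8 suggestion concrete, here is a minimal C++ sketch of how a single Unicode code point becomes one to four bytes. `encode_utf8` is a hypothetical helper for illustration, not part of UCBLogo. ASCII characters keep their one-byte form, which is the main appeal for old code, while a Cyrillic letter such as н (U+043D) takes two bytes (D0 BD).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one Unicode code point.
// ASCII (< 0x80) stays a single byte; other code points take 2 to 4 bytes.
void encode_utf8(char32_t cp, std::string &out) {
    // Only valid Unicode scalar values (no surrogates, nothing above U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this encoding the word "нояб" occupies 8 bytes for 4 characters, so any code that equates byte offsets with character positions has to change.
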

Alexey-Slyusar commented 4 years ago

I am not a professional programmer; I am an ERP accounting functional consultant of sorts. I use Logo and a little bit of Scheme (many thanks to Dr. Harvey for that too; the Simply Scheme book is great!) for math-oriented activities with my kids and for my own intellectual development. So, unfortunately, I cannot contribute production C/C++ code, but I can help with testing and documentation.

On Tue, 5 Nov 2019 at 06:24, Joshua Cogliati notifications@github.com wrote: [quoted text of the comment above]

jrincayc commented 4 years ago

Hm, so you can't type anything when your native keyboard layout is active?

Alexey-Slyusar commented 4 years ago

Yes, that is exactly what I see. I get the same behavior when I install the regular Ubuntu 18.04 deb package (ucblogo 6.0+dfsg-2) instead of building the program from source, so I don't think this is a compilation-related problem.

Also, when I copy non-English text from, say, Emacs and paste it into the interpreter window, I get an error dialog ([screenshot]) with the following message and complete backtrace:

```
ASSERT INFO:
../src/common/unichar.cpp(65): assert "Assert failure" failed in ToHi8bit(): character cannot be converted to single byte

BACKTRACE:
[1] wxUniChar::ToHi8bit(unsigned int)
[2] wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&)
[3] wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*)
[4] wxEvtHandler::TryHereOnly(wxEvent&)
[5] wxEvtHandler::ProcessEventLocally(wxEvent&)
[6] wxEvtHandler::ProcessEvent(wxEvent&)
[7] wxWindowBase::TryAfter(wxEvent&)
[8] wxEvtHandler::SafelyProcessEvent(wxEvent&)
[9] wxMenuBase::SendEvent(int, int)
[10] g_closure_invoke
[11] g_signal_emit_valist
[12] g_signal_emit
[13] gtk_widget_activate
[14] gtk_menu_shell_activate_item
[15] g_signal_emit_valist
[16] g_signal_emit
[17] gtk_main_do_event
[18] g_main_context_dispatch
[19] g_main_context_iteration
[20] gtk_main_iteration
[21] wxGUIEventLoop::Dispatch()
[22] wxEntry(int&, wchar_t**)
[23] __libc_start_main
```

Alexey-Slyusar commented 4 years ago

It may also be worth noting that I don't have this problem with the UCBLogo .exe package from Dr. Harvey's web page on Windows 10. The Windows version lets me type in my native keyboard layout, except for several lowercase characters; all uppercase text I enter is handled correctly.

brianharvey commented 4 years ago

I'm tempted to recommend 32-bit characters. This is very old code; it assumes all over the place that characters are fixed-width (e.g., doing pointer arithmetic). One particularly problematic thing is that when you take the BUTFIRST of a word, you get a pointer into the same block of memory, so BF runs in constant time instead of having to copy the string. The node that represents a word includes a pointer to the beginning of the malloc'ed block, a pointer to the start of this word within the block, and a character count (so BUTLAST also works in constant time).
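
A rough C++ model of the word node described above, for illustration only: the field names and the use of `std::shared_ptr` in place of the malloc'ed block are assumptions, not UCBLogo's actual structure. The `start + 1` in `butfirst` and the `count - 1` in both helpers are exactly where one-byte characters are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical model of the word node (not UCBLogo's real struct):
// `block` stands in for the malloc'ed block, `start` points at this word's
// first character inside it, and `count` is the character count.
struct Word {
    std::shared_ptr<const std::string> block;
    const char *start;
    std::size_t count;
};

Word make_word(const std::string &text) {
    std::shared_ptr<const std::string> block = std::make_shared<std::string>(text);
    return Word{block, block->data(), block->size()};
}

// BUTFIRST: advance the start pointer into the same block; no copy, O(1).
Word butfirst(const Word &w) {
    assert(w.count > 0);
    return Word{w.block, w.start + 1, w.count - 1};
}

// BUTLAST: keep the start, shrink the count; also O(1).
Word butlast(const Word &w) {
    assert(w.count > 0);
    return Word{w.block, w.start, w.count - 1};
}

// Copy the word's characters out, e.g. for printing.
std::string word_text(const Word &w) { return std::string(w.start, w.count); }
```

If the block held 32-bit characters (`char32_t` and `std::u32string` instead of `char` and `std::string`), both operations would stay constant-time unchanged. With UTF-8, `start + 1` can land in the middle of a multi-byte character, so BUTFIRST would have to decode the lead byte's length, and primitives that index by character position, such as ITEM, could no longer jump straight to a byte offset.
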

jrincayc commented 4 years ago

Yes, I can replicate that error. I will try to solve it, though it may be some months before I have time to look into it.

Alexey-Slyusar commented 4 years ago

Thank you!

Beginners start learning Logo through the so-called "linguistic model": one usually gets acquainted with Logo through metaphors like "talk with the Turtle" or "talk to Logo". At this stage it is crucial to be able to practice this linguistic model in one's native language.

But sooner or later there comes a point when the linguistic model stops working and has to be replaced by the evaluation model described in CSLS vol. 1 (the little elves/people metaphor is a very powerful one). At this stage, I think, using one's native language becomes less important. A non-English speaker can imagine having to learn a non-native "English-Elvish" alphabet and language in order to acquire the more powerful evaluation model and discover all the secrets of Logo, and later of Scheme/Lisp, Python, or any other programming language if necessary. But without Unicode support there is still a problem with handling data written in languages other than English.

All of the above comes from my kids' and my own experience learning Logo.

It is also true that one can use the Racket environment together with another of Dr. Harvey's excellent books, Simply Scheme, for a similar purpose. Racket and its Simply Scheme package support Unicode out of the box, and only a trivial adjustment to the Simply Scheme package is needed to make its procedures work with non-English characters. That is because the Simply Scheme package is implemented in Scheme rather than C/C++, so its code is easier to understand and modify.

Still, I know from my own experience that UCBLogo together with the CSLS trilogy is the better choice for beginners and independent learners.

There are also beautiful math-oriented books that use Logo as a medium for mathematical exploration, such as Turtle Geometry by Abelson and diSessa, Investigations in Algebra by Al Cuoco, and Approaching Precalculus Mathematics Discretely by Philip G. Lewis. For those, though, using Unicode to support the linguistic model is not an issue.

It is also clear that this is a non-commercial project, so we do not expect Unicode support to be added instantly. We really appreciate your time and effort and are grateful for your and Dr. Harvey's attention to this issue.

I'm not sure I made myself perfectly clear, so I apologize for this long and loosely worded post. I just tried to explain why Unicode support matters from the point of view of the Logo philosophy. But as I said earlier, it is just my own humble opinion.

And the much more important thing here is that we LOVE UCBLogo!

jrincayc commented 4 years ago

FYI, how to replicate: paste нояб (Russian abbreviation for "November") into the terminal using the Paste menu item. This can be trapped in the debugger by setting a breakpoint on wxTerminal::DoPaste().

jrincayc commented 4 years ago

Code to convert the pasted text to UTF-8 (it doesn't display the data yet, but ASCII FIRST can be used to check that the data is there):

```diff
diff --git a/wxTerminal.cpp b/wxTerminal.cpp
index c395cf0..fe7a2e0 100644
--- a/wxTerminal.cpp
+++ b/wxTerminal.cpp
@@ -1092,6 +1092,8 @@ void wxTerminal::DoPaste(){
                  wxTextDataObject data;
                  wxTheClipboard->GetData( data );
                  wxString s = data.GetText();
+                 wxCharBuffer cbuff = s.utf8_str();
+                 const char *buff = cbuff.data();
 
                  int i; 
                  //char chars[2];
@@ -1099,9 +1101,9 @@ void wxTerminal::DoPaste(){
                  int num_newlines = 0;
                  int len;
                  char prev = ' ';
-                 for (i = 0; i < s.Length() && input_index < MAXINBUFF; i++){
+                 for (i = 0; i < strlen(cbuff)+1 && input_index < MAXINBUFF; i++){
                    len = 1;
-                   c = s.GetChar(i);
+                   c = cbuff[i];
                    if (c == '\n') {
                      num_newlines++;
                    }
```

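
After this change the loop walks the UTF-8 bytes, so for non-ASCII input the index `i` counts bytes, not characters: each Cyrillic letter contributes two. A minimal C++ sketch of counting characters instead, by skipping continuation bytes (which always have the bit pattern 10xxxxxx); `utf8_length` is a hypothetical helper and assumes valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a valid UTF-8 string.
// Continuation bytes look like 10xxxxxx; every other byte (ASCII or a
// multi-byte lead byte) starts a new character.
std::size_t utf8_length(const std::string &s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the pasted test string нояб, the UTF-8 buffer is 8 bytes long but holds 4 characters.
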
jrincayc commented 4 years ago

Some experimenting with UTF-8: https://github.com/jrincayc/ucblogo-code/compare/utf8_play

[screenshot]

Alexey-Slyusar commented 4 years ago

Hi Joshua, I've built Logo from the utf8_play branch, but I cannot reproduce the display of Cyrillic text. Copying and pasting Cyrillic text from elsewhere does not work either.

[screenshot]

I did:

```shell
git clone https://github.com/jrincayc/ucblogo-code.git -b utf8_play
cd ucblogo-code/
./configure
make
```

jrincayc commented 4 years ago

Yes, utf8_play needs a lot of work. Pasting only works from the menu, not from Ctrl+V. Cursor and backspace movement work very poorly. I still haven't decided whether changing to wchar_t or UTF-8 would be simpler. Both are going to be a fair bit of work.
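
The backspace problem follows from the same byte/character mismatch. A minimal C++ sketch, assuming the terminal keeps its input line as a UTF-8 `std::string` (the real wxTerminal buffer may be organized differently, and `utf8_backspace` is a hypothetical name): erasing one character means dropping any trailing continuation bytes plus the lead byte before them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical Backspace handler for a UTF-8 input buffer.
// Assumes `buf` holds valid UTF-8.
void utf8_backspace(std::string &buf) {
    if (buf.empty()) return;
    std::size_t i = buf.size() - 1;
    // Step back over continuation bytes (10xxxxxx) to the character's lead byte.
    while (i > 0 && (static_cast<unsigned char>(buf[i]) & 0xC0) == 0x80) --i;
    buf.erase(i);  // remove the lead byte and all bytes after it
}
```

Moving the cursor one character to the left needs the same backward scan. A buffer of 32-bit characters (such as `char32_t`) avoids it, which is part of the trade-off being weighed here.
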

Alexey-Slyusar commented 4 years ago

I see. Thank you.

On Mon, 27 Apr 2020 at 16:03, Joshua Cogliati notifications@github.com wrote: [quoted text of the comment above]

davidcostanzo commented 3 years ago

> I still haven't decided if changing to wchar_t or utf8 would be simpler. Both are going to be a fair bit of work.

I can share my experience if it helps you decide. I just finished updating FMSLogo (an old fork of UCBLogo) to support Unicode, and I ended up using wchar_t. On Windows, sizeof(wchar_t) == 2, which means there are still variable-length characters: anything outside the Basic Multilingual Plane (BMP) takes two wchar_t units (a UTF-16 surrogate pair). As a result, FMSLogo really only supports the BMP.
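
To illustrate why a 2-byte `wchar_t` still leaves variable-length characters: in UTF-16, code points above U+FFFF are split into a surrogate pair of two 16-bit units. This C++ sketch (`encode_utf16` is a hypothetical helper, not FMSLogo code) shows the split; code that treats each `wchar_t` as one character would see such a character as two.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: encode one code point as UTF-16 code units.
// BMP characters (up to U+FFFF) need one unit; the rest need a surrogate pair.
std::vector<char16_t> encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) return {static_cast<char16_t>(cp)};
    cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
    return {static_cast<char16_t>(0xD800 + (cp >> 10)),    // high surrogate
            static_cast<char16_t>(0xDC00 + (cp & 0x3FF))}; // low surrogate
}
```

For example, U+1F600 encodes as the pair D83D DE00, while a Cyrillic letter such as U+043D stays a single unit, so limiting support to the BMP keeps one unit per character.
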

From my experience, the hard part about supporting Unicode was not the internal character representation but backward compatibility. In FMSLogo, a UTF-8 file must start with the Unicode signature (a byte order mark, or BOM). This enables any program written in an ANSI code page to continue to run. FMSLogo had many other backward-compatibility risks that I don't think UCBLogo has (networking, calling into DLLs, direct access to keyboard events, etc.), so this might not be as difficult for you. If you're willing to make a clean break with all old non-ASCII programs, your life will be much easier.
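
A minimal C++ sketch of the file-loading policy described above, assuming the file has already been read into a byte string. `detect_encoding` and `Encoding` are hypothetical names, and the thread does not show how FMSLogo actually implements this. The UTF-8 signature is the three bytes EF BB BF at the very start of the file.

```cpp
#include <cassert>
#include <string>

enum class Encoding { Utf8, LegacyCodePage };

// True if the bytes begin with the UTF-8 signature EF BB BF.
bool has_utf8_bom(const std::string &bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical loader step: with a BOM, strip it and decode as UTF-8;
// without one, keep treating the file as the legacy (ANSI) code page.
Encoding detect_encoding(std::string &bytes) {
    if (has_utf8_bom(bytes)) {
        bytes.erase(0, 3);
        return Encoding::Utf8;
    }
    return Encoding::LegacyCodePage;
}
```

A file without the signature falls through to the legacy code-page path, which is what keeps old non-ASCII programs working.
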