gtaubman / doneyet

A NCurses based To-Do list manager!
MIT License
31 stars 6 forks source link

Unable add accented words #46

Open Chavao opened 1 year ago

Chavao commented 1 year ago

Describe the bug I tried to add words with accents, I'm Brazilian so we have a lot of words with accents. When I press the acute accent and the following letter, the letter disappears.

E.g.

I wrote fiancé

image

image

To Reproduce Steps to reproduce the behavior:

  1. Press a to add a new Task
  2. Write an accented word

Expected behavior Appear the accented word, like fiancé

Software configuration (please complete the following information):

Chavao commented 1 year ago

If you have any tips on how to solve it, I could open a PR to fix it. I didn't do it yet, because I'm not a C++ programmer, so it's hard to figure out the path to solve it.

stweise commented 1 year ago

hi @Chavao thank you so much for the input. I am looking into options right away.

stweise commented 1 year ago

Support for wide characters if full of pitfalls it turns out. Finding these is hard and I have therefore create a minimal testing application to verify the basics before I go into much more detail in code.

I also simplified the build process and would like to verify that on different platforms.

please check out https://github.com/gtaubman/doneyet/tree/wide_char_support build using cmake (mkdir build; cd build; cmake ..; make; ) and run wide_input_test (./wide_input_test) and let me know if you are able to add accented characters

this is what it looks like for me on some input:

Bildschirmfoto 2023-09-29 um 12 34 50

let me know if it works for you too and please give me some details about your platform (OS, Version)

This test: Macbook Air M2, macOS 13.5.2 (22G91)

stweise commented 1 year ago

Also works on Xubuntu (Ubuntu with XFCE) 22.04.02 LTS

stweise commented 1 year ago

hmm seems I had ncursesw already installed on those machines, testing on a plain machine right now

Kubuntu (Ubuntu with KDE) 22.04.03 LTS

needed to install these apt install cmake pkg-config libncursesw5-dev libncurses-dev

works as well

Chavao commented 1 year ago

I tried on Lubuntu 22.04.3 LTS using zsh on Tilix and it worked fine

image

stweise commented 1 year ago

awesome, then I know it works for you and me :-)

I am working on cleaning the code right now, something I have been pushing off for years. This should simplify the task of adding, tracking and debugging the changes.

I will focus in the serialization first, as I fear that this is where all of the extension will break. for reference see

void Task::Serialize(Serializer* s) and
void Serializer::WriteString(string str)

currently tasks are stored by writing chars which are 8 bit -> 1 byte but our new characters are wchar_t, which is 16 bit or 64 bit > 2 or 4 byte depending on the compiler used

gtaubman commented 1 year ago

Wow, great start Steffen!!

On Fri, Sep 29, 2023 at 9:23 AM Steffen Weise @.***> wrote:

awesome, then I know it works for you and me :-)

I am working on cleaning the code right now, something I have been pushing off for years. This should simplify the task of adding, tracking and debugging the changes.

I will focus in the serialization first, as I fear that this is where all of the extension will break. for reference see

void Task::Serialize(Serializer* s) and void Serializer::WriteString(string str)

currently tasks are stored by writing chars which are 8 bit -> 1 byte but our new characters are wchar_t, which is 16 bit or 64 bit > 2 or 4 byte depending on the compiler used

— Reply to this email directly, view it on GitHub https://github.com/gtaubman/doneyet/issues/46#issuecomment-1740891306, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABN4QMOGRKKWYCIEHBEA5WLX43DTHANCNFSM6AAAAAA472O2FQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

stweise commented 1 year ago

I reworked the serializer to support the wide characters. Created a little test application if you like to test too

./wide_serializer_test takes wide input and writes it together with a lengthy test string into a file called bla

Bildschirmfoto 2023-10-05 um 08 38 49

hex dump shows correct width information & printable ascii chars.

Bildschirmfoto 2023-10-05 um 08 39 28

Now I will adept most of the other classes handle any kind of string.

stweise commented 1 year ago

Support for wide characters is well underway. I am ready to set up a basic new project now, even using emoji

starting ./wide_serializer_test given this input

Bildschirmfoto 2023-10-11 um 22 03 15

and this rough code structure

 Project *p =  new Project(input);
 Serializer* s = new Serializer("", "./bla");
 s->SetVersion(NOTES_VERSION);
 p->AddTaskNamed(L"new task meet with fiancé");
 p->AddTaskNamed(L"cool new öäü");

we already have this binary representation now

Bildschirmfoto 2023-10-11 um 22 03 40

i also already read this back and the wide strings match perfectly.

Next up: make sure all the ncurses drawing still works.

stweise commented 12 months ago

Wanna give a little heads-up.

I am still working on this. The way that wide char support is handled in ncurses forms (form.h) is really unclear to me. There are no additional functions offered and the underlying implementation seems to just accept rendered UTF-8 in char *. I am going to create a simple forms application and verify this behavior. Doing so in the main application is way too tedious.

stweise commented 8 months ago

I am happy to report that I was able to build a version that works with UTF-8. Please see attached screenshot.

The development branch is located at https://github.com/gtaubman/doneyet/tree/without_wide_char

⚠️ Warning: I have cleaned up the file format for doneyet and made several cleanup changes, don't be overwhelmed by the amount of change you see there.

Still todo:

Bildschirmfoto 2024-02-07 um 20 17 50
gtaubman commented 8 months ago

Wow fantastic work!! No, I never tested or even attempted windows support. Mac and linux were the only OSes I tried.

On Wed, Feb 7, 2024 at 2:39 PM Steffen Weise @.***> wrote:

I am happy to report that I was able to build a version that works with UTF-8. Please see attached screenshot.

The development branch is located at https://github.com/gtaubman/doneyet/tree/without_wide_char

⚠️ Warning: I have cleaned up the file format for doneyet and made several cleanup changes, don't be overwhelmed by the amount of change you see there.

Still todo:

  • provide a safe way for contributors to test (file format breaking change)
  • provide a safe transition for the file format change
  • test extensively on Linux & Mac @.*** https://github.com/gtaubman did you ever really support this for Windows?)
  • document the input methods (utf-8 on Linux \u+008 - Ctrl + Shift + U)
  • refactor change into packets someone can review/understand if needed

Bildschirmfoto.2024-02-07.um.20.17.50.png (view on web) https://github.com/gtaubman/doneyet/assets/16466973/e432357e-014b-4170-89b5-8169441554fa

— Reply to this email directly, view it on GitHub https://github.com/gtaubman/doneyet/issues/46#issuecomment-1932742442, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABN4QMP3VV4NINANWQLPPQTYSPJ63AVCNFSM6AAAAAA472O2FSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZSG42DENBUGI . You are receiving this because you were mentioned.Message ID: @.***>