i3 / i3status

Generates status bar to use with i3bar, dzen2 or xmobar
BSD 3-Clause "New" or "Revised" License

Feature request: handle clipped UTF-8 in print_file_contents() #410

Open j39m opened 4 years ago

j39m commented 4 years ago

My naive reading of RFC 3629 suggests that it's easy to detect UTF-8 octet sequences. May I submit a pull request to further truncate file contents upon encountering clipped UTF-8?

orestisfl commented 4 years ago

I feel that the solution here should be to read max_chars UTF-8 characters instead of max_chars bytes.
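To illustrate the difference between counting bytes and counting characters, here is a minimal sketch in C. The helper name `utf8_char_count` is hypothetical and not part of i3status; it assumes the input is valid, NUL-terminated UTF-8 and simply counts every byte that is not a continuation byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an i3status function): count the UTF-8
 * characters in a NUL-terminated string. Continuation bytes have the bit
 * pattern 10xxxxxx, so every other byte starts a new character.
 * Assumes the input is valid UTF-8. */
static size_t utf8_char_count(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For example, a string holding "h", a two-byte "é", and "llo" occupies 6 bytes but counts as 5 characters, so a limit expressed in bytes can cut the text short or split a character.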

j39m commented 4 years ago

Agreed. I'm thinking

Does this seem like a reasonable approach?

orestisfl commented 4 years ago

However, all of this sounds like too much to implement in i3status. If you want to see C code for this, there is glib: https://gitlab.gnome.org/GNOME/glib/-/blob/master/glib/gutf8.c. Specifically, we'll need g_utf8_validate, g_utf8_strlen and their dependencies.

AFAIK, glib is not a dependency of i3status the way it is of i3.

j39m commented 4 years ago

When I originally envisioned this, I was only thinking of peeking at the leading 4-5 bits of any byte (marginally improving on the status quo but not implementing proper UTF-8 support). Would you prefer a PR that adds a glib dependency, or are you saying this is infeasible for now?
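The "peek at the leading bits" idea could look roughly like the sketch below. Both function names are hypothetical and this is not code from i3status or from any PR in this thread; it only drops a trailing sequence that was cut off by a byte limit and does not validate anything else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expected length of a UTF-8 sequence, judged only
 * from the leading bits of its first byte. Returns 0 for a continuation
 * byte (10xxxxxx) or an invalid lead byte. */
static size_t utf8_seq_len(unsigned char b) {
    if ((b & 0x80) == 0x00) return 1; /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 0;
}

/* Hypothetical helper: return the length of buf[0..len) after dropping a
 * trailing sequence that was clipped mid-character. Anything else,
 * including invalid bytes, is left untouched. */
static size_t utf8_trim_clipped(const char *buf, size_t len) {
    assert(buf != NULL);
    size_t i = len;
    size_t back = 0;
    /* Step back over at most 3 trailing continuation bytes. */
    while (i > 0 && back < 3 && ((unsigned char)buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return len; /* no lead byte found; leave the data as it is */
    }
    size_t need = utf8_seq_len((unsigned char)buf[i - 1]);
    size_t have = len - (i - 1);
    if (need > have) {
        return i - 1; /* clipped: cut just before the lead byte */
    }
    return len;
}
```

With this approach the byte limit stays as it is today; the output just loses up to three more bytes when the limit happens to fall inside a multi-byte character.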

orestisfl commented 4 years ago

Now that I think of it, the following is not that hard:

  1. Allocate 4 * max_chars + 1 bytes (the extra byte leaves room for the terminating '\0')
  2. Read up to 4 * max_chars bytes
  3. while i < 4 * max_chars
    1. if buf[i] == 0: break
    2. cnt++
    3. if buf[i] starts with 0: i++
    4. else if buf[i] starts with 110: i+=2
    5. else if buf[i] starts with 1110: i+=3
    6. else if buf[i] starts with 11110: i+=4
    7. else: i++ (stray continuation or invalid byte; always advance so the loop terminates)
    8. if cnt == max_chars: break
  4. buf[i] = '\0';

Notes:

  1. We don't fully check for bad characters; let pango handle them
  2. Each UTF-8 character is at most 4 bytes, so we can allocate all the memory we need up front and read the maximum possible number of bytes
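The steps above can be sketched in C as follows. This is an illustration under stated assumptions, not the code that was submitted: the function name `utf8_truncate_chars` is hypothetical, the caller is assumed to have already read `len` bytes into a buffer with room for at least `len + 1` bytes, and two details go beyond the outline (an invalid byte is counted as one character, and a sequence clipped by the end of the data is dropped).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: keep at most max_chars UTF-8 characters of
 * buf[0..len) and NUL-terminate the result. buf must have room for at
 * least len + 1 bytes. Returns the number of bytes kept. */
static size_t utf8_truncate_chars(char *buf, size_t len, size_t max_chars) {
    assert(buf != NULL);
    size_t i = 0;   /* byte offset */
    size_t cnt = 0; /* characters kept so far */
    while (i < len && cnt < max_chars && buf[i] != '\0') {
        unsigned char b = (unsigned char)buf[i];
        size_t step;
        if ((b & 0x80) == 0x00) {
            step = 1; /* 0xxxxxxx */
        } else if ((b & 0xE0) == 0xC0) {
            step = 2; /* 110xxxxx */
        } else if ((b & 0xF0) == 0xE0) {
            step = 3; /* 1110xxxx */
        } else if ((b & 0xF8) == 0xF0) {
            step = 4; /* 11110xxx */
        } else {
            step = 1; /* assumption: count a stray or invalid byte as one character */
        }
        if (step > len - i) {
            break; /* sequence clipped by the end of the data: drop it */
        }
        i += step;
        cnt++;
    }
    buf[i] = '\0';
    return i;
}
```

A caller following the outline would allocate `4 * max_chars + 1` bytes, read up to `4 * max_chars` bytes from the file, and pass the number of bytes actually read as `len`.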

j39m commented 4 years ago

Looks good to me, thank you!

Would you be willing to add this test case to your CL?

orestisfl commented 4 years ago

Well, our implementations look very similar; I didn't know you had already started writing this. Anyway, let's see what the other members say about this. I'll add you as a co-author in the final commit if approved.