Closed by irssibot 9 years ago
diff-irssi-utf8.txt
Index: fe-common/core/utf8.h
===================================================================
--- fe-common/core/utf8.h (revision 5189)
+++ fe-common/core/utf8.h (working copy)
@@ -12,5 +12,6 @@
int mk_wcwidth(unichar c);
#define unichar_isprint(c) (((c) & ~0x80) >= 32)
+#define is_utf8_leading(c) (((c) & 0xc0) != 0x80)
#endif
Index: fe-text/textbuffer.c
===================================================================
--- fe-text/textbuffer.c (revision 5189)
+++ fe-text/textbuffer.c (working copy)
@@ -23,6 +23,7 @@
#include "module.h"
#include "misc.h"
#include "formats.h"
+#include "utf8.h"
#include "textbuffer.h"
@@ -157,6 +158,16 @@
if (left > 0 && data[left-1] == 0)
left--; /* don't split the commands */
+ /* don't split utf-8 character. (assume we can split non-utf8 anywhere. */
+ if (left < TEXT_CHUNK_USABLE_SIZE && !is_utf8_leading(data[left])) {
+ int i;
+ for (i = 1; i < 4 && left >= i; i++)
+ if (is_utf8_leading(data[left - i])) {
+ left -= i;
+ break;
+ }
+ }
+
memcpy(chunk->buffer + chunk->pos, data, left);
chunk->pos += left;
This patch is broken and results in sporadic segfaults. See #875, #877.
Interesting. My friend and I have used this patch for years without any crashes. Sorry to cause trouble for others. I will follow up on this issue.
This is the revised patch.
diff-irssi-utf8-2.txt
Index: src/fe-common/core/utf8.h
===================================================================
--- src/fe-common/core/utf8.h (revision 5189)
+++ src/fe-common/core/utf8.h (working copy)
@@ -12,5 +12,6 @@
int mk_wcwidth(unichar c);
#define unichar_isprint(c) (((c) & ~0x80) >= 32)
+#define is_utf8_leading(c) (((c) & 0xc0) != 0x80)
#endif
Index: src/fe-text/textbuffer.c
===================================================================
--- src/fe-text/textbuffer.c (revision 5189)
+++ src/fe-text/textbuffer.c (working copy)
@@ -23,6 +23,7 @@
#include "module.h"
#include "misc.h"
#include "formats.h"
+#include "utf8.h"
#include "textbuffer.h"
@@ -154,6 +155,17 @@
chunk = buffer->cur_text;
while (chunk->pos + len >= TEXT_CHUNK_USABLE_SIZE) {
left = TEXT_CHUNK_USABLE_SIZE - chunk->pos;
+
+ /* don't split utf-8 character. (assume we can split non-utf8 anywhere. */
+ if (left < len && !is_utf8_leading(data[left])) {
+ int i;
+ for (i = 1; i < 4 && left >= i; i++)
+ if (is_utf8_leading(data[left - i])) {
+ left -= i;
+ break;
+ }
+ }
+
if (left > 0 && data[left-1] == 0)
left--; /* don't split the commands */
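Read in isolation, the boundary check in the revised hunk amounts to roughly the following sketch. The function name `utf8_safe_split` and its standalone signature are assumptions made for illustration; in the patch the same logic sits inline in textbuffer.c and works on `left`, `len`, and `data` directly.

```c
#include <stddef.h>

/* Same test as in the patch: a byte that is not 10xxxxxx starts a character. */
#define is_utf8_leading(c) (((c) & 0xc0) != 0x80)

/* Hypothetical standalone version of the loop in the revised hunk.
 * Given a proposed split offset `left` into `data` (of `len` bytes),
 * return an offset that does not fall inside a UTF-8 character. */
static size_t utf8_safe_split(const unsigned char *data, size_t len, size_t left)
{
    size_t i;

    /* Only inspect data[left] when it lies inside the buffer. */
    if (left < len && !is_utf8_leading(data[left])) {
        /* Look back at most 3 bytes for the lead byte. */
        for (i = 1; i < 4 && left >= i; i++) {
            if (is_utf8_leading(data[left - i]))
                return left - i;
        }
    }
    return left; /* already on a boundary, or no lead byte found nearby */
}
```

As the two diffs show, the revised guard compares `left` against `len`, where the first patch compared it against `TEXT_CHUNK_USABLE_SIZE`, and the check now runs before the "don't split the commands" adjustment. The thread does not say whether either difference explains the segfaults reported for the first patch.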
Irssi: Client: irssi 0.8.15 (20100403 1617)
Based on the second-to-last book, I assume the guy might know what he's talking about in the last one.
skandit_sarki.png
This issue should be closed. Handled in https://github.com/irssi/irssi/pull/12
This task has been relocated to GitHub @ https://github.com/irssi/irssi/pull/12
irssi uses TEXT_BUFFER to store all text. Internally, TEXT_BUFFER keeps the text in smaller "chunks".
However, if a UTF-8 character is split across chunks, with its leading byte in one chunk and the remaining bytes in the next, the character is treated as corrupted and displayed incorrectly.
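To make the failure concrete, here is a minimal sketch in C that relies only on the `is_utf8_leading` test from the patch. The helper name `split_lands_mid_char` is hypothetical and not part of irssi. It reports whether cutting a byte string at a given offset would separate a lead byte from its continuation bytes.

```c
#include <stddef.h>

/* Same test as in the patch: a byte that is not 10xxxxxx starts a character. */
#define is_utf8_leading(c) (((c) & 0xc0) != 0x80)

/* Hypothetical helper, not an irssi function.
 * Returns 1 if cutting `data` (of `len` bytes) at byte offset `pos`
 * would fall inside a multi-byte UTF-8 character, 0 otherwise. */
static int split_lands_mid_char(const unsigned char *data, size_t len, size_t pos)
{
    if (pos >= len)
        return 0; /* nothing after the cut, so nothing is split */
    return !is_utf8_leading(data[pos]);
}
```

For example, "ä" is encoded as the two bytes 0xC3 0xA4. In the three-byte string "tä", a cut at offset 1 is clean, while a cut at offset 2 leaves 0xC3 at the end of one chunk and 0xA4 at the start of the next.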
How to reproduce: