Closed correabuscar closed 5 months ago
I can reproduce it without foo_qterminal
and simply by entering less -R --shift 5 -- ./less.txt
. It doesn't happen in Konsole.
this didn't happen in qterminal 1.4.0
If so, since apparently nothing was changed in this regard when QTerminal was ported to Qt6 (but that should be checked again), it may be related to Qt6. Although Konsole is also based on Qt6, it uses qt6-5compat
, while QTerminal doesn't.
I don't know yet, didn't look in the code.
If you want to check the code (which would be very appreciated), this is what was done for porting to Qt6: https://github.com/lxqt/qtermwidget/pull/532/files (unfortunately, its commits weren't squashed before merging; hence mentioning the PR and not a commit). Small changes were also made after it, but I don't think they're relevant. It consists of using Qt6 methods and dropping of legacy encodings (QTextCodec
).
I superficially checked out the PR and whatever was after it as well, but, to by novice eye, nothing popped out as the culprit.
Further, I switched qtermwidget
and qterminal
repos to git tag 1.4.0
and downgraded dev-util/lxqt-build-tools-2.0.0-r1::gentoo
to dev-util/lxqt-build-tools-0.13.0::gentoo
to be able to build, and can confirm the issue doesn't exist in 1.4.0
. EDIT: actually I'm not sure I'm testing it properly, because it doesn't have a colorscheme(this is due to me specifying ok it works with LD_LIBRARY_PATH=
to point to qtermwidget's build dir, ok i'll use make install DESTDIR=
somewhere) so it has white background, this might affect the issue. And left/right arrows don't work at all!cmake -DCMAKE_INSTALL_PREFIX=/tmp/qtw1 ..
for both repos and make install
then run them from there with LD_LIBRARY_PATH="/tmp/qtw1/lib64"
So for sure it works on 1.4.0 now that I properly checked!
I'm unclear whether this means that 1.4.0 is using some kind of qt5 even though I've still only dev-qt/qtbase 6.7.2 (maybe it does, it it's backward compatible?), but if not, then maybe something between that tag and the current git master on one or both of the repos could be the culprit?
Looking a bit harder, I do seem to still have something about qt5, so I guess maybe it's using that?
dev-qt/qtcore Installed versions: 5.15.14
dev-qt/qtconcurrent Installed versions: 5.15.14
dev-qt/qtbase Installed versions: 6.7.2
I'll check the diff since 1.4.0 tag
What I know for sure is that there were characters which Qt5 showed correctly (in any Qt5 app) but Qt6 didn't. That regression is fixed in qt6-base 6.7.1-4 from Arch. Qt 6.7.2 will come soon to my distro (Manjaro), but it's very unlikely that the regression has come back with it (I'll see soon).
... but if not, then maybe something between that tag and the current git master on one or both of the repos could be the culprit?
Your tests suggest it. I have 1.4.0 on my old laptop and will check it later. So, if it works fine there, we only need to find the needle in the haystack ;)
I think you were right though, 1.4.0 working and 2.0.0 not, given the difference in changes, it doesn't seem to be either qterminal or qtermwidget.
I had a superficial look and found, possibly not relevant: https://bugreports.qt.io/browse/QTBUG-117051
but there's something that ah it's removed 2 �
which strengthens my suspicion that it might be in qt itself. but still, something in there can do this.
That being said, i still haven't checked qterminal's 1.4.0 diff, only qtermwidget's (and superficially) EDIT: done, nothing jumps out to me, but I should note I've never programmed anything with qt before, so I've no idea what's what :D
I'm switching my focus on qt itself
That being said, i still haven't checked qterminal's 1.4.0 diff, only qtermwidget's (and superficially)
I'll join you soon. If the problem is in our commits, two pairs of eyes couldn't miss it.
Progress: It does look related to the issue I mentioned above.
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index c88ac12..a8ed6ea 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -224,9 +224,13 @@ void Emulation::receiveData(const char* text, int length)
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
+ qDebug() << "!!!!!!!!!! text:" << text;
QString str = QString::fromUtf8(text, length);
+ qDebug() << "!!!!!!!!!! str:" << str;
auto encoded = _fromUtf8(str);
- std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! encoded:" << encoded;
+ std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
with this patch i can redirect output to file and have export QT_LOGGING_RULES="*=true"
(unsure if needed atm), but in that file I see that the very braille char itself which is made out of 3 bytes, ends before the 3rd byte is processed, in that chunk or what. Therefore qt6 thinks it's invalid utf8 and replaces those 2 chars that made it in with replacement char. Then the next chunk, has that 3rd char which is also invalid utf8 (i presume, didn't double check), and it's replaced with replacement char as well.
Well, maybe I jumped the gun a little, as the above shows that 'text' was already having those replacement chars, so the transform didn't happen in this very patch place, but somewhere else, but the idea remains that something split it into chunks that didn't end on utf8 boundary or something, and then something else tried to make the chunks utf8 and thought the last char was invalid utf8 because it was split due to chunking of sorts. You'd know exactly what/why, I'm guessing. Remember I've 0 knowledge of Qt stuff.
I think that it may be a Qt6 problem which Konsole avoids automatically by using the Qt5 methods provided by qt6-5compat
.
This is what I've done so far.
First I checked QTerminal 1.4.0 and found no issue in this regard. A screenshot:
Then I started to compare 1.4.0 and 2.0.0 with each other and focused on Emulation.cpp
. The only relevant change I found was in Emulation::receiveData
(as you did), and its Qt6 version was OK.
Later I'll compare other files that may be related to this, but I think the problem happens in Emulation::receiveData
. If that's the case, the cause should be in Qt6.
The only relevant change I found was in Emulation::receiveData (as you did), and its Qt6 version was OK.
the thing is that for some reason the text
parameter to receiveData
already seems to have the replacement chars set, so that means whatever calls receiveData, passes us the already-broken chunk...
the thing is that for some reason the
text
parameter toreceiveData
already seems to have the replacement chars set, so that means whatever calls receiveData, passes us the already-broken chunk...
ok I see the wrong assumption here on my part:
qDebug() << "!!!!!!!!!! text:" << str;
in fact converts the const char* text
arg of receiveData
into UTF8 before printing it, that's why I see it already as �
, so it's not that receiveData already receives it like that, but the subsequent UTF8 conversions inside receiveData transform it into �
(including my qDebug() attempt)
I realized that because I just couldn't get this patch to work(it just wouldn't trigger), heh:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9cf3516..3533c86 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -153,6 +153,7 @@ set_target_properties( ${QTERMWIDGET_LIBRARY_NAME} PROPERTIES
VERSION ${QTERMWIDGET_VERSION}
)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions")
if(APPLE)
target_compile_definitions(${QTERMWIDGET_LIBRARY_NAME}
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index c88ac12..afc2c1e 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -224,9 +224,31 @@ void Emulation::receiveData(const char* text, int length)
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
+ qDebug() << "!!!!!!!!!! length:" << length;
+ qDebug() << "!!!!!!!!!! text:" << text;
QString str = QString::fromUtf8(text, length);
+ qDebug() << "!!!!!!!!!! str:" << str;
auto encoded = _fromUtf8(str);
- std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! encoded:" << encoded;
+ std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());
+ if (length >= 3 &&
+ ((text[length - 3] == '\xEF' &&
+ text[length - 2] == '\xBF' &&
+ text[length - 1] == '\xBD')||(text[0]=='\xEF' && text[1]=='\xBF' && text[2]=='\xBD'))) {
+
+ qDebug() << "Replacement character detected at the end of the input.";
+
+ // Option 1: Print stack trace
+ //printStackTrace();
+
+ // Option 2: Crash the program
+ // This can be done using assertions or by throwing an exception
+ // Q_ASSERT(false); // This will crash the program with a debug assertion
+ throw std::runtime_error("Replacement character detected at the end of the input. Crashing as requested.");
+ // Option 2: Crash the program using assertion
+ Q_ASSERT(false); // This will crash the program with a debug assertion
+ }
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
therefore you may be right indeed the cause should be in Qt6.
However, I'm unclear what calls receiveData and why would it have worked before in 1.4.0 if the 'text' arg doesn't end in utf8 boundary. Will explore more. EDIT: Hmm, I'm thinking in qt5 there was no replacement? and that's why it worked with splitting on non-utf8 char boundary ? because the original bytes were left intact? unclear
I've a stacktrace with who's calling receiveData
:
(gdb) bt2
executing: 'frame apply all -q frame'
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
#1 0x00007ffff5ea9513 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
78 return __pthread_kill_implementation (threadid, signo, 0);
#2 0x00007ffff5e5164e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
26 int ret = __pthread_kill (__pthread_self (), sig);
#3 0x00007ffff5e399c5 in __GI_abort () at abort.c:79
79 raise (SIGABRT);
#4 0x00007ffff60a6198 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/sys-devel/gcc-14.1.1_p20240622/gcc-14-20240622/libstdc++-v3/libsupc++/vterminate.cc:95
95 abort();
#5 0x00007ffff60bfa3d in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/sys-devel/gcc-14.1.1_p20240622/gcc-14-20240622/libstdc++-v3/libsupc++/eh_terminate.cc:48
48 handler ();
#6 0x00007ffff60a5b0f in std::terminate () at /usr/src/debug/sys-devel/gcc-14.1.1_p20240622/gcc-14-20240622/libstdc++-v3/libsupc++/eh_terminate.cc:58
58 __cxxabiv1::__terminate (get_terminate ());
#7 0x00007ffff60bfe5c in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x7ffff629ddd8 <typeinfo for std::runtime_error>, dest=0x7ffff60d8b50 <std::runtime_error::~runtime_error()>) at /usr/src/debug/sys-devel/gcc-14.1.1_p20240622/gcc-14-20240622/libstdc++-v3/libsupc++/eh_throw.cc:98
98 std::terminate ();
#8 0x00007ffff7f34220 in Konsole::Emulation::receiveData (this=<optimized out>, text=text@entry=0x555555dba100 "\240\277\"\033[m\033[38;2;204;204;204m\033[48;2;0;0;0m, \033[m\033[38;2;179;246;192m\033[48;2;0;0;0m\"⢿\"\033[m\033[38;2;204;204;204m\033[48;2;0;0;0m,\033[m\r\n", ' ' <repeats 12 times>, "\033[m\033[38;2;179;246;192m\033[48;2;0;0;0m\"⡇\"\033[m\033[38;2;204;204;204m\033[48;"..., length=length@entry=1027) at /tmp/qtermwidget/lib/Emulation.cpp:247
247 throw std::runtime_error("Replacement character detected at the front of the input. Crashing as requested.");
#9 0x00007ffff7f7369b in Konsole::Session::onReceiveBlock (this=0x5555558d96f0, buf=0x555555dba100 "\240\277\"\033[m\033[38;2;204;204;204m\033[48;2;0;0;0m, \033[m\033[38;2;179;246;192m\033[48;2;0;0;0m\"⢿\"\033[m\033[38;2;204;204;204m\033[48;2;0;0;0m,\033[m\r\n", ' ' <repeats 12 times>, "\033[m\033[38;2;179;246;192m\033[48;2;0;0;0m\"⡇\"\033[m\033[38;2;204;204;204m\033[48;"..., len=1027) at /tmp/qtermwidget/lib/Session.cpp:902
902 _emulation->receiveData( buf, len );
#10 0x00007ffff67485b8 in doActivate<false> (sender=0x5555556fe530, signal_index=16, argv=0x7fffffffe170) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qobject.cpp:4098
4098 callFunction(receiver, QMetaObject::InvokeMetaMethod, method_relative, argv);
#11 0x00007ffff66f8ce9 in QMetaObject::activate (sender=sender@entry=0x5555556fe530, m=m@entry=0x7ffff7fb8100 <Konsole::Pty::staticMetaObject>, local_signal_index=local_signal_index@entry=0, argv=argv@entry=0x7fffffffe170) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qobject.cpp:4146
4146 doActivate<false>(sender, signal_index, argv);
#12 0x00007ffff7f952aa in Konsole::Pty::receivedData (this=this@entry=0x5555556fe530, _t1=<optimized out>, _t2=<optimized out>) at /tmp/qtermwidget/build/lib/moc_Pty.cpp:178
178 QMetaObject::activate(this, &staticMetaObject, 0, _a);
#13 0x00007ffff7f64ef0 in Konsole::Pty::dataReceived (this=0x5555556fe530) at /tmp/qtermwidget/lib/Pty.cpp:323
323 emit receivedData(data.constData(),data.size());
#14 0x00007ffff67485b8 in doActivate<false> (sender=0x5555556fe220, signal_index=3, argv=0x7fffffffe248) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qobject.cpp:4098
4098 callFunction(receiver, QMetaObject::InvokeMetaMethod, method_relative, argv);
#15 0x00007ffff66f8ce9 in QMetaObject::activate (sender=sender@entry=0x5555556fe220, m=m@entry=0x7ffff6a31de0 <QIODevice::staticMetaObject>, local_signal_index=local_signal_index@entry=0, argv=argv@entry=0x0) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qobject.cpp:4146
4146 doActivate<false>(sender, signal_index, argv);
#16 0x00007ffff675772b in QIODevice::readyRead (this=this@entry=0x5555556fe220) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2_build/src/corelib/Core_autogen/include/moc_qiodevice.cpp:211
211 QMetaObject::activate(this, &staticMetaObject, 0, nullptr);
#17 0x00007ffff7f62fb4 in KPtyDevicePrivate::_k_canRead (this=0x5555556f4c50) at /tmp/qtermwidget/lib/kptydevice.cpp:146
146 emit q->readyRead();
#18 0x00007ffff7f94fc9 in KPtyDevice::qt_static_metacall (_c=QMetaObject::InvokeMetaMethod, _id=1, _o=<optimized out>, _a=0x7fffffffe430) at /tmp/qtermwidget/build/lib/moc_kptydevice.cpp:105
105 case 1: { bool _r = _t->d_func()->_k_canRead();
#19 KPtyDevice::qt_static_metacall (_o=<optimized out>, _c=<optimized out>, _id=<optimized out>, _a=0x7fffffffe430) at /tmp/qtermwidget/build/lib/moc_kptydevice.cpp:98
98 void KPtyDevice::qt_static_metacall(QObject *_o, QMetaObject::Call _c, int _id, void **_a)
#20 0x00007ffff67485b8 in doActivate<false> (sender=0x5555556fe200, signal_index=5, argv=0x7fffffffe430) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qobject.cpp:4098
4098 callFunction(receiver, QMetaObject::InvokeMetaMethod, method_relative, argv);
#21 0x00007ffff66f8ce9 in QMetaObject::activate (sender=sender@entry=0x5555556fe200, m=m@entry=0x7ffff6a2f620 <QSocketNotifier::staticMetaObject>, local_signal_index=local_signal_index@entry=2, argv=argv@entry=0x7fffffffe430) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qobject.cpp:4146
4146 doActivate<false>(sender, signal_index, argv);
#22 0x00007ffff66f96a3 in QSocketNotifier::activated (this=this@entry=0x5555556fe200, _t1=<optimized out>, _t2=...) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2_build/src/corelib/Core_autogen/include/moc_qsocketnotifier.cpp:202
202 QMetaObject::activate(this, &staticMetaObject, 2, _a);
#23 0x00007ffff66f97ea in QSocketNotifier::event (this=0x5555556fe200, e=<optimized out>) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qsocketnotifier.h:82
82 Q_DECL_CONSTEXPR operator DescriptorType() const noexcept { return sockfd; }
#24 0x00007ffff7a1c7ed in QApplicationPrivate::notify_helper (this=<optimized out>, receiver=0x5555556fe200, e=0x7fffffffe5a0) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/widgets/kernel/qapplication.cpp:3287
3287 consumed = receiver->event(e);
#25 0x00007ffff676578c in QCoreApplication::notifyInternal2 (receiver=0x5555556fe200, event=0x7fffffffe5a0) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qcoreapplication.cpp:1142
1142 return self->notify(receiver, event);
#26 0x00007ffff6765924 in QCoreApplication::sendEvent (receiver=<optimized out>, event=<optimized out>) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qcoreapplication.cpp:1583
1583 return notifyInternal2(receiver, event);
#27 0x00007ffff6503b2e in socketNotifierSourceDispatch (source=0x555555664770) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qeventdispatcher_glib.cpp:75
75 QCoreApplication::sendEvent(p->socketNotifier, &event);
#28 0x00007ffff5d18d36 in g_main_dispatch (context=context@entry=0x7ffff0000ef0) at ../glib-2.78.6/glib/gmain.c:3476
3476 need_destroy = !(* dispatch) (source, callback, user_data);
#29 0x00007ffff5d1caa8 in g_main_context_dispatch_unlocked (context=0x7ffff0000ef0) at ../glib-2.78.6/glib/gmain.c:4284
4284 g_main_dispatch (context);
#30 g_main_context_iterate_unlocked (context=context@entry=0x7ffff0000ef0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib-2.78.6/glib/gmain.c:4349
4349 g_main_context_dispatch_unlocked (context);
#31 0x00007ffff5d1d3b6 in g_main_context_iteration (context=0x7ffff0000ef0, may_block=1) at ../glib-2.78.6/glib/gmain.c:4414
4414 retval = g_main_context_iterate_unlocked (context, may_block, TRUE, G_THREAD_SELF);
#32 0x00007ffff64ed625 in QEventDispatcherGlib::processEvents (this=0x55555565e900, flags=...) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/kernel/qeventdispatcher_glib.cpp:394
394 bool result = g_main_context_iteration(d->mainContext, canWait);
#33 0x00007ffff67927fa in QEventLoop::exec (this=this@entry=0x7fffffffe7f0, flags=..., flags@entry=...) at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/global/qflags.h:34
34 constexpr inline Q_IMPLICIT operator uint() const noexcept { return uint(i); }
#34 0x00007ffff67929ac in QCoreApplication::exec () at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/corelib/global/qflags.h:74
74 constexpr inline Q_IMPLICIT QFlags(Enum flags) noexcept : i(Int(flags)) {}
#35 0x00007ffff6d9460d in QGuiApplication::exec () at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/gui/kernel/qguiapplication.cpp:1926
1926 return QCoreApplication::exec();
#36 0x00007ffff7980930 in QApplication::exec () at /usr/src/debug/dev-qt/qtbase-6.7.2/qtbase-everywhere-src-6.7.2/src/widgets/kernel/qapplication.cpp:2555
2555 return QGuiApplication::exec();
#37 0x000055555557e615 in main (argc=<optimized out>, argv=<optimized out>) at /tmp/qterminal/src/main.cpp:219
219 int ret = app->exec();
(gdb)
I used this patch:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9cf3516..3533c86 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -153,6 +153,7 @@ set_target_properties( ${QTERMWIDGET_LIBRARY_NAME} PROPERTIES
VERSION ${QTERMWIDGET_VERSION}
)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions")
if(APPLE)
target_compile_definitions(${QTERMWIDGET_LIBRARY_NAME}
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index c88ac12..5cfc653 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -224,9 +224,30 @@ void Emulation::receiveData(const char* text, int length)
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
+ qDebug() << "!!!!!!!!!! length:" << length;
+ qDebug() << "!!!!!!!!!! text:" << text;
QString str = QString::fromUtf8(text, length);
+ qDebug() << "!!!!!!!!!! str:" << str;
auto encoded = _fromUtf8(str);
- std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! encoded:" << encoded;
+ std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());
+ if (length >= 3 &&
+ (
+ /*(str[length - 3] == '\xEF' &&
+ str[length - 2] == '\xBF' &&
+ str[length - 1] == '\xBD')
+ ||*/
+ //(str[0]=='\xEF' && str[1]=='\xBF' && str[2]=='\xBD'))
+ //str.contains(QChar::ReplacementCharacter))
+ str[0]==QChar::ReplacementCharacter)
+ ) {
+
+ qDebug() << "Replacement character detected at the front of the input.";
+ throw std::runtime_error("Replacement character detected at the front of the input. Crashing as requested.");
+ // Option 2: Crash the program using assertion
+ Q_ASSERT(false); // This will crash the program with a debug assertion
+ }
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
then ran this:
#!/usr/bin/env -S -i PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:${PATH}" LANG="${LANG}" TERM='xterm-256color' DISPLAY="${DISPLAY}" HOME="${HOME}" bash --noprofile --norc
cp ./qterminal.ini ~/.config/foo_qterminal.conf
declare -x XDG_RUNTIME_DIR="/run/user/1000"
export QT_LOGGING_RULES="*=true"
#export QT_LOGGING_RULES="qt.*=true"
#qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#LD_LIBRARY_PATH="/var/tmp/portage/x11-libs/qtermwidget-2.0.0/work/qtermwidget-2.0.0_build" qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#LD_LIBRARY_PATH="/tmp/qtw1/usr/local/lib64" /tmp/qterminal/build/qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#LD_LIBRARY_PATH="/tmp/qtw1" /tmp/qterminal/build/qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#export XDG_RUNTIME_DIR="/run/user/1000"
#/tmp/qterminal/build/qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#/tmp/qterminal/build/qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#LD_LIBRARY_PATH="/tmp/qtw1/lib64" /tmp/qtw1/bin/qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
#LD_LIBRARY_PATH="/tmp/qtw2/lib64" /tmp/qtw2/bin/qterminal -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt'
echo "set environment LD_LIBRARY_PATH /tmp/qtw2/lib64" > /tmp/gdb_cmds.txt
echo "run -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt' > /tmp/foo7.log 2>&1 " >> /tmp/gdb_cmds.txt
gdb -x /tmp/gdb_cmds.txt /tmp/qtw2/bin/qterminal
then held down Ctrl+L
inside that less
that started in qterminal
Ah and qterminal and qtermwidged were both built like:
git cloned ... chdir into it
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/tmp/qtw2 ..
make
make install
A more robust patch which wouldn't trigger if the input has replacement chars, might be:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9cf3516..3533c86 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -153,6 +153,7 @@ set_target_properties( ${QTERMWIDGET_LIBRARY_NAME} PROPERTIES
VERSION ${QTERMWIDGET_VERSION}
)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions")
if(APPLE)
target_compile_definitions(${QTERMWIDGET_LIBRARY_NAME}
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index c88ac12..31cbdab 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -208,6 +208,23 @@ void Emulation::sendMouseEvent(int /*buttons*/, int /*column*/, int /*row*/, int
// default implementation does nothing
}
+int countReplacementCharactersInText(const char* text, int length) {
+ int count = 0;
+ for (int i = 0; i < length - 2; ++i) {
+ if (static_cast<unsigned char>(text[i]) == 0xEF &&
+ static_cast<unsigned char>(text[i + 1]) == 0xBF &&
+ static_cast<unsigned char>(text[i + 2]) == 0xBD) {
+ count++;
+ i += 2; // Skip the next two bytes
+ }
+ }
+ return count;
+}
+
+int countReplacementCharactersInString(const QString& str) {
+ return str.count(QChar::ReplacementCharacter);
+}
+
/*
We are doing code conversion from locale to unicode first.
TODO: Character composition from the old code. See #96536
@@ -224,9 +241,41 @@ void Emulation::receiveData(const char* text, int length)
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
+ qDebug() << "!!!!!!!!!! length:" << length;
+ qDebug() << "!!!!!!!!!! text:" << text;
QString str = QString::fromUtf8(text, length);
+ qDebug() << "!!!!!!!!!! str:" << str;
auto encoded = _fromUtf8(str);
- std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! encoded:" << encoded;
+ std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());
+// if (length >= 3 &&
+// (
+// /*(str[length - 3] == '\xEF' &&
+// str[length - 2] == '\xBF' &&
+// str[length - 1] == '\xBD')
+// ||*/
+// //(str[0]=='\xEF' && str[1]=='\xBF' && str[2]=='\xBD'))
+// //str.contains(QChar::ReplacementCharacter))
+// str[0]==QChar::ReplacementCharacter)
+// ) {
+//
+// qDebug() << "Replacement character detected at the front of the input.";
+// throw std::runtime_error("Replacement character detected at the front of the input. Crashing as requested.");
+// // Option 2: Crash the program using assertion
+// Q_ASSERT(false); // This will crash the program with a debug assertion
+// }
+ // Count replacement characters in 'text'
+ int textReplacementCount = countReplacementCharactersInText(text, length);
+ // Count replacement characters in 'str'
+ int strReplacementCount = countReplacementCharactersInString(str);
+
+ // Compare the counts
+ if (strReplacementCount > textReplacementCount) {
+ qDebug() << "More replacement characters found in 'str' than in 'text'.";
+ throw std::runtime_error("Something got replaced with replacement character. Crashing as requested.");
+ Q_ASSERT(false);//not reached
+ }
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
So it would only trigger upon detecting replacements with replacement chars. Only tested on non-replacement chars input, though.
This is the qt6 code where the replacement is made: https://github.com/qt/qtbase/blob/c0afb1ae836b779fbf0b0b60b7c55517e2e2f00f/src/corelib/text/qstringconverter.cpp#L663
I've patched out that 'if' and can confirm it's replacing the braille char with nothing on screen. So that's the code.
But I think this might be solvable by qterminal/qtermwidget if someone figures out how that receiveData is called(I assume it's not called by qt6 but I may be wrong here, even though I did show the stacktrace above) and then maybe do a tentative QString str = QString::fromUtf8(text, length);
before actually sending the data, then check if the number of replacement chars in str
is higher than those in text
then know that replacements were made (just like in my above comment, at the end, that patch - can do the counting) - or something simpler and just check if the end 1 to 3 utf8 chars (since valid utf8 is max 4 bytes, thus 3 of 4 would be potentially invalid when split and each of those max. 3 bytes gets replaced by a full utf8 replacement char
aka �
) are replacement chars, and thus then maybe try to send the buffer minus at most those 3 bytes at the end, which would be presumably replaced when converted to utf8(with replacement chars), and remember to ensure that next buffer to send contains these unsent ones at the beginning. So that then, the received buffer is always on utf8 boundary and qt6 won't be replacing anything.
I've no idea how this actually works, but definitely ensuring(somehow) that the send buffer is split only on utf8 char boundaries would prevent qt6 from spewing replacement chars.
I'm guessing somehow 1.4.0 qt5 didn't replace those chars? else I can't explain how it worked before.
EDIT:
A first look (superficial) suggests that the reason it worked in 1.4.0 is because it was using a converter with internal state like QString QUtf8::convertToUnicode(const char *chars, int len, QTextCodec::ConverterState *state)
, but in qt6 it doesn't. I haven't really double checked this, so I might be wrong.
I followed it to that code (unless I made a mistake along the way), which is here: https://github.com/qt/qtbase/blob/86c62c8f6088ec148512457cb7e964661ba643b0/src/corelib/codecs/qutfcodec.cpp#L543
But if it did somehow keep internal state, as per the comment, it suggests it would recognize valid continuation(for example) if it was split in chunks like it is above in receiveData:
// See above for buffer requirements for stateless decoding. However, that
// fails if the state is not empty. The following situations can add to the
// requirements:
// state contains chars starts with requirement
// 1 of 2 bytes valid continuation 0
// 2 of 3 bytes same 0
// 3 bytes of 4 same +1 (need to insert surrogate pair)
// 1 of 2 bytes invalid continuation +1 (need to insert replacement and restart)
// 2 of 3 bytes same +1 (same)
// 3 of 4 bytes same +1 (same)
QString result(len + 1, Qt::Uninitialized);
Wow I made a discovery, the replacement chars exist in receiveData even in 1.4.0, but somehow they didn't make it on screen. Will update this with the patch to test this.
Ok I jumped the gun a little. What I meant to say is that using the same utf8 conversion like qtermwidget uses in version 2, even tho using qt5, the same replacement chars can be seen.
In other words, this change is really the culprit:
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index 723f5d2..c88ac12 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -250,8 +224,9 @@ void Emulation::receiveData(const char* text, int length)
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
- QString utf16Text = _decoder->toUnicode(text,length);
- std::wstring unicodeText = utf16Text.toStdWString();
+ QString str = QString::fromUtf8(text, length);
+ auto encoded = _fromUtf8(str);
+ std::wstring unicodeText = encoded.data.toStdWString();
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
because that QString::fromUtf8(text, length);
is stateless and it thus has no choice but to transform any byte at the end of the chunk that isn't valid utf8 into the replacement char, while the previous way of doing it, was keeing a state (presumably) inside that _decoder
.
This patch to test on 1.4.0:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index ce50ed0..1747931 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -153,6 +153,7 @@ set_target_properties( ${QTERMWIDGET_LIBRARY_NAME} PROPERTIES
VERSION ${QTERMWIDGET_VERSION}
)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions")
if(APPLE)
target_compile_definitions(${QTERMWIDGET_LIBRARY_NAME}
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index 723f5d2..19afe3a 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -38,6 +38,7 @@
#include <QThread>
#include <QTime>
+#include <QDebug>
// KDE
//#include <kdebug.h>
@@ -239,6 +240,23 @@ void Emulation::sendMouseEvent(int /*buttons*/, int /*column*/, int /*row*/, int
TODO: Character composition from the old code. See #96536
*/
+int countReplacementCharactersInText(const char* text, int length) {
+ int count = 0;
+ for (int i = 0; i < length - 2; ++i) {
+ if (static_cast<unsigned char>(text[i]) == 0xEF &&
+ static_cast<unsigned char>(text[i + 1]) == 0xBF &&
+ static_cast<unsigned char>(text[i + 2]) == 0xBD) {
+ count++;
+ i += 2; // Skip the next two bytes
+ }
+ }
+ return count;
+}
+
+int countReplacementCharactersInString(const QString& str) {
+ return str.count(QChar::ReplacementCharacter);
+}
+
void Emulation::receiveData(const char* text, int length)
{
emit stateSet(NOTIFYACTIVITY);
@@ -250,8 +268,24 @@ void Emulation::receiveData(const char* text, int length)
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
+ qDebug() << "!!!!!!!!!! length:" << length;
+ qDebug() << "!!!!!!!!!! text:" << text; //yes replacement chars here because it get converted to utf8
+ QString str = QString::fromUtf8(text, length);
+ qDebug() << "!!!!!!!!!! str:" << str; //yes replacement chars here
QString utf16Text = _decoder->toUnicode(text,length);
+ qDebug() << "!!!!!!!!!! utf16Text:" << utf16Text;//no repl. chars here, gets truncated before it
std::wstring unicodeText = utf16Text.toStdWString();
+ qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());//none here too
+ // Count replacement characters in 'text'
+ int textReplacementCount = countReplacementCharactersInText(text, length);
+ // Count replacement characters in 'str'
+ int strReplacementCount = countReplacementCharactersInString(str);
+
+ // Compare the counts
+ if (strReplacementCount > textReplacementCount) {
+ qDebug() << "More replacement characters found in 'str' than in 'text'.";
+ throw std::runtime_error("Something got replaced with replacement character. Crashing as requested.");
+ }
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
in 1.4.0 it is ignoring the incomplete utf8 char at the end, and on next chunk it's somehow[^1] able to see the correct whole braille char, as if it remembered that it ignored the first byte of it and considered it when processing the first 2 bytes of the next chunk which are the last 2 bytes of the braille char.
I think maybe that _decoder
field is keeping track of what's seen before, which means, it sees the first byte (of the 3) of the braille char at the end of a chunk and it ignores it but keeps track of its state, internally, then when second chunk is given to it, then it uses what's seen(that 1 byte) and thus reconstructs the braille char at the beginning of this second chunk. And that's why it works in 1.4.0 . But in 2.0.0 since it's stateless, well it can't remember that it's seen a first byte in a prev. chunk, so it has to replacement char all of the 3 because they're split into two different calls.
So I guess this means that qtermwidget would have to keep track of at most 3 bytes at the end of the chunk of text(+length) that it received in receiveData
, so that on the next chunk can use those when processing the start of that chunk, and thus reconstruct whatever utf8 char was split into two chunks/calls. Or, get back to using _decoder
as field and it presumably keeps track of this. Therefore the problem isn't really in qt6.
[^1]: I think this is because of that internal state maybe?! which I mentioned in the edit at the bottom of prev comment.
Here I have a proof-of-concept patch that does what I suspect _decoder
did in 1.4.0, but this is for 2.0.0, that is: it keeps track of the last 1 to 3 bytes of the chunk(of text
argument) if they're not valid utf8 and reuses those in the next chunk, by prepending them at the beginning.
It's very crappy patch, but as a p.o.c. it should do, to show how 2.0.0 can be made to work, if we keep track of the state just like I presume _decoder
did in 1.4.0.
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9cf3516..3533c86 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -153,6 +153,7 @@ set_target_properties( ${QTERMWIDGET_LIBRARY_NAME} PROPERTIES
VERSION ${QTERMWIDGET_VERSION}
)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions")
if(APPLE)
target_compile_definitions(${QTERMWIDGET_LIBRARY_NAME}
diff --git a/lib/Emulation.cpp b/lib/Emulation.cpp
index c88ac12..ed9c7f1 100644
--- a/lib/Emulation.cpp
+++ b/lib/Emulation.cpp
@@ -25,6 +25,7 @@
// System
#include <cstdio>
#include <cstdlib>
+#include <cstring> // for memcpy
#include <unistd.h>
#include <string>
@@ -208,6 +209,23 @@ void Emulation::sendMouseEvent(int /*buttons*/, int /*column*/, int /*row*/, int
// default implementation does nothing
}
+int countReplacementCharactersInText(const char* text, int length) {
+ int count = 0;
+ for (int i = 0; i < length - 2; ++i) {
+ if (static_cast<unsigned char>(text[i]) == 0xEF &&
+ static_cast<unsigned char>(text[i + 1]) == 0xBF &&
+ static_cast<unsigned char>(text[i + 2]) == 0xBD) {
+ count++;
+ i += 2; // Skip the next two bytes
+ }
+ }
+ return count;
+}
+
+int countReplacementCharactersInString(const QString& str) {
+ return str.count(QChar::ReplacementCharacter);
+}
+
/*
We are doing code conversion from locale to unicode first.
TODO: Character composition from the old code. See #96536
@@ -219,14 +237,105 @@ void Emulation::receiveData(const char* text, int length)
bufferedUpdate();
- /* XXX: the following code involves encoding & decoding of "UTF-16
- * surrogate pairs", which does not work with characters higher than
- * U+10FFFF
- * https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
- */
+// /* XXX: the following code involves encoding & decoding of "UTF-16
+// * surrogate pairs", which does not work with characters higher than
+// * U+10FFFF
+// * https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
+// */
+// qDebug() << "!!!!!!!!!! length:" << length;
+// qDebug() << "!!!!!!!!!! text:" << text;
+// QString str = QString::fromUtf8(text, length);
+// //QString str = fromUtf8(text, length);
+// qDebug() << "!!!!!!!!!! str:" << str;
+// auto encoded = _fromUtf8(str);
+// qDebug() << "!!!!!!!!!! encoded:" << encoded;
+// std::wstring unicodeText = encoded.data.toStdWString();
+// qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());
+//// if (length >= 3 &&
+//// (
+//// /*(str[length - 3] == '\xEF' &&
+//// str[length - 2] == '\xBF' &&
+//// str[length - 1] == '\xBD')
+//// ||*/
+//// //(str[0]=='\xEF' && str[1]=='\xBF' && str[2]=='\xBD'))
+//// //str.contains(QChar::ReplacementCharacter))
+//// str[0]==QChar::ReplacementCharacter)
+//// ) {
+////
+//// qDebug() << "Replacement character detected at the front of the input.";
+//// throw std::runtime_error("Replacement character detected at the front of the input. Crashing as requested.");
+//// // Option 2: Crash the program using assertion
+//// Q_ASSERT(false); // This will crash the program with a debug assertion
+//// }
+// // Count replacement characters in 'text'
+// int textReplacementCount = countReplacementCharactersInText(text, length);
+// // Count replacement characters in 'str'
+// int strReplacementCount = countReplacementCharactersInString(str);
+//
+// // Compare the counts
+// if (strReplacementCount > textReplacementCount) {
+// qDebug() << "More replacement characters found in 'str' than in 'text'.";
+// throw std::runtime_error("Something got replaced with replacement character. Crashing as requested.");
+// Q_ASSERT(false);//not reached
+// }
+ qDebug() << "!!!!!!!!!! length:" << length;
+ qDebug() << "!!!!!!!!!! text:" << text; //yes replacement chars here because it get converted to utf8
QString str = QString::fromUtf8(text, length);
- auto encoded = _fromUtf8(str);
- std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! str:" << str; //yes replacement chars here
+ // Count replacement characters in 'text'
+ int textReplacementCount = countReplacementCharactersInText(text, length);
+ // Count replacement characters in 'str'
+ int strReplacementCount = countReplacementCharactersInString(str);
+
+ //char combinedBuffer[3 + length];
+ //std::vector<char> combinedBuffer(_last_max_three_bytes_of_text.size() + length);
+
+ // do we have any new invalid utf8 bytes at the end of this new chunk?
+ int the_diff=strReplacementCount-textReplacementCount;
+ qDebug() << "!!!!!!!!!! the_diff:" << the_diff;
+ qDebug() << "!!!!!!!!!! _how_many_of_last_max_three_are_in:" << _how_many_of_last_max_three_are_in;
+
+
+
+ std::vector<uint8_t> combinedBuffer(_how_many_of_last_max_three_are_in + length);
+ //"allocates memory on the heap, but it is automatically deallocated when the vector goes out of scope, such as when the function exits"
+ int combinedLength = 0;
+ if (_how_many_of_last_max_three_are_in > 0) {
+ //copy last state bytes from the end of a prev. chunk which woulda been invalid utf8
+ //copy to the beginning of this new chunk
+ std::memcpy(combinedBuffer.data(), _last_max_three_bytes_of_text.data(), _how_many_of_last_max_three_are_in);
+ combinedLength+=_how_many_of_last_max_three_are_in;
+ }
+
+ // Compare the counts
+ if (the_diff>0) {
+ qDebug() << "!!!!!!!!!! More replacement characters("<<the_diff<<") found in 'str' than in 'text'.";
+ //throw std::runtime_error("Something got replaced with replacement character. Crashing as requested.");
+ }
+
+ // Copy new text to the combined buffer
+ // but not any invalid utf8 bytes from the end, if any.
+ std::memcpy(combinedBuffer.data() + combinedLength, text, length-the_diff);
+ combinedLength += length-the_diff;
+
+ // Try to convert the combined buffer to UTF-8
+ QString str2 = QString::fromUtf8(reinterpret_cast<const char*>(combinedBuffer.data()), combinedLength);
+ qDebug() << "!!!!!!!!!! resultAftermods:" << str2;
+ auto encoded = _fromUtf8(str2);
+ qDebug() << "!!!!!!!!!! encoded:" << encoded;
+ std::wstring unicodeText = encoded.data.toStdWString();
+ qDebug() << "!!!!!!!!!! unicodeText:" << QString::fromStdWString(unicodeText.c_str());
+// //QString utf16Text = _decoder->toUnicode(text,length-the_diff);
+// QString utf16Text = _decoder->toUnicode(reinterpret_cast<const char*>(combinedBuffer.data()),combinedLength);
+// qDebug() << "!!!!!!!!!! utf16Text:" << utf16Text;
+
+ // Store the last up to 3 bytes of invalid UTF-8 from the end of the text
+ if (the_diff > 0) {
+ _how_many_of_last_max_three_are_in = the_diff;
+ std::memcpy(_last_max_three_bytes_of_text.data(), text + length - the_diff, the_diff);
+ } else {
+ _how_many_of_last_max_three_are_in = 0;
+ }
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
diff --git a/lib/Emulation.h b/lib/Emulation.h
index 98ba2fe..27edaaf 100644
--- a/lib/Emulation.h
+++ b/lib/Emulation.h
@@ -470,6 +470,8 @@ protected:
const KeyboardTranslator* _keyTranslator; // the keyboard layout
+ std::array<uint8_t, 3> _last_max_three_bytes_of_text = {0};
+ int _how_many_of_last_max_three_are_in = 0;
protected slots:
/**
So to apply that, on qtermwidget:
cd /tmp
git clone https://github.com/lxqt/qtermwidget.git
cd qtermwidget
patch -p1 -i /tmp/theabove.patch
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/tmp/qtw2 ..
make && make install
Then same build steps(minus the patch) for qterminal itself, (or use system one? then mod the path below), and then:
I run it like:
#!/usr/bin/env -S -i PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:${PATH}" LANG="${LANG}" TERM='xterm-256color' DISPLAY="${DISPLAY}" HOME="${HOME}" bash --noprofile --norc
cp ./qterminal.ini ~/.config/foo_qterminal.conf
declare -x XDG_RUNTIME_DIR="/run/user/1000"
export QT_LOGGING_RULES="*=true"
echo "set environment LD_LIBRARY_PATH /tmp/qtw2/lib64" > /tmp/gdb_cmds.txt
echo "run -p foo_qterminal -e 'less -R --shift 5 -- ./less.txt' > /tmp/foo7.log 2>&1 " >> /tmp/gdb_cmds.txt
gdb -x /tmp/gdb_cmds.txt /tmp/qtw2/bin/qterminal
Then press and hold Ctrl+L
while less
is showing inside the started qterminal
. I notice it doesn't break (well, to be fair I also removed the throwing of exception), and then i can quit and check the /tmp/foo7.log
file with nvim
, search for �
and while I see it shows after and before on the text
and str
lines (see screenshot below) it doesn't show on the resultAftermods
line at all, and futher, the next chunk has the braille char proper.
@correabuscar
This worked here (based on https://doc.qt.io/qt-6/qstringdecoder.html):
diff -ruNp qtermwidget-orig/lib/Emulation.cpp qtermwidget/lib/Emulation.cpp
--- qtermwidget-orig/lib/Emulation.cpp
+++ qtermwidget/lib/Emulation.cpp
@@ -52,7 +52,7 @@ Emulation::Emulation() :
_keyTranslator(nullptr),
_usesMouse(false),
_bracketedPasteMode(false),
- _fromUtf8(QStringEncoder::Utf16)
+ _toUtf16(QStringEncoder::Utf8)
{
// create screens with a default size
_screen[0] = new Screen(40,80);
@@ -224,9 +224,9 @@ void Emulation::receiveData(const char*
* U+10FFFF
* https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates
*/
- QString str = QString::fromUtf8(text, length);
- auto encoded = _fromUtf8(str);
- std::wstring unicodeText = encoded.data.toStdWString();
+ QByteArray ba(text, length);
+ QString str = _toUtf16(ba);
+ std::wstring unicodeText = str.toStdWString();
//send characters to terminal emulator
for (size_t i=0;i<unicodeText.length();i++)
diff -ruNp qtermwidget-orig/lib/Emulation.h qtermwidget/lib/Emulation.h
--- qtermwidget-orig/lib/Emulation.h
+++ qtermwidget/lib/Emulation.h
@@ -494,7 +494,7 @@ private:
bool _bracketedPasteMode;
QTimer _bulkTimer1{this};
QTimer _bulkTimer2{this};
- QStringEncoder _fromUtf8;
+ QStringDecoder _toUtf16;
};
}
Please test it.
If I'm not mistaken, the logic used in the Qt6 port was incorrect.
@correabuscar
This worked here (based on https://doc.qt.io/qt-6/qstringdecoder.html):
diff -ruNp qtermwidget-orig/lib/Emulation.cpp qtermwidget/lib/Emulation.cpp --- qtermwidget-orig/lib/Emulation.cpp +++ qtermwidget/lib/Emulation.cpp @@ -52,7 +52,7 @@ Emulation::Emulation() : _keyTranslator(nullptr), _usesMouse(false), _bracketedPasteMode(false), - _fromUtf8(QStringEncoder::Utf16) + _toUtf16(QStringEncoder::Utf8) { // create screens with a default size _screen[0] = new Screen(40,80); @@ -224,9 +224,9 @@ void Emulation::receiveData(const char* * U+10FFFF * https://unicodebook.readthedocs.io/unicode_encodings.html#surrogates */ - QString str = QString::fromUtf8(text, length); - auto encoded = _fromUtf8(str); - std::wstring unicodeText = encoded.data.toStdWString(); + QByteArray ba(text, length); + QString str = _toUtf16(ba); + std::wstring unicodeText = str.toStdWString(); //send characters to terminal emulator for (size_t i=0;i<unicodeText.length();i++) diff -ruNp qtermwidget-orig/lib/Emulation.h qtermwidget/lib/Emulation.h --- qtermwidget-orig/lib/Emulation.h +++ qtermwidget/lib/Emulation.h @@ -494,7 +494,7 @@ private: bool _bracketedPasteMode; QTimer _bulkTimer1{this}; QTimer _bulkTimer2{this}; - QStringEncoder _fromUtf8; + QStringDecoder _toUtf16; }; }
Please test it.
If I'm not mistaken, the logic used in the Qt6 port was incorrect.
The patch works indeed! (EDIT: real world btop
is fixed as well)
That is amazing what you can do if you know what's going on (and if you know QT/C++, unlike me haha) ! Much appreciated and very cool to see such a tiny change that does exactly the same thing that 1.4.0 did with _decoder
(i've tested it to be so)
This is really cool patch! Cheers!
I don't know why it didn't even cross my mind to check online for documentation, as you pointed out in another place "The encoder remembers any state that is required between calls,..."
I don't know why it didn't even cross my mind to check online for documentation
Actually, this part of Qt's documentation is exceptionally obscure and needs to be rewritten (I think I've seen a bug report about the doc), perhaps because it's new. If it wasn't for a few examples in it, I wouldn't understand anything.
Since your experience in dealing with characters is more than mine, please test the patch under various circumstances. I'll make a PR and release 2.0.1 after merging it. But first I should check another part of the code, where the new Qt6 encoding/decoding methods are also used.
Thanks a lot for the report and also for the discussion!
nit: I've corrected insecure shebang in the above comments, added PATH
prefix: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:
full:
#!/usr/bin/env -vS -i PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:${PATH}" LANG="${LANG}" TERM='xterm-256color'
diff:
-#!/usr/bin/env -vS -i PATH="${PATH}" LANG="${LANG}" TERM='xterm-256color'
+#!/usr/bin/env -vS -i PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:${PATH}" LANG="${LANG}" TERM='xterm-256color'
because I don't want a user set PATH
(possibly unintentionally if not maliciously) to execute a non-system bash
executable.
And just in case anyone stumbled upon this thinking #!/usr/bin/env bash
is a secure portable shebang, it's not without that PATH
prefix esp. if the potential script ends up being installed system-wide.
(anyway, I'm hiding this comment as off-topic)
EDIT2: fixed in qterminal 2.0.1 EDIT: jump to working/tested patch by tsujan here below.
Expected Behavior
this didn't happen in qterminal 1.4.0 or so, iirc so, not randomly replacing chars with
�
sounds like the expected behavior.Current Behavior
a braille character like
⠿
:is being shown, sometimes, as
���
note how it's made out ofUTF-8: e2 a0 bf
, this must be why there's 3 replacement character(s)�
:Possible Solution
I don't know yet, didn't look in the code. Took a while to find reproduction steps.
Steps to Reproduce (for bugs)
try to run this bash script(after you get the two files from below first):
Note that, apparently the
cp
line isn't necessary[^1]. You'll needless.txt
file which I've generated fromnvim
viascript
command... well it doesn't matter I guess: less.txt qterminal.ini.txt (^ you'll have to rename this to remove the .txt extension, couldn't find the .ini to upload here without make it a .txt) put both files in current dir before running the above bash script from current dir.That should've made the
less
command open in a newqterminal
of 172x54 size(but size doesn't matter apparently, not withless
), you can now proceed as follows:either press Ctrl+L once and you shall see the aforementioned braille char transform into 3 replacement chars (futher Ctrl+L presses have no effect here)
or press right arrow key to move 5 chars to the right and see the same braille char transform into those 3 replacement chars (after this pressing Ctrl+L here would "fix" it)
Here's before step 3 and after step 3 is done:
Context
btop
gets messed up visually, but only inqterminal
, not inside sayalacritty
.System Information
LANG=en_US.utf8
$ less --version
less 643 (PCRE2 regular expressions)
chars
command/executable is https://github.com/antifuchs/chars[^1]: so it would work with empty factory-generated profile(on my Gentoo system, but who knows if I've some patches or what not), so I see it work even in a small-sized qterminal the same, but when I tried it with
nvim
this was definitely not at all reproducible with any other sizes - seems to work now withless
.