Closed xiaoyifang closed 1 day ago
Not reproducible on my machine (Linux) 😅
restrict it to Windows
The DSL code uses QTextStream to read the description from the .ann file. QTextStream will use the system's default codec, so we need to set the encoding explicitly.
Example (generated with AI):
// Uses the Qt 5 API (QTextCodec, QTextStream::setCodec).
#include <QDebug>
#include <QFile>
#include <QTextCodec>
#include <QTextStream>

// Guess the codec: try UTF-8 first and fall back if the bytes are not valid UTF-8.
QTextCodec* detectEncoding(const QByteArray& data) {
    QTextCodec::ConverterState state;
    QTextCodec* codec = QTextCodec::codecForName("UTF-8");
    codec->toUnicode(data.constData(), data.size(), &state);
    if (state.invalidChars > 0) {
        // Invalid characters were found, so try another encoding.
        codec = QTextCodec::codecForName("ISO-8859-1");
    }
    return codec;
}

int main() {
    QFile annFile("path/to/your/file.txt");
    if (!annFile.open(QIODevice::ReadOnly | QIODevice::Text)) {
        qDebug() << "Failed to open file.";
        return -1;
    }
    // Read the raw bytes once and detect the codec from them.
    QByteArray data = annFile.readAll();
    QTextCodec* detectedCodec = detectEncoding(data);

    // Rewind the still-open file and decode it with the detected codec.
    annFile.seek(0);
    QTextStream annStream(&annFile);
    annStream.setCodec(detectedCodec);
    QString content = annStream.readAll();
    qDebug() << "File content:" << content;
    annFile.close();
    return 0;
}
readAll() can be replaced with readLine().
The default behavior of QTextStream is to try one of the Unicode encodings.
By default, UTF-8 is used for reading and writing, but you can also set the encoding by calling setEncoding(). Automatic Unicode detection is also supported. When this feature is enabled (the default behavior), QTextStream will detect the UTF-8, UTF-16 or the UTF-32 BOM (Byte Order Mark) and switch to the appropriate UTF encoding when reading.
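To make the BOM check described above concrete, here is a minimal sketch in plain C++ (no Qt). The names `BomEncoding` and `detectBom` are hypothetical and only for illustration; this is not Qt's actual implementation, just the byte patterns a BOM-based detector looks for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not part of Qt.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Report which Unicode BOM, if any, the buffer starts with.
BomEncoding detectBom(const std::string& bytes) {
    auto at = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    const std::size_t n = bytes.size();

    // UTF-32 must be checked before UTF-16: FF FE 00 00 begins with the UTF-16LE BOM.
    if (n >= 4 && at(0) == 0xFF && at(1) == 0xFE && at(2) == 0x00 && at(3) == 0x00)
        return BomEncoding::Utf32LE;
    if (n >= 4 && at(0) == 0x00 && at(1) == 0x00 && at(2) == 0xFE && at(3) == 0xFF)
        return BomEncoding::Utf32BE;
    if (n >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return BomEncoding::Utf8;
    if (n >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return BomEncoding::Utf16LE;
    if (n >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return BomEncoding::Utf16BE;

    // No BOM: the bytes alone do not say which encoding was used.
    return BomEncoding::None;
}
```

A buffer holding UTF-16 text without a BOM falls through to `BomEncoding::None`, which is the case this issue is about: a detector of this kind has nothing to switch on and stays with its default.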
The file is UTF-16 without a BOM, so we cannot reliably detect the byte order.
On my Linux system, the encoding detected for annStream is Utf8, and the text displays correctly only by accident.
The file is wrong. This is not fixable.
I sent a short message to the dict author.
I don't think we can do anything here. The original code works by accident in the original GD because QTextStream in Qt4/5 doesn't try to detect UTF-8.
> The file is UTF-16 without a BOM, so we cannot reliably detect the byte order.

I think maybe we can.
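One way this might be attempted is a zero-byte heuristic, sketched below in plain C++. This is an assumption for illustration, not what goldendict-ng does: the names `Utf16Order` and `guessUtf16ByteOrder` and the thresholds are made up. The idea is that in UTF-16 text containing many ASCII characters, one byte of each code unit is 0x00, and the offset (even or odd) where those zeros cluster hints at the byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not taken from goldendict-ng or Qt.
enum class Utf16Order { Unknown, LittleEndian, BigEndian };

// Guess the byte order of BOM-less UTF-16 by counting zero bytes at even
// and odd offsets. ASCII characters encode as (xx 00) in little-endian
// and (00 xx) in big-endian.
Utf16Order guessUtf16ByteOrder(const std::string& bytes) {
    const std::size_t n = bytes.size() - bytes.size() % 2;  // whole code units only
    const std::size_t units = n / 2;
    if (units == 0) return Utf16Order::Unknown;

    std::size_t zerosEven = 0, zerosOdd = 0;
    for (std::size_t i = 0; i < n; i += 2) {
        if (bytes[i] == '\0') ++zerosEven;
        if (bytes[i + 1] == '\0') ++zerosOdd;
    }

    // Arbitrary thresholds: at least a quarter of the code units must have
    // a zero byte on one side, and that side must clearly dominate.
    if (zerosOdd * 4 >= units && zerosOdd > 2 * zerosEven)
        return Utf16Order::LittleEndian;
    if (zerosEven * 4 >= units && zerosEven > 2 * zerosOdd)
        return Utf16Order::BigEndian;
    return Utf16Order::Unknown;
}
```

The limitation matters for this dictionary: text that is mostly Japanese contains few zero bytes in either position, so the function returns `Unknown`. That is consistent with the point that the byte order cannot be detected reliably without a BOM.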
JMDict Furigana, JMDict+: https://jd4gd.com/jmdictplus.html
Originally posted by @darlopvil in https://github.com/xiaoyifang/goldendict-ng/issues/1875#issuecomment-2490265261
The description seems to have an encoding issue.