File parsing bug - GitHub Issues

freesoft0000 commented 6 years ago

Two test files are attached in test.zip.

freesoft0000 commented 6 years ago

Please consider making the language files external, so that multiple languages are easy to add.

RaiKoHoff commented 6 years ago

Please test one of the latest development beta versions (see #160): Using encoding detection settings:

Result:

Please describe in detail the problem with the file goroutine_channel.go:

freesoft0000 commented 6 years ago

I downloaded the latest beta; the version number is 4.18.819.1142. With your encoding settings, the two files are still garbled when opened. The test system is Win7 64-bit. The beta version does not provide a Simplified Chinese language file.

Notepad2-mod also has this problem.

RaiKoHoff commented 6 years ago

Please remove those files from the "File History" and try loading them again. (The (wrong) encoding stored in "File History" forces Notepad3 to use this encoding, assuming that it is the correct encoding for this file.) If it is still garbled, please tell me what the resulting encoding is if you try to force the CED to re-encode the file:

freesoft0000 commented 6 years ago

The most recent document records have been deleted. Following the settings you gave, I changed the default encoding and then opened the file; it is still garbled. I tried the options marked in purple in the figure below one by one, and none of them displays the file properly. The document is displayed correctly only if I choose to force encoding detection. Only this method works.

RaiKoHoff commented 6 years ago

I am not able to reproduce your issue with the files provided:

Even if I choose another default encoding:

the file is loaded with the correct encoding (UTF-8):

Be sure to discard the file from Notepad3's history:

Please provide your Notepad3.ini (cleared of sensitive information: file history, search strings, etc.).

freesoft0000 commented 6 years ago

The file history has been deleted. The .ini is attached: ini.zip

RaiKoHoff commented 6 years ago

Even with your .ini, the GO file loads correctly as UTF-8 ? The beta version number you mentioned in your comment (4.18.819.1142) is strange; it has never been put on the beta channel ? Please load a development beta from here: https://drive.google.com/drive/folders/0B7X3F11Wq7qSZmNacmNJaGR0MFk?usp=sharing (for example Notepad3Portable_TinyExpr_4.18.820.1063.7z). It also includes a Simplified Chinese language DLL (change [Settings2] PreferredLanguageLocaleName=zh-CN if it does not detect your locale settings correctly).

freesoft0000 commented 6 years ago

Default settings. Open the file you want to test. There are three ways to display the document correctly.

Notepad3 has a problem detecting the document encoding: every time it detects Unicode, and after double-clicking the encoding indicator and switching to UTF-8 the document is still not displayed correctly.

I am downloading Notepad3 from the address below: https://ci.appveyor.com/project/rizonesoft/notepad3/branch/master

https://drive.google.com is difficult to reach from China.

freesoft0000 commented 6 years ago

The same test document is displayed correctly in EverEdit, Notepad++, and EmEditor.

RaiKoHoff commented 6 years ago

Downloading the current AppVeyor artifact (v.4.18.820.1144; it doesn't matter whether it is the 64-bit or 32-bit build) to an empty directory, using no .ini file (which falls back to internal defaults), and loading the provided .go file results in correct UTF-8 encoding ??? Maybe related to regional Windows settings 🤔 - Very strange ?

freesoft0000 commented 6 years ago

v.4.18.820.1144, 32-bit version: the problem remains.

RaiKoHoff commented 6 years ago

I am not able to reproduce this issue 😲, so I am not able to debug this 🤔. Can anybody else give a hint or reproduce this issue? Attaching GitHub's AppVeyor artifact (v.4.18.820.1144), no .ini file, and the .go source: Encoding_problem.zip

RaiKoHoff commented 6 years ago

Forcing Notepad3 to simulate the Chinese code page 936 (seen in your screenshot) and skipping the ANSI code page detection, which leads the system to assume the file is CP-936 and not UTF-8, does not show garbage! I can't reproduce the "Unicode (UTF-16) LE" (aka Unicode) detection shown by your system ?

freesoft0000 commented 6 years ago

My test environment is the Simplified Chinese Windows operating system, and the test result is the same as in my feedback. On the English version of the Windows operating system the problem does not occur and the document is displayed correctly.

RaiKoHoff commented 6 years ago

What happens, if you check ON the "Skip UNICODE detection" option of the Encoding Settings dialog?

freesoft0000 commented 6 years ago

With Unicode detection skipped, that part is OK. But there's still a problem: ANSI encoding is detected, so the Simplified Chinese characters are not displayed correctly.
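To illustrate the symptom described here, the following is a minimal Python sketch of what happens when UTF-8 bytes are interpreted through an ANSI code page. The sample string is hypothetical (the contents of the attached test files are not reproduced here), and the code says nothing about how Notepad3 itself decodes files.

```python
# Hypothetical sample text standing in for a Chinese comment in the .go file.
text = "中文注释"

# Bytes as they would be stored in a UTF-8 file without a signature (BOM).
raw = text.encode("utf-8")

# Correct interpretation: decode the bytes as UTF-8.
as_utf8 = raw.decode("utf-8")

# Wrong interpretations: treat the same bytes as an ANSI code page.
# errors="replace" keeps the sketch from raising on invalid byte sequences.
as_cp936 = raw.decode("cp936", errors="replace")    # Simplified Chinese ANSI code page
as_cp1252 = raw.decode("cp1252", errors="replace")  # Western European ANSI code page

print("UTF-8  :", as_utf8)
print("CP-936 :", as_cp936)
print("CP-1252:", as_cp1252)
```

Only the UTF-8 decoding reproduces the original text; both ANSI interpretations yield different characters, which is the kind of garbling reported in this thread.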

RaiKoHoff commented 6 years ago

I am not able to reproduce the problem on my machine, so I have to guess... The problem is related to the Unicode detection, possibly also related to a little- vs. big-endianness mismatch? 🤔. I revisited the Unicode detection and tried to harden it; please try development version _TinyExpr_4.18.821.1065. Notepad3_exe.zip
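As background for the endianness guess, here is a small Python sketch showing that the same UTF-16 bytes read with the wrong byte order give entirely different characters, and how a byte order mark disambiguates them. The helper name `utf16_byte_order` is hypothetical, it ignores UTF-32 marks, and it is not taken from Notepad3's code.

```python
import codecs

sample = "Go"
le = sample.encode("utf-16-le")  # low byte first
be = sample.encode("utf-16-be")  # high byte first

# Reading little-endian bytes as big-endian swaps each code unit's bytes.
swapped = le.decode("utf-16-be")

def utf16_byte_order(data: bytes):
    """Return "LE" or "BE" if data starts with a UTF-16 BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "BE"
    return None  # no mark: the byte order has to be guessed heuristically

print(le, be, repr(swapped))
print(utf16_byte_order(codecs.BOM_UTF16_LE + le), utf16_byte_order(le))
```

Without a mark, a detector has to guess the byte order from the content, which is where a mismatch could arise.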

freesoft0000 commented 6 years ago

4.18.821.1065: the problem remains. With default settings, ANSI encoding is detected and Chinese cannot be displayed correctly.

The correct display should look like the following picture:

freesoft0000 commented 6 years ago

The test document is UTF-8 without a BOM. If I manually convert the test document to UTF-8 with a BOM and open it again in Notepad3, it is displayed correctly. This indicates that there is a problem with Notepad3's document encoding detection. Both v.4.18.820.1144 and 4.18.821.1065 display UTF-8 with a BOM correctly.
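The workaround described above (converting the file to UTF-8 with a BOM) can be sketched in Python as follows. The function name `add_utf8_signature` is hypothetical, and this is only one way to do the conversion; it operates on bytes already read from the file.

```python
import codecs

def add_utf8_signature(data: bytes) -> bytes:
    """Prepend the UTF-8 signature (EF BB BF) if it is not already present."""
    data.decode("utf-8")  # raises UnicodeDecodeError if data is not valid UTF-8
    if data.startswith(codecs.BOM_UTF8):
        return data  # already has the signature; leave unchanged
    return codecs.BOM_UTF8 + data

# Hypothetical file content standing in for the test document.
original = "// 中文注释\npackage main\n".encode("utf-8")
with_bom = add_utf8_signature(original)

# The "utf-8-sig" codec strips the signature again when decoding.
print(with_bom[:3], with_bom.decode("utf-8-sig") == original.decode("utf-8"))
```

Applying the function twice does not add a second signature, so it is safe to run on files that were already converted.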

Please test with this document. The test environment is the Simplified Chinese version of Windows; the problem cannot be reproduced on an English system. test.zip

freesoft0000 commented 6 years ago

https://blog.csdn.net/zb361419953/article/details/54408488/
https://www.cnblogs.com/cyq1162/p/9183424.html
https://www.jianshu.com/p/38f24ee67e7f
https://github.com/notepad-plus-plus/notepad-plus-plus

RaiKoHoff commented 6 years ago

UTF-8 w/ BOM is easy to detect, because the "BOM" at the beginning of the file is quite unique. I prefer the name "Signature" instead of "BOM" for UTF-8, since it is not really a "Byte Order Mark" as it is in UTF-16 LE/BE. Pure UTF-8 is harder to distinguish from ANSI CodePages, 7-bit codes are the same in both ... since there is no signature to identify. Recently I switched to a new faster UTF-8 validator, maybe this is bad. I am going to provide a version with old validator ...
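The distinction drawn here can be sketched in Python. This is a simplified illustration under stated assumptions, not Notepad3's validator and not Google's CED: the function `guess_encoding` and its result strings are hypothetical, and strict decoding stands in for a real UTF-8 validator.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Very rough classification of a byte buffer (illustrative only)."""
    # 1. The signature EF BB BF at the start is an unambiguous hint.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with signature"
    # 2. Pure 7-bit content is byte-identical in UTF-8 and ANSI code pages.
    if all(b < 0x80 for b in data):
        return "7-bit ASCII"
    # 3. Without a signature, check whether the bytes form valid UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # 4. Not valid UTF-8: assume some ANSI code page.
        return "ANSI code page"

print(guess_encoding(codecs.BOM_UTF8 + b"package main"))
print(guess_encoding(b"package main"))
print(guess_encoding("中文".encode("utf-8")))
print(guess_encoding("中文".encode("gbk")))
```

Step 3 is the hard part in practice: some ANSI byte sequences happen to be valid UTF-8 as well, so a real detector needs additional heuristics beyond plain validation.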

freesoft0000 commented 6 years ago

Notepad++ is open source; you can refer to its encoding detection.

RaiKoHoff commented 6 years ago

I am using the "Compact Encoding Detection (CED)" by Google plus additional guarding algorithms. One of these guards seems to be too restrictive, rejecting your files as UTF-8 (while CED detected it fine). I relaxed the restrictions; please test development beta _TinyExpr_4.18.821.1066. np3portableapp.zip Ed.: ensure that "Skip ANSI Code Page detection" is NOT checked 😉

freesoft0000 commented 6 years ago

v4.18.821.1065 and v4.18.821.1066: with "Skip ANSI Code Page detection" NOT checked, both versions display the test document correctly.

freesoft0000 commented 6 years ago

4.18.822.1155: the test document is still garbled when opened.

RaiKoHoff commented 6 years ago

Yes, of course; the changes published in v4.18.821.1065 and v4.18.821.1066 have not been merged into master yet.

RaiKoHoff commented 6 years ago

Please see nano doc at https://github.com/rizonesoft/Notepad3/issues/618#issuecomment-415023218

RaiKoHoff commented 6 years ago

Latest commit should allow detection of UTF-8 while Skip ANSI Code Page detection is checked (ON, the default).

freesoft0000 commented 6 years ago

Notepad3 v4.18.823.1166 OK

hpwamr commented 5 years ago

As far as I am concerned, this issue may be closed....

rizonesoft / Notepad3

File parsing bug #614