zufuliu / notepad4

Notepad4 (Notepad2⨯2, Notepad2++) is a light-weight Scintilla based text editor for Windows with syntax highlighting, code folding, auto-completion and API list for many programming languages and documents, bundled with file browser plugin matepath.

Freeze when opening large binary files (>600 MB) #396

Closed: Sethur closed this 2 years ago

Sethur commented 2 years ago

Version: v4.21.11r3986 OS: Windows 10 Arch: x64

Issue: When opening large binary files (several hundred MB), Notepad2 never finishes loading (I waited several minutes before killing the process) and gives no warning.

Suggested fix: Loading very large binary files should trigger a warning or avoid freezing by other means.

zufuliu commented 2 years ago

What's your system? Can you share the file?

I concatenated six copies of Code.exe (Visual Studio Code 1.62.2, 121 MB), and the resulting 728 MB code6.exe opens very smoothly.

D:\Tools\VSCode>cat Code.exe Code.exe Code.exe Code.exe Code.exe Code.exe > code6.exe

zufuliu commented 2 years ago

There used to be an option FileLoadWarningMB, whose default value was increased over time. The option was finally removed in commit 23908185980c0064ce36417021cb12f9343710c4 (no 2GB warning after 7442db69cab14dcc8d6d56d60e35fbf1a1e05f5d) while implementing large file support (issue #125). I don't think it's worth bringing it back.

Common cases in which Notepad2 will freeze:

  1. The first line of the file is ridiculously long, so counting total characters and columns for this line (the statusbar items Col current/total columns and Ch current/total characters) takes a very long time.
  2. (When word wrap is enabled, which is the default) the first few lines (lines around 1MB) of the file are ridiculously long; text is only drawn after wrapping has finished.
  3. There is an infinite loop in our code.

🙏I hope you can take the time to make a reproducer, e.g. by making the file smaller and smaller until Notepad2 no longer freezes.
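
When shrinking a file toward a reproducer, it can help to know whether it contains an extremely long line in the first place, since the first two cases above both depend on that. The following is a minimal sketch, not part of Notepad2: the function name `longest_line` is made up for illustration, it splits on `\n` bytes only, and it reports lengths in bytes rather than characters (so for UTF-16 data the numbers are only a rough indicator).

```python
import sys


def longest_line(stream, chunk_size=1 << 20):
    """Return (line_number, length_in_bytes) of the longest line in a binary stream.

    Lines are split on b"\n" only. The stream is read in fixed-size chunks so
    that a single huge line never has to be loaded in one piece.
    """
    best_no, best_len = 1, 0   # longest line seen so far
    line_no, cur_len = 1, 0    # line currently being measured
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            nl = chunk.find(b"\n", start)
            if nl < 0:
                # No more newlines in this chunk: the line continues.
                cur_len += len(chunk) - start
                break
            cur_len += nl - start
            if cur_len > best_len:
                best_no, best_len = line_no, cur_len
            line_no += 1
            cur_len = 0
            start = nl + 1
    # Account for a last line without a trailing newline.
    if cur_len > best_len:
        best_no, best_len = line_no, cur_len
    return best_no, best_len


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        number, length = longest_line(f)
    print(f"longest line: #{number}, {length} bytes")
```

Running it with a file path as the only argument prints the number and byte length of the longest line; a result in the megabyte range suggests the file falls into case 1 or 2.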

Sethur commented 2 years ago

@zufuliu Thank you for your quick response and sorry for the delay.

I was able to reproduce the problem in a way that you can repeat yourself. The files that trigger this crash for me have both a very large XML header (>10 MB) and a very large XML footer (>10 MB), with an even larger portion of binary data (>100 MB) in between. So the overall structure is:

  1. XML Header > 10 MB
  2. Binary Data > 100 MB
  3. XML Footer > 10 MB

I was trying to reproduce this with publicly available large XML files and found that I can only trigger the crash when there is a UTF-16 byte-order mark at the very beginning of the file (otherwise the crash does not happen).

So to reproduce, do the following steps:

  1. Download this large XML
  2. Convert the file to UTF-16-LE
  3. Add the following UTF-16 LE BOM to the very beginning: FF FE
  4. Execute cat nasa.xml > testfile.log
  5. Execute for /l %i in (1,1,100) do cat c:\windows\explorer.exe >> testfile.log
  6. Execute cat nasa.xml >> testfile.log

Now try opening that file with Notepad2. Don't forget to add the BOM, otherwise it will actually work :-)
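
For anyone without `cat` on Windows, the same file layout (BOM plus UTF-16 LE XML, repeated binary data, then the XML again) can be sketched in Python. This is an illustration under assumptions, not the exact procedure above: `build_test_file` and `demo` are made-up names, and the demo uses small synthetic XML and random bytes instead of nasa.xml and explorer.exe.

```python
import os
import tempfile

UTF16_LE_BOM = b"\xff\xfe"


def build_test_file(path, xml_text, binary_blob, repeats):
    """Write BOM + UTF-16 LE XML, then `repeats` copies of the blob, then the XML again.

    Returns the size of the resulting file in bytes.
    """
    # Like the cat-based steps, the footer is a byte-for-byte copy of the
    # header, so it carries the BOM as well.
    xml_bytes = UTF16_LE_BOM + xml_text.encode("utf-16-le")
    with open(path, "wb") as out:
        out.write(xml_bytes)
        for _ in range(repeats):
            out.write(binary_blob)
        out.write(xml_bytes)
    return os.path.getsize(path)


def demo():
    """Build a tiny synthetic sample in a temporary directory and return its size."""
    xml_text = "<root>" + "<item>x</item>" * 1000 + "</root>"
    blob = os.urandom(4096)  # stand-in for the real binary payload
    with tempfile.TemporaryDirectory() as tmp:
        return build_test_file(os.path.join(tmp, "testfile.log"), xml_text, blob, 10)
```

To approximate the real case, pass the text of the XML file as `xml_text`, the bytes of any large executable as `binary_blob`, and `repeats=100`.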

Sethur commented 2 years ago

Update: I just found out that it should be UTF-16, not UTF-8... edited the last comment accordingly.

Sethur commented 2 years ago

Another update: While working on this, I also found that Notepad2 pretty much crashes when trying to re-encode the above-mentioned nasa.xml to UTF-16 LE via F8, so the crash probably has something to do with the handling of UTF-16.

zufuliu commented 2 years ago

Hi @Sethur, it seems I cannot reproduce the crash. Here are my steps (on Win10 21H1 19043.1348 x64):

  1. Download the 23.8 MB nasa.xml
  2. Open it with Notepad2, then File -> Encoding -> UTF16 LE BOM, save as nasa16.xml
  3. Open the 47.7 MB nasa16.xml in HxD, to ensure that it has the BOM FF FE (screenshot 1)
  4. run cat nasa16.xml >> testfile.log in cmd
  5. run for /l %i in (1,1,100) do cat c:\windows\explorer.exe >> testfile.log in cmd
  6. run cat nasa16.xml >> testfile.log in cmd
  7. open the 559 MB testfile.log in Notepad2; Notepad2 pops up the inconsistent line endings dialog (screenshot 2) and shows the file as 615 MB on the statusbar

HxD screenshot: nasa16.xml

Notepad2 screenshot: testfile.log
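
The BOM check in step 3 can also be done without a hex editor. Below is a small sketch using the BOM constants from Python's standard `codecs` module; the function name `detect_bom` is made up for illustration. The longer UTF-32 marks are tested first because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).

```python
import codecs

# Ordered so that longer BOMs are tested before their shorter prefixes.
_BOMS = (
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
)


def detect_bom(head):
    """Return the name of the BOM that `head` (the first bytes of a file) starts with, or None.

    Caveat: a UTF-16 LE file whose first character is U+0000 is
    indistinguishable from UTF-32 LE by its BOM alone.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

For example, `detect_bom(open("nasa16.xml", "rb").read(4))` should return `"UTF-16 LE"` for a file saved as in step 2.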

Sethur commented 2 years ago

@zufuliu It seems like the issue with UTF-16 LE BOM files also had something to do with my local settings. I downloaded a fresh release (x64) and when using that, the crash upon converting nasa.xml to UTF-16 LE did not occur.

To reproduce the issue I managed to generate a 15 MB test file that contains garbage XML and some binary data. With a file this small, the fresh install of Notepad2 mentioned above does not crash completely, but it takes a very long time to become responsive again (on my system, well over one minute).

I hope you can reproduce the issue with the testfile attached. It should be easy to make the loading time even longer by adding more binary data to the end of the file.

crashtest.zip

Sethur commented 2 years ago

PS: I used a fresh install of this release in the latest reproduction trials:

Notepad2_en_x64_v4.21.11r3986.zip

zufuliu commented 2 years ago

OK, I can reproduce the "crash" with the attached file: the file is opened immediately, but Notepad2 stops responding for a long time. The hang seems related to word wrap; however, Notepad2 still hangs when scrolling to the bottom even when word wrap is disabled.

zufuliu commented 2 years ago

Not a bug but a performance issue; I added a ticket at https://sourceforge.net/p/scintilla/feature-requests/1422/

zufuliu commented 2 years ago

Commit 2ab9eda02d5c31eef6e09b0eef3c16da88279fde reduced the time for crashtest.log by about 1/3 to 1/2. With GDI (Settings -> Advanced Settings -> Rendering Technology -> Legacy GDI), the time is reduced to about 15 seconds.

zufuliu commented 2 years ago

After c954c4cec64bc661d57188230310491180d4bec0, Notepad2 no longer freezes for files with very long lines, but another method is still needed to reduce line layout time.

zufuliu commented 2 years ago

With parallel layout (see https://sourceforge.net/p/scintilla/feature-requests/1427/) the time for the longest line in crashtest.zip is reduced to 9 seconds on a 4-core i5/i7 with Direct2D (GDI not tested). I will push the changes after releasing v4.22.01 this weekend.
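
The general idea behind laying out one very long line in parallel can be illustrated with a toy model: split the line into segments, measure each segment independently on a worker, then combine the widths into running positions. This is only a sketch under stated assumptions and not Scintilla's or Notepad2's actual code. The width model (`char_width`), the segment size and the function names are all hypothetical, and in CPython the GIL prevents a real speedup for a pure-Python measuring function, so the code shows the decomposition rather than the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def char_width(ch):
    """Hypothetical width model: ASCII is 1 unit wide, everything else 2."""
    return 1 if ord(ch) < 128 else 2


def measure_segment(segment):
    """Measure one segment independently of all others."""
    return [char_width(ch) for ch in segment]


def layout_positions(line, segment_size=4096, workers=4):
    """Return the cumulative end position of every character in `line`."""
    segments = [line[i:i + segment_size] for i in range(0, len(line), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the segments.
        widths_per_segment = list(pool.map(measure_segment, segments))
    widths = [w for seg in widths_per_segment for w in seg]
    # The sequential part: a running sum turns widths into positions.
    return list(accumulate(widths))
```

The decomposition works because, in this simplified model, a character's width does not depend on what precedes it; only the final running sum is sequential.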

zufuliu commented 2 years ago

Please test the latest builds (from https://github.com/zufuliu/notepad2/actions or https://ci.appveyor.com/project/zufuliu/notepad2, not v4.22.01). After 419cbafa60f7097f7ea8fea9c039db4b4801da93, Notepad2 is faster on systems with 4 or more logical processors.

Sethur commented 2 years ago

@zufuliu Thanks for your hard work. I just tried release 419cbaf and it took only about 3 seconds to open the mixed XML and binary files that were causing the crash before. This issue can probably be closed now.

zufuliu commented 2 years ago

@Sethur thanks for the test. c1371d0749a56dec52965391ad8ef41ab156bcd7 makes the code even faster, so you can use this version. As there are further improvements (see https://sourceforge.net/p/scintilla/feature-requests/1427/), the issue can be kept open.