shriprem / FWDataViz

Fixed Width Data Visualizer plugin for Notepad++. Turns Notepad++ into Excel for fixed-width data files. Displays cursor position data. Jumps to specific fields. Folds record blocks. Extracts data. Built-in dialogs to configure file types, record types & fields; themes & colors; and folding. Handles homogeneous, mixed & multi-line records.
GNU General Public License v2.0

Visualizer Styles don't work on large files #85

Closed: mattsmac closed this issue 1 year ago

mattsmac commented 1 year ago

Description of the Issue

When I try to apply a visualizer to a large file, the colors don't appear. The field information still works, however. In my case, this is a text file of about 1 GB, with over 1 million lines of about 900 characters each. However, if I copy and paste the entire content into a new tab, the visualizer applies the color styling without issue. Also, if I cut the file down to about 100,000 lines, the styling works correctly. So it seems to have something to do with the size of the file, and with the fact that the file was opened from disk rather than created in a new tab.

Steps to Reproduce the Issue

  1. Open a very large file (1GB with 1,000,000 or more lines, 900+ chars per line)
  2. Try to apply any of the FW file types.
  3. The field data works, but the colors don't apply.
  4. Copy and paste the content of the file into a new tab.
  5. Try to apply any of the FW file types.
  6. The colors show as intended.

Expected Behavior

The colors should show on the source file, without having to copy and paste the content.

Actual Behavior

The colors did not show until after copying and pasting the content into a new tab.

Debug Information

Notepad++ v8.5.4 (64-bit)
Build time : Jun 17 2023 - 20:42:45
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line : "D:\Junk\test (2).txt"
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 11 Pro (64-bit)
OS Version : 22H2
OS Build : 22621.1992
Current ANSI codepage : 1252
Plugins : BigFiles (0.1.3), ComparePlugin (2.0.2), CSVLint (0.4.6.5), FWDataViz (2.6.2), NPPJSONViewer (2.0.5)

shriprem commented 1 year ago

Thank you for reporting this issue. I am able to replicate it.

In fact, I am able to pin the issue down to a precise point when the visualizer stops working. The zip file below contains Threshold.txt, derived by copy-pasting the contents from an older version of the ICD-10 Order Codes file several dozen times over.

Zip file with Threshold.txt: Threshold.zip

The file size of Threshold.txt is precisely 209,715,200 bytes. Open this file as is in NP++. FWDataViz will fail to visualize it. Delete just a single space at the end of this file, and save it with a different name, say: Threshold2.txt. Then close this 2nd file and reopen it in NP++. This time, FWDataViz will visualize the Threshold2.txt file, of size 209,715,199 bytes.
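The two files bracket the failure point to a single byte. A minimal Python model of that observation looks like this; the function name is hypothetical, and the `>=` boundary is inferred only from these two test files:

```python
# Sizes in bytes. Assumption: colors disappear once the file size reaches
# the threshold; this is inferred from the two test files, not from any source code.
THRESHOLD_BYTES = 209_715_200  # size of Threshold.txt, the first size that fails


def visualizer_colors_shown(file_size: int, threshold: int = THRESHOLD_BYTES) -> bool:
    """Hypothetical model: True if a file of this size would still be colored."""
    return file_size < threshold


for name, size in [("Threshold.txt", 209_715_200), ("Threshold2.txt", 209_715_199)]:
    print(f"{name}: {size:,} bytes -> colors shown: {visualizer_colors_shown(size)}")
```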

I was able to replicate this behavior on three devices with different memory, hardware and operating systems: Windows 11, Windows 10 and Windows 7. So the issue has to do entirely with the code. Perhaps one or more integer-type variables are too small for the task. That said, at this point the number 209,715,200 (0C80 0000 in hex) seems unremarkable to me.
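The hex figure quoted above can be double-checked in a few lines of Python. The same snippet also tests the integer-size hypothesis: 209,715,200 fits comfortably inside a signed 32-bit integer, so a plain 32-bit overflow would not be expected at this size. This is only an arithmetic check, not a review of the plugin's code:

```python
size = 209_715_200

# Hex representation, zero-padded to 8 digits to match "0C80 0000".
hex_digits = f"{size:08X}"
print(hex_digits[:4], hex_digits[4:])  # 0C80 0000

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1
print(size <= INT32_MAX)  # True, so no 32-bit overflow at this size
```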

I haven't had a chance to review the code yet, and I may need a few days to isolate and fix the issue. But being able to replicate the issue is a good start. Will keep you posted.

shriprem commented 1 year ago

Well! After spending 6+ hours debugging my plugin code, the resolution to your issue turned out to be a new Preferences setting in Notepad++.

For optimal performance, Notepad++ turns off document styling for large files. However, this can be adjusted to suit each user's preference, on this screen:

[Screenshot: the large-file size setting in Notepad++ Preferences]

As you can see, the default value for this setting is 200 MB, which is exactly the 209,715,200 bytes (200 × 1024 × 1024) that I had independently uncovered in my previous post.
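Since 200 MB corresponds to exactly 209,715,200 bytes, the setting evidently uses binary megabytes (1 MB = 1,048,576 bytes). The sketch below does that conversion and roughly estimates the sizes of the reporter's two files; the per-line size of 900 characters plus a 2-byte CRLF ending is an assumption based on the original report:

```python
MIB = 1024 * 1024  # binary megabyte, consistent with 200 MB = 209,715,200 bytes

default_limit_mb = 200
default_limit_bytes = default_limit_mb * MIB
print(f"{default_limit_bytes:,}")  # 209,715,200

# Rough size estimates for the reporter's files. Assumption: ~900 single-byte
# characters per line plus a 2-byte CRLF line ending.
bytes_per_line = 900 + 2
for lines in (1_000_000, 100_000):
    size = lines * bytes_per_line
    print(f"{lines:,} lines ~ {size / MIB:,.0f} MB, over limit: {size >= default_limit_bytes}")
```

On these estimates the full file is well over the default limit, while the 100,000-line cut is well under it, which is consistent with the trimmed file being colored correctly.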

If I had been aware of this new preference setting in Notepad++ earlier, I would have resolved your issue immediately. This setting was added to the Notepad++ code in this commit and released as part of Notepad++ v8.4.7.

FWDataViz's performance is optimized to be independent of the actual file size. It achieves this by visualizing only the lines that are visible at any given time in the Notepad++ viewport. When the file is first loaded, and whenever the cursor moves, FWDataViz visualizes from the top visible line down to the bottom visible line. So whatever the total line count of the file may be, FWDataViz stays focused only on the small range of lines visible in the Notepad++ viewport.
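The viewport-only approach described above can be sketched in Python. This is an illustration, not FWDataViz's actual code: `Field`, `visible_line_range` and `style_viewport` are hypothetical names, and the first visible line and viewport height are passed in directly. A real Notepad++ plugin would obtain them from the Scintilla editor (for example via the `SCI_GETFIRSTVISIBLELINE` and `SCI_LINESONSCREEN` messages) and would also have to account for folding and word wrap, which this sketch ignores:

```python
from dataclasses import dataclass


@dataclass
class Field:
    start: int  # 0-based column where the field begins
    width: int  # number of characters in the field
    style: int  # hypothetical style (color) index for this field


def visible_line_range(first_visible: int, lines_on_screen: int, line_count: int) -> range:
    """Document lines shown in the viewport, clamped to the end of the file."""
    last = min(first_visible + lines_on_screen, line_count)
    return range(first_visible, last)


def style_viewport(lines, fields, first_visible, lines_on_screen):
    """Return (line, start, length, style) runs for the visible lines only.

    Lines outside the viewport are never read, so the work per refresh
    depends on the viewport height and line length, not the file's line count.
    """
    runs = []
    for n in visible_line_range(first_visible, lines_on_screen, len(lines)):
        text = lines[n]
        for f in fields:
            if f.start >= len(text):
                continue  # this record is too short to contain the field
            length = min(f.width, len(text) - f.start)
            runs.append((n, f.start, length, f.style))
    return runs


# Usage: a 1,000,000-line document, viewport scrolled to the middle.
doc = ["AAABBBBCC"] * 1_000_000
layout = [Field(0, 3, 1), Field(3, 4, 2), Field(7, 2, 3)]
runs = style_viewport(doc, layout, first_visible=500_000, lines_on_screen=40)
print(len(runs))  # 120 (40 visible lines x 3 fields)
```

The number of runs depends only on the viewport height and the field layout; the 1,000,000-line length of `doc` never enters into it.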

So if you are going to use the FWDataViz plugin with Notepad++ to view large documents, you can safely increase this Notepad++ preference setting to cover your expected file sizes.
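To pick a value for the setting, the whole-MB limit just needs to exceed the largest file you expect to open. A small helper makes the arithmetic explicit; it is hypothetical and assumes binary megabytes plus the `size >= limit` boundary observed with Threshold.txt:

```python
MIB = 1024 * 1024  # binary megabyte, consistent with 200 MB = 209,715,200 bytes


def min_limit_mb(file_size_bytes: int) -> int:
    """Smallest whole-MB setting that keeps a file of this size styled.

    Assumption: styling is disabled once size >= limit (the boundary
    observed with Threshold.txt), so the limit must be strictly greater.
    """
    return file_size_bytes // MIB + 1


print(min_limit_mb(209_715_200))    # 201: a file of exactly 200 MB needs a 201 MB limit
print(min_limit_mb(1_073_741_824))  # 1025: a 1 GB file
```

In practice, rounding up generously leaves headroom for files that grow over time.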

However, note that FWDataViz has to visualize the full length of each line. So if your data file has lines running to several thousand characters each, FWDataViz's visualizing performance will indeed be affected. But the 900+ characters per line in your data file are well within FWDataViz's optimal range.
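A rough cost model for that trade-off, assuming the work per refresh is proportional to the number of characters styled; the function name and the 40-line viewport height are hypothetical:

```python
def chars_scanned_per_refresh(lines_on_screen: int, line_length: int) -> int:
    """Every visible line is styled across its full length.

    The file's total line count is deliberately not a parameter: only the
    viewport height and the line length drive the cost in this model.
    """
    return lines_on_screen * line_length


VIEWPORT_LINES = 40  # assumed viewport height; depends on window and font size

for line_length in (900, 5_000):
    print(line_length, chars_scanned_per_refresh(VIEWPORT_LINES, line_length))
```

Going from 900 to 5,000 characters per line multiplies the work per refresh by about 5.5, while the file's total line count does not appear in the calculation at all.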

mattsmac commented 1 year ago

Thank you for researching this! That was definitely the issue!