notepad-plus-plus / notepad-plus-plus

Notepad++ official repository
https://notepad-plus-plus.org/

[feature request] automatic detection of indentation #13981

Open DanielT opened 1 year ago

DanielT commented 1 year ago

In Notepad++ the settings "Tab size" and "Replace by space" are too inflexible.

I often edit files from different sources / projects, which use various different conventions. It seems like my settings for tabs and spaces are always wrong when I open a file.

Meanwhile VSCode has a setting that is described like this: "Editor: Detect Indentation Controls whether [Editor: Tab Size] and [Editor: Insert Spaces] will be automatically detected when a file is opened based on the file contents."

This works well, and I wish Notepad++ had the same feature.

alankilborn commented 1 year ago

Use the EditorConfig plugin.

DanielT commented 1 year ago

As far as I can tell, the EditorConfig plugin allows me to set .editorconfig files for certain paths, but it does not enable automatic detection. So, while this is an improvement for projects already on my machine, it doesn't solve the issue automatically for every file going forward, the way automatic detection would. In short: it's a band-aid, but not a solution.

alankilborn commented 1 year ago

EditorConfig is a reasonable solution, long accepted in the industry. Visual Studio's text editor knows how to use it; probably VSCode does as well...

However, what exactly are your criteria for determining the indentation when opening an existing file? Don't just say "however VSCode does it" -- that would be a cop-out.

Examples to get you started:

As I'm attempting to hint, this is not a trivial problem to solve (i.e., EditorConfig exists for a good reason). But, if you have some great ideas about it, let's hear them.

DanielT commented 1 year ago

I don't know the VSCode implementation, but it doesn't seem that difficult to come up with a rough algorithm:

1) Get the whitespace at the start of all lines. If a solid majority (say 90%) is spaces, then the indentation should be spaces; on the other hand, if the majority (e.g. 90%) is tabs, then indent with tabs. If it is undecided, then the people editing were confused and the existing global setting should be used.

2) For the depth, if using spaces, bin the indentation values and find out if the majority of depths are divisible by some integer; if such an integer exists, use it as the indent depth. If there is no clear majority, or tabs are selected, then fall back to the setting.

3) Finally, for performance, maybe don't look at the entirety of huge files and limit the analysis to the first (x) MB of input. If someone has a huge file that changes the indent style halfway through, then that's their problem.
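A minimal sketch of the three steps above, written as standalone Python rather than Notepad++ code. The function name `detect_indentation`, the 90% default, the character limit, and the choice to pick the largest width from 2 to 8 that divides a majority of depths are all assumptions made for illustration; the proposal leaves those details open.

```python
def detect_indentation(text, majority=0.9, max_chars=1000000):
    """Return (style, width); None in either slot means 'use the global setting'.

    Hypothetical helper: names and thresholds are illustrative assumptions.
    """
    tab_lines = 0
    space_depths = []
    # Step 3: only look at the beginning of very large inputs.
    for line in text[:max_chars].splitlines():
        stripped = line.lstrip(" \t")
        if not stripped or stripped == line:
            continue  # skip blank lines and lines without indentation
        if line[0] == "\t":
            tab_lines += 1
        else:
            space_depths.append(len(line) - len(line.lstrip(" ")))

    total = tab_lines + len(space_depths)
    if total == 0:
        return (None, None)
    # Step 1: require a solid majority for either style.
    if tab_lines >= majority * total:
        return ("tabs", None)
    if len(space_depths) < majority * total:
        return (None, None)
    # Step 2: assumed tie-break, take the largest width that divides
    # a majority of the observed depths.
    for width in range(8, 1, -1):
        hits = sum(1 for depth in space_depths if depth % width == 0)
        if hits >= majority * len(space_depths):
            return ("spaces", width)
    return ("spaces", None)


sample = "def f():\n    if x:\n        y()\n    return 1\n"
print(detect_indentation(sample))  # ('spaces', 4)
```

Preferring the largest qualifying width matters because every depth that is a multiple of 4 is also a multiple of 2; without some tie-break the smaller divisor would always win.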

alankilborn commented 1 year ago

That's actually pretty good. Now I know you're intelligent. :-)

victorel-petrovich commented 1 year ago

For efficiency, how about looking at just the very first usage of indentation? It will also give the user a quick way to change what Npp should apply for this file.

DanielT commented 1 year ago

Looking at only the first usage of indentation seems like it would not be enough to avoid strange edge cases. For example, what if a file starts with a comment:

/** description of my file / class / function / whatever
 * Lots of insightful text here ...
 */

Then you would detect the indentation as being one space, since the first usage of indentation would be " *" on the second line of the comment.
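To make the edge case concrete, here is a small sketch of a "first indented line wins" heuristic applied to a file that opens with such a comment. `first_indent_width` is a hypothetical helper written only for this illustration, and the sample file contents after the comment are made up.

```python
def first_indent_width(text):
    """Return the number of leading spaces on the first indented line, or None."""
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if stripped and stripped != line:
            return len(line) - len(stripped)
    return None


sample = (
    "/** description of my file\n"
    " * Lots of insightful text here ...\n"
    " */\n"
    "int main() {\n"
    "    return 0;\n"
    "}\n"
)
print(first_indent_width(sample))  # 1
```

The body of the function is indented by four spaces, but the heuristic stops at the comment continuation line and reports one.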

victorel-petrovich commented 1 year ago

...after skipping initial comments then.

DanielT commented 1 year ago

Define "comment" in a way that is language and file-format agnostic.

That seems like a can of worms to me, where you would perpetually be adding fixes to catch up with someone's edge case. A statistical approach would be inherently much more robust.

mpheath commented 1 year ago

This could be based on the existing SciTEBase::DiscoverIndentSetting.

Converted to Python code as a more convenient test example. It does not actually change the indent settings yet, as that is not the current focus of the test.

DiscoverIndentSetting.py

```py
from __future__ import print_function

def discover_indent_setting():
    text = editor.getText()
    newline = True
    indent = 0  # current line indentation
    tab_sizes = [0, 0, 0, 0, 0, 0, 0, 0, 0]  # number of lines with corresponding indentation (index 0 - tab)
    prev_indent = 0  # previous line indentation
    prev_tab_size = -1  # previous line tab size

    for i in range(len(text)):
        ch = text[i]

        if ch in ('\r', '\n'):
            indent = 0
            newline = True
        elif newline and ch == ' ':
            indent += 1
        elif newline:
            if indent:
                if indent == prev_indent and prev_tab_size != -1:
                    tab_sizes[prev_tab_size] += 1
                elif indent > prev_indent and prev_indent != -1:
                    if indent - prev_indent <= 8:
                        prev_tab_size = indent - prev_indent
                        tab_sizes[prev_tab_size] += 1
                    else:
                        prev_tab_size = -1
            elif ch == '\t':
                tab_sizes[0] += 1
                prev_indent = -1
            else:
                prev_indent = 0

            newline = False

    # maximum non-zero indent
    top_tab_size = -1

    for j in range(8 + 1):
        if tab_sizes[j] and (top_tab_size == -1 or (tab_sizes[j] > tab_sizes[top_tab_size])):
            top_tab_size = j

    # set indentation
    if top_tab_size == 0:
        print('Use tabs')
        print('getIndent:', editor.getIndent())
        print('getTabWidth:', editor.getTabWidth())
        print('editor.setTabWidth({})'.format(editor.getIndent() or editor.getTabWidth()))
    elif top_tab_size != -1:
        print('Use spaces')
        print('getIndent:', editor.getIndent())
        print('editor.setIndent({})'.format(top_tab_size))
    else:
        print('Stay with default settings')

    # debug:
    print('tab_sizes:', tab_sizes)

# debug:
discover_indent_setting()
```

Test on SciTEIO.cxx, which contains the function:

```
Use tabs
getIndent: 0
getTabWidth: 4
editor.setTabWidth(4)
tab_sizes: [1331, 4, 0, 0, 0, 0, 0, 0, 0]
```

OK, it's tab indented.

Try on itself, DiscoverIndentSetting.py:

```
Use spaces
getIndent: 0
editor.setIndent(4)
tab_sizes: [0, 0, 0, 0, 15, 0, 0, 0, 13]
```

OK, though the 4-space indent is counted 15 times and the 8-space indent 13 times. Sometimes the count for the 8-space indent is >= the count for the 4-space indent, so the setting can end up as 8. So the setting would need to be watched. It could be limited to 4 spaces, though languages like Fortran, IIRC, may start from column 7, so the range up to 8 might be required.
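One observation on the 4 versus 8 counts: in the Python listing as posted, `prev_indent` is only ever reset to 0 or -1 and is never set to the current line's indent, so every indented line is measured against column 0. That would explain why 8-space lines land in their own bucket. Below is a minimal sketch, assuming the intent is to count the step between consecutive lines. It is a line-based adaptation that takes a plain string instead of using `editor`, and `discover_indent_steps` is a hypothetical name.

```python
def discover_indent_steps(text):
    """Count indentation steps; index 0 counts tab-indented lines."""
    tab_sizes = [0] * 9
    prev_indent = 0      # indent of the previous line, -1 after a tab line
    prev_tab_size = -1   # step size last seen, -1 if unknown

    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        if line[0] == "\t":
            tab_sizes[0] += 1
            prev_indent = -1
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent:
            if indent == prev_indent and prev_tab_size != -1:
                tab_sizes[prev_tab_size] += 1
            elif indent > prev_indent and prev_indent != -1:
                step = indent - prev_indent
                if step <= 8:
                    prev_tab_size = step
                    tab_sizes[step] += 1
                else:
                    prev_tab_size = -1
            prev_indent = indent  # assumed addition: remember this line's indent
        else:
            prev_indent = 0
    return tab_sizes


sample = "def f():\n    a\n    if b:\n        c\n        d\n    e\n"
print(discover_indent_steps(sample))  # [0, 0, 0, 0, 4, 0, 0, 0, 0]
```

With the step remembered, a line at 8 spaces that follows a line at 4 spaces is counted as a 4-space step, so nested code no longer competes with the base indent. Dedented lines are not counted at all in this sketch.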