Textadept is a fast, minimalist, and remarkably extensible cross-platform text editor for programmers.
Typing causes CPU heating in a 500-line, 28 kB bash-lexed file with large string areas #506

oOosys commented 7 months ago

It's really annoying and makes Textadept almost useless. Is this a general issue caused by the lexers being written in Lua, or can the bash lexer be improved? SciTE and other editors with comparable syntax highlighting have no delays and don't make the CPU fans run hot while typing in the same file. The scripts I am writing keep growing over time. A few weeks ago I was already wondering why the CPU fans started running from time to time for no apparent reason, and I expected anything but typing text in Textadept to be the cause. Today I was able to reliably reproduce the issue, identifying fast typing (up to 350 chars/s) in Textadept as the cause beyond any doubt.

The main question here is: as the source code grows in size and complexity, does it make more sense to go back to SciTE, where I came to Textadept from, or to take a closer look at the lexers and possibly write my own that can handle larger code?

What about redesigning the approach to lexing by dropping the dependence on a lexing library? Wouldn't that make it all much easier, with well-made template lexers designed to take advantage of techniques that speed up responses to typing and copying/pasting?

orbitalquark commented 7 months ago

It would be helpful to post the script, or a script that exhibits the issue, so that I can identify what's causing the slowdown. A large slowdown was identified about 6 months ago (https://github.com/orbitalquark/scintillua/commit/5802bd4b04eae25368b227c985544a074407091a), but that might be a separate issue.

oOosys commented 7 months ago

By the way: I am using the proportional Ubuntu font and tabs in my scripts, so the text looks strange when viewed in a monospaced font. The disadvantage of using tabs is that you have to stick to fonts whose character widths have the same proportions relative to the space character in order to get the positions right as intended. Scintilla seems to use the pixel width of the space character as the reference for calculating tab positions in the text. If you switch to the Ubuntu font, the columns will align perfectly. The shell script below is a valid shell script you can run using bash.

˙ ' A keyboard is the most essential peripheral of any computer system so it makes sense
to configure its function to suit custom needs on the Linux operating system, so that the keys on your keyboard correspond to the letters or symbols typed on screen. This is why it is so important to personalize the keyboard layout of your system to work with your keyboard model. In addition, we can configure custom keyboard settings that make life a little easier.

    You can list all system-wide keyboard shortcuts defined in the xfconf database with
        ~ $ xfconf-query -c xfce4-keyboard-shortcuts -lv
Notice that for some weird historical reasons <Primary> and <Control> refer to the same 
    Ctrl/Strg key on the keyboard. 
There are two GUI apps for configuring keyboard shortcuts, one general-purpose and one
for window-related functions: 
        ~ $ xfce4-keyboard-settings
        ~ $ xfwm4-settings
Both kinds of keyboard shortcuts are listed by xfconf-query. 
The keyboard shortcuts seem to take priority over the keyboard mapping described below.
You can list the currently used keyboard layout key mapping with
        ;   xmodmap -pke   ;˙  
Below are some lines of the output of the command above (with a leading Tab, further Tabs added, AND the Unicode characters in brackets):
    keycode  60 = (.)period   (>)greater    ()period  ()greater     (˙)dead_abovedot   (ˇ)dead_caron    (·)periodcentered   (÷)division
    keycode  61 = (/)slash   (?)question    ()slash   ()question        (¿)questiondown   (???)dead_hook    (???)dead_belowdot   (˙)dead_abovedot
    keycode  65 = ( )space ()NoSymbol ( )space
                                ^-- change to   periodcentered   
    keycode   9 = Escape NoSymbol Escape
                            ^-- find a meaningful usage for [Shift]+[Esc]
There are four pairs of entries:    the first item in each pair describes the result of pressing the key without SHIFT, the second the result with SHIFT pressed:
                               Key  |/| Shift                       + Key
        Ctrl                + Key   |/| Shift   + Ctrl              + Key
        AltGr               + Key   |/| Shift   + AltGr             + Key
        CapsLock + AltGr    + Key   |/| Shift   + CapsLock  + AltGr + Key

Some useful multiple-key control sequences: 

        °   ->  [Shift][AltGr][;]           (degree)
        ˙   ->  [AltGr][.][.]               (dead_abovedot  -> abovedot)
        ·   ->  [Shift][ ]              (periodcentered -> middle dot)
                    ^-- default setting:    [CapsLock][AltGr][.]

See  <X11/keysymdef.h> for a  list  of  keysym  names. Keysyms matching Unicode characters not covered in the list of names may be also specified  as  "U0020"  to  "U007E"  and  "U00A0"  to "U10FFFF" for all possible Unicode characters.

Customizing keyboard key mapping in three simple steps:
    ~ $ xmodmap -pke > ~/.Xmodmap   # save current settings to default configuration file
    ~ $ e ~/.Xmodmap                    # keep only changed lines in the configuration file
    ~ $ xmodmap ~/.Xmodmap          # activate changes (add to Session and Startup list)
    ~ $ man  xmodmap
for more details on usage of   xmodmap   
###### Here is the place that makes the CPU run hot when typing fast, faster and faster, at
###### 300 or more strokes/second
###### I have disabled wrapping lines, but this did not help ...


# ======================================================================
# ~ $ xfconf-query -c xfce4-keyboard-shortcuts -lv
#   ~ $ xmodmap -pke
> exit status: 0
# Start with no fields found
max_fields=0
# Iterate over each line of text stored in the variable $currentKeyboardLayoutKeyMapping
while IFS= read -r line; do 
#-> while ...; do   runs the loop as long as   read   succeeds in getting a line from the content of the variable, which the '<<<' construct provides to the loop as stdin
#   IFS=                -> sets IFS (Internal Field Separator) to an empty value to prevent trimming of leading and trailing whitespace. Specifying IFS within the while construct keeps IFS from being changed globally and affecting the code following the loop
#   IFS= read -r line   -> reads each line into the variable  line  preserving leading and trailing whitespace, which would otherwise be trimmed by default with IFS set to whitespace
#           -r          -> tells   read   to treat backslashes (\) literally, disabling the default interpretation of backslash escapes found in the text of the line
# fields                -> bash array whose items (fields) are obtained by splitting  $line  on whitespace (the default IFS)
#           fields[@]       -> evaluates to all items of the fields array
#           ${#fields}      -> evaluates to the string length of the first item ${fields[0]}, NOT to the number of items
#           ${#fields[@]}   -> evaluates to the number of items in the array
    read -ra fields <<< "$line"
    num_fields=${#fields[@]}
    if (( num_fields > max_fields )); then # update the maximum number of fields (if necessary)
        max_fields=$num_fields
    fi
done <<< "$currentKeyboardLayoutKeyMapping" # '<<<' (here-string) feeds the content of the shell variable on the right side to the loop as stdin, from which   read   extracts one line per iteration
# Print the maximum number of fields
echo "Maximum number of keyboard input levels: $(( ($max_fields-3)/2 ))"

# Initialize empty arrays for the ASCII letters and digits and for the skipped items
ascii_letters_digits=()
skipped_items=()
# Iterate over each line in the variable
while IFS= read -r line; do
    # Split the line into fields using whitespace as the delimiter
    read -ra fields <<< "$line"
    # Loop through each field beginning with the fourth one (after "keycode", the number, and "="):
    for ((i = 3; i < ${#fields[@]}; i++)); do
        field="${fields[i]}"
        # Check if the field is a single ASCII letter or digit
        if [[ "$field" =~ ^[[:alnum:]]$ ]]; then
            # Add the ASCII letter or digit to the array of ASCII letters and digits
            ascii_letters_digits+=("$field")
        else
            # Add the skipped item to the array of skipped items
            skipped_items+=("$field")
        fi
    done
done <<< "$currentKeyboardLayoutKeyMapping"

# Sort and remove duplicates from the array of ASCII letters and digits
sorted_unique_ascii_letters_digits=($(printf "%s\n" "${ascii_letters_digits[@]}" | sort -u))

# Print the sorted and unique ASCII letters and digits
echo "Sorted and unique ASCII letters and digits:"
printf "%s\n" "${sorted_unique_ascii_letters_digits[@]}"

# Sort and remove duplicates from the array of skipped items
sorted_skipped_items=($(printf "%s\n" "${skipped_items[@]}" | sort -u))
# Print the skipped items
echo "Sorted and unique skipped items:"
printf "%s\n" "${sorted_skipped_items[@]}"

P.S. I am using a custom version of Textadept, so to pick up the newest changes I need to understand them and patch my version with them. I branched off at version 10 ... but ... apart from enabling key-press and key-release events for all keys, my version is still almost identical to Textadept's.
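On the tab alignment described at the start of this comment: Scintilla's documentation for `SCI_SETTABWIDTH` states that a tab is sized as a multiple of the width of a space character in the default style, which matches the observation about the Ubuntu font. A minimal init.lua sketch for indenting with real tabs follows; the width of 4 is a hypothetical value, and whether these settings apply to every new buffer or only the first one may depend on the Textadept version.

```lua
-- init.lua sketch: indent with real tab characters.
-- Scintilla sizes a tab stop as tab_width times the width of a space in the
-- default style, so columns only line up in fonts whose glyph widths relate
-- to the space width the way the author intended (e.g. the Ubuntu font).
buffer.use_tabs = true
buffer.tab_width = 4 -- hypothetical value; use the width the scripts were written with
```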

orbitalquark commented 7 months ago

Textadept backtracks from the current position to the beginning of a given "style" in order to start matching a lexer's grammar. In your case, when you have a string spanning hundreds of lines, Textadept needs to backtrack over each character to the start of the string, match the string and the subsequent text, and then highlight it. This takes more time than with static lexers like those used by SciTE, because they only backtrack to the beginning of the line before matching and highlighting.

I've experienced this with Textadept's own lexers/lexer.lua when editing its 650+ line module documentation block back when it was lexed as a single block. (It has since been split into single-line comments, resulting in better performance.)

Unfortunately, I don't know of a solution, other than to split up lexical entities, if possible, in the source file.
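To make that cost concrete, here is a toy Lua model. It is hypothetical: it is not Textadept's or Scintillua's actual code, and the helper names are invented for illustration. It counts how many bytes must be scanned back from an edit position, first to the start of the enclosing style run (as described above for Textadept), then to the start of the line (as described for SciTE's lexers). With a 500-line string, the first distance grows with the length of the string, while the second is bounded by the length of one line.

```lua
-- Toy model only: a buffer is a string plus one style name per byte.

-- Back up to the first byte of the style run that contains `pos`.
local function style_run_start(styles, pos)
  local start = pos
  while start > 1 and styles[start - 1] == styles[pos] do
    start = start - 1
  end
  return start
end

-- Back up only to the first byte of the line that contains `pos`.
local function line_start(text, pos)
  local start = pos
  while start > 1 and text:sub(start - 1, start - 1) ~= "\n" do
    start = start - 1
  end
  return start
end

-- Build a buffer: one line of code followed by a string spanning `n` lines.
local function build_buffer(n)
  local parts, styles = {}, {}
  local function add(s, style)
    parts[#parts + 1] = s
    for _ = 1, #s do styles[#styles + 1] = style end
  end
  add("echo hi\n", "default")
  add("'", "string")
  for i = 1, n do add("line " .. i .. " of a long string\n", "string") end
  add("'\n", "string")
  return table.concat(parts), styles
end

local text, styles = build_buffer(500)
local pos = #text - 10 -- an edit near the end of the long string
print("bytes scanned back to the style start:", pos - style_run_start(styles, pos))
print("bytes scanned back to the line start:", pos - line_start(text, pos))
```

A real lexer also has to re-match the text it backed up over, so the work per keystroke scales the same way as the first number.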

oOosys commented 7 months ago

Does this mean that, at each keystroke, the lexer processes parts of the document beyond the visible area instead of skipping them as irrelevant?

I am currently on the path to approaching programming the oOo-way and am making some progress. From the point of view of the oOo concept, splitting strings into smaller parts is not an option. The oOo-way makes extensive use of multi-line comments and mixes code with comments, which are executable parameters in disguise. Sooner or later I will need my own parser, which will keep things simple without backtracking or other techniques made necessary by the weird syntax design of programming languages ... anyway, I need to decide which path to take: stick with Textadept or go back to SciTE? The C code of the Lexilla lexers looks good to me ... somehow even better than building upon a lexer library in Lua ... Another way would be to apply highlighting only to the part of the text around the visible window without caring much about correctness, which would be fine in most cases, and if not ... it does not really matter much, as the oOo-way will probably not need highlighting in the sense of a programming language's syntax at all.

orbitalquark commented 7 months ago

Textadept syntax highlights the remainder of the buffer that is out of view in "idle time" by default. You can disable/change this via the view.idle_styling option (https://orbitalquark.github.io/textadept/api.html#view.idle_styling).

If you do turn it off, you may see lags while rapidly scrolling through a buffer (or scrolling immediately to the end of a long buffer).
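Given that trade-off, one middle ground is to turn idle styling off only for large buffers. A minimal init.lua sketch, assuming Textadept 11 or later (where idle styling is a `view` property) and a hypothetical size threshold `LARGE_BUFFER_BYTES`; because the setting belongs to the view, it is re-evaluated whenever a file is opened or the current buffer or view changes.

```lua
-- init.lua sketch: disable idle styling only for large buffers.
-- LARGE_BUFFER_BYTES is a hypothetical threshold; tune it to taste.
local LARGE_BUFFER_BYTES = 20000

local function update_idle_styling()
  if buffer.length > LARGE_BUFFER_BYTES then
    view.idle_styling = view.IDLESTYLING_NONE -- style only the visible text before display
  else
    view.idle_styling = view.IDLESTYLING_ALL -- the current default
  end
end

events.connect(events.FILE_OPENED, update_idle_styling)
events.connect(events.BUFFER_AFTER_SWITCH, update_idle_styling)
events.connect(events.VIEW_AFTER_SWITCH, update_idle_styling)
```

Small buffers keep background styling and smooth scrolling, while large ones avoid restyling the whole remainder of the buffer after every keystroke, at the cost of the scrolling lag mentioned above.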

oOosys commented 7 months ago

"""view.idle_styling -> The idle styling mode. This mode has no effect when view.wrap_mode is on.""" -> which is mostly the case when I am working on text. Any other idea what to switch off?

orbitalquark commented 7 months ago

No, I don't think Scintilla provides another option. If you turn off word wrap, do you still notice a slowdown? Scintilla's line layout with word wrapping can be noticeably slower than without it, so the lexer may not be to blame here.

oOosys commented 7 months ago

After adding this to init.lua:

--                          SPEEDING UP SLOW LEXING
view.idle_styling = view.IDLESTYLING_NONE
The idle styling mode. This mode has no effect when view.wrap_mode is on.
    view.IDLESTYLING_NONE Style all the currently visible text before displaying it.
    view.IDLESTYLING_TOVISIBLE Style some text before displaying it and then style the rest incrementally in the background as an idle-time task.
    view.IDLESTYLING_AFTERVISIBLE Style text after the currently visible portion in the background.
    view.IDLESTYLING_ALL Style text both before and after the visible text in the background.
    The default value is view.IDLESTYLING_ALL.

the CPU no longer heats up, and scrolling down is OK, without noticeable delay.

THANKS :) ... this solved the problem, so I can close this issue as solved. Why keep the ALL option as the default, then?

orbitalquark commented 7 months ago

I'm glad you were able to fix it!

It used to be view.IDLESTYLING_NONE by default, but as I mentioned before, more often than not I would see lag when scrolling through buffers. It bothered me, so I changed the default to view.IDLESTYLING_ALL. I believe SciTE also uses this option by default.

Different users will have different use cases, so it's good that there's at least an option to change this :)

oOosys commented 7 months ago

The current plan for following the oOo-way of thinking about and approaching programming is to distinguish different parts of "code" by using different ranges of Unicode characters. This would reduce lexing to colorizing characters based on their range within Unicode ... which probably can't be called lexing anymore ... it would simply color different Unicode code points differently ... no more backtracking, no more tracking the lexer's state and changing it under given conditions ... The most difficult part of the oOo-way seems to be making its advantages clear to others and sparking enthusiasm. The benefits will only become really apparent once the system is equipped with thousands of files, all designed with the oOo-way in mind; this can take years or even decades ... LLMs are only useful because of the huge database they are built upon. The oOo-way of approaching computer systems can show the power of the concept only if built upon gigabytes of well-organized data ... a solution to what I personally perceive as the mess of addressing the problem of too many programming languages by adding even more new languages to the mess.
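Such range-based colorizing can be sketched in a few lines of Lua. This is a hypothetical illustration: the ranges and the `style_of`/`colorize` names are invented here and are not part of Textadept, Scintillua, or any oOo specification, and it assumes Lua 5.3 or later for the `utf8` library. Because each character's style depends only on its own code point, restyling after an edit needs no backtracking and no lexer state.

```lua
-- Hypothetical ranges for illustration only.
local RANGES = {
  { first = 0x0000, last = 0x007F, style = "code" },     -- Basic Latin
  { first = 0x2200, last = 0x22FF, style = "operator" }, -- Mathematical Operators
  { first = 0x2500, last = 0x257F, style = "comment" },  -- Box Drawing
}

-- The style of a character depends only on its own code point.
local function style_of(cp)
  for _, r in ipairs(RANGES) do
    if cp >= r.first and cp <= r.last then return r.style end
  end
  return "default"
end

-- Split a UTF-8 string into runs of {start, finish, style} (inclusive byte positions).
local function colorize(s)
  local runs = {}
  for pos, cp in utf8.codes(s) do
    local style = style_of(cp)
    local finish = pos + #utf8.char(cp) - 1 -- last byte of this character
    local last = runs[#runs]
    if last and last.style == style then
      last.finish = finish -- extend the current run
    else
      runs[#runs + 1] = { start = pos, finish = finish, style = style }
    end
  end
  return runs
end

-- prints: 1 2 code / 3 5 operator / 6 7 code
for _, r in ipairs(colorize("x ∧ y")) do print(r.start, r.finish, r.style) end
```

Wiring such a per-character decision into Textadept's lexer machinery is not shown here.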

oOosys commented 7 months ago

By the way: if you want to scroll quickly through large files ... Helix can be a solution ... Best would be if I could add modal editing as a switchable option in Textadept ... Helix's approach fits well with what I consider a better way of editing (the reason I am using Helix rather than vi).