Behavior of undo after formatting

sogaiu commented 3 years ago

It appears that after one formats code using Ctrl-Shift-F (at least on Linux) that undo has no effect.

saikyun commented 3 years ago

Main problem atm is that only solution I can come up with involves keeping the whole file in memory, or dumping it to disk. Are there other ways to do this without potentially eating a lot of ram? I'm guessing it might be hard not to keep track of the whole file as long as one would use external tools like jfmt.

saikyun commented 2 years ago

Would be neat to implement a diffing algorithm to only keep the changes, rather than the full file. One way that seems pretty doable that I read about (https://wiki.c2.com/?DiffAlgorithm):

The core of diff algorithms seeks to compare two sequences and to discover how the first can be transformed into the second by a sequence of operations using the primitives delete-subsequence, and insert-subseqence. If a delete and an insert coincide on the same range then it can be labeled as a change-subsequence. The user doesn't always want to be informed as to which subsequences remain identical, but this information is nonetheless typically computed as a by-product of the basic algorithm. The sophisticated way to find those editing operations is to sort the two sequences in order of the values of the elements of the sequences; for when the basic unit is a text line, usually a hashcode of each line is used to represent the line, and the sequences are sorted by hashcode in order to discover groups of identical elements/lines. Once identical elements are found, they are clumped together into runs which were adjacent in the pre-sorted sequences. One common variant looks for the longest possible such runs, which is an O(N^2) approach, but for N equal to the number of possibilities in each group, not the number of total elements/lines, so this is usually acceptably fast. At this stage all matching regions between the two sequences/files have been found. Running sequentially through the elements/lines that have not been matched now trivially indicates which regions have been inserted, deleted, or changed. Usually moved regions are indicated as an insert and a delete, but with some extra effort (not typically done by most diff programs) they can be reported as moved. Actually source code in C can of course be found in e.g. the GNU version of diff or the one from Version 7 Unix, also available online. The smallest versions of this algorithm can be implemented in less space than the above text of this article; it's less complicated than it may sound.

sogaiu commented 2 years ago

I didn't digest the above deeply, but a cursory look reminded me of "the string-to-string correction problem". I came across mention of that in section 3.3.1 of a chapter in "The Architecture of Open Source Applications" about Bash.

saikyun / freja

Behavior of undo after formatting #16