skuro / plantuml-mode

A major mode for editing PlantUML sources in Emacs
GNU General Public License v3.0
511 stars 96 forks source link

Indentation wrong when lines start with tab (indent-tabs-mode t) #144

Open mtoboid opened 3 years ago

mtoboid commented 3 years ago

Hi @skuro,

when working with tabbed indent (yes I know, use spaces...) the indentation of plantuml-mode was a bit off. I tracked it down to the indentation regexes, and \s behaving differently to what one would normally expect (it does NOT match whitespace). Replacing ^\s* with ^[[:blank:]] to match whitespace at the beginning of a line works though. I have also written a small test for the tab issue, as it didn't manifest in the already existing test cases.

Would be great if you could have a look at the changes when you have time.

best mtoboid

exot commented 3 years ago

Hey @mtoboid,

I had a look at your PR (thanks for you diligence in working this out!). If I understand this correctly, using \s* actually does work the way it had been used so far, because in strings \s is simply interpolated to a single space – but of course this will miss tabs, as you have pointed out. Indeed, to use the actual \s special regexp character, one would have to use \s in strigs, and then more specifically \s- to match whitespaces. However, this will also match newlines (if I'm not wrong here), resulting in some tests to fail in a subtle manner. So using [[:blank:]] as you suggest it is the right way, I think.

However, there are some \s* and \s+ still present in your PR; wouldn't it be better to replace them by [[:blank:]] as well? I tried this here and it seems to work (at least all tests pass). What do you think?

Related to this: I think the whole set of indentation regexps should be refactored for better maintainability, maybe using rx notation?

Best,

  Daniel

mtoboid commented 3 years ago

Hi Daniel,

first of all thanks for your kind comment!

1) I didn't replace all \s; only the ones at the start of a line ^\s as I wanted to fix indentation issues I had. I personally only use tabs (if at all) for indentation, but you might well have a point there. Replacing all \s with [[:blank:]] should certainly not break anything. However, the current tests are likely not covering all exotic corner cases, as already tab indentation slipped past. In general I think your suggestion is good.

2) With rx notation I assume you mean the macro that produces regexes from lisp code. I am actually not very good at lisp/elisp and personally prefer the perl-style-regex form, as that is what I'm used to. Up to now I couldn't see the benefit of rx notation, sorry, I think the whole issue is down to preference. I can see that for a lisp project one might consider switching to rx, but that would be up to @skuro I guess.

Best,

Tobias

exot commented 3 years ago

Hey Tobias,

thanks for your reply!

First of all, I realized I made a mistake while replacing all \s with [[:blank:]], leaving some erroneous spaces in some regular expressions. I tried to fix it here, could you have a look?

As for your preference of Perl-style regular expressions over rx, I totally agree! I like Perl-style regexp's much more, but since Emacs' regular expression syntax is sufficiently (and most often subtly) different from Perl's, I thought using rx might help here. But I think you are right, as rx will certainly add a huge bloat to the regular expressions because of its verbosity.

I would like to propose another way to refactor the regular expressions used for indentation, though: how about collecting all regular expressions into the single variables plantuml-regexp-indent-start and plantuml-regexp-indent-end, respectively, and getting rid of the special purpose variables such as plantuml-regexp-indent-newif-start and the like? This would allow to make use of common patterns such as "The line might begin with an arbitrary number of spaces" and "The line could end with an arbitrary number of spaces" and specify them only once instead of every time. Furthermore, tinkering with the regular expressions would not require to change two variables anymore (the special-case one and the general plantuml-indent-regexp-(start|end) one), but only one.

What do you think about this? Maybe I will just try it out and we could have a look at it? @skuro: what do you think about this idea?

Thank you all

  Daniel

mtoboid commented 3 years ago

Hi Daniel,

sorry for the late reply, had lots to deal with the last weeks.

I also had a look at your own develop branch, and you seem to be doing a great job in consolidating several pull requests! This is sort of going off-topic for this pull request here, and I wondered if there is a possibility to switch this discussion to its own thread - e.g. under 'Issues' on your fork? Tried to open an issue, but there is no such option...

Generally I would agree, that one could improve on the regexes (e.g. in plantuml-indent-regexp-legend-start the 'legend' appears multiple times).

As mentioned before my lisp isn't great, but I could certainly look at regexes if you intend to give them an overhaul. Having said that, my problem is that I haven't used PlantUML extensively, and am therefore not really aware of all use-cases, hence what code/string constructs to expect. The tests provide some idea, but having actual plantuml files would be good. (Perhaps re-writing the test function to use two files as input before<>after instead of using two strings?). The regexes do seem to fall into categories - ( ... end | { ... } etc.) which could be used to simplify and group them. However, I personally wouldn't combine them all into just two regexes (start & end), if I understood you correctly there, as I think these regexes would be quite difficult to read and would likely be error-prone.

Agreeing on a good system before starting to re-organize them might be a good idea.

One way I imagine this could work, would be to build the regexes with two components, one just a list of the keywords, the other the general pattern for start or end (^[:space:]......$), (^[:space:]... end ...$), and then combining them into the final regexes that get used. This would make it easier to add new keywords, but might have drawbacks I missed.

If you want to give it a shot I'd be up for contributing, but I also think it is likely more work than it looks - worthwhile though if new features can be implemented.

No idea what @skuro thinks about this, but I guess if you want to discuss the regexes further we should switch to a different thread.

All the best Tobias

exot commented 2 years ago

Hey Tobias,

thank you for your thoughtful feedback, and please apologize my very late reply. I had to tend to a lot of other things in the meantime and could only spend very little time on working on plantuml-mode (and the time I had I have spent on trying to write a file-exporter …)

Your thoughts on refactoring the regular expressions sound very reasonable to me, and go much further than I had thought about it so far. Yes, grouping all regular expressions into a "start-expression" and an "end-expression" might be too much, but your proposal (if I understand it correctly) to factor out common patterns and to fill in the specifics using placeholders sounds very good to me.

I think what would help me most during development and bug-fixing would be to also have a function that allows to re-evaluate all regular expressions used by the font-locker, maybe I will try that as well.

What I will definitively do is to open an issue for this!

Best,

  Daniel

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

exot commented 2 years ago

Bump to make stale-bot happy.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.