grosser / ruco

Desktop-style, Intuitive, Commandline Editor in Ruby. "Better than nano, simpler than vim."
MIT License
130 stars, 10 forks

Syntax Highlighting Thoughts #7

Closed: ajpalkovic closed this issue 12 years ago

ajpalkovic commented 13 years ago

I was thinking of something like this for defining syntax highlighting rules for ruco: https://gist.github.com/975627

It's a lot like nano's syntax highlighting, but it's pure ruby and I wanted to add a cool feature for essentially inheriting syntax rules. For instance, erb and php are html files, but when you add <% %> or <?php ?> the rules change to something else. It's silly to redefine rules for the same language, so I wanted to simplify syntaxes with a kind of inheritance.
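The inheritance idea can be sketched in Ruby. This is a minimal illustration under assumptions, not the DSL from the gist: Syntax.define, rule, inherits: and the :embedded_ruby type are hypothetical names. A child syntax keeps every rule of its parent and lists its own rules first, so a first-match highlighter would try the ERB <% %> rule before the inherited HTML tag rule.

```ruby
# Hypothetical syntax registry: each syntax holds ordered [pattern, type] rules
# and may inherit all rules of a previously defined parent syntax.
class Syntax
  REGISTRY = {}

  attr_reader :name, :parent

  def self.define(name, inherits: nil, &block)
    parent = inherits && REGISTRY.fetch(inherits)
    syntax = new(name, parent)
    syntax.instance_eval(&block) if block
    REGISTRY[name] = syntax
  end

  def initialize(name, parent)
    @name = name
    @parent = parent
    @own_rules = []
  end

  # DSL method used inside define blocks: rule :type, /pattern/
  def rule(type, pattern)
    @own_rules << [pattern, type]
  end

  # Own rules come first, followed by everything inherited from the parent chain.
  def rules
    @own_rules + (parent ? parent.rules : [])
  end
end

Syntax.define(:html) do
  rule :tag, /<[^>]*>/
end

# ERB reuses the HTML rules and only adds the embedded-Ruby region.
Syntax.define(:erb, inherits: :html) do
  rule :embedded_ruby, /<%.*?%>/m
end
```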

I also wanted some nesting within rules. For instance, in HTML I might want to first match a tag and then, inside of that, match attribute names and values. In nano that was always harder because there was no good nesting, so I think this will greatly simplify syntax definitions and potentially make them more powerful.
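Nesting can be sketched as rules that carry child rules: each match of the outer pattern is scanned again with the children, and the resulting spans are shifted back into line coordinates. NestedRule, each_match and apply_rules are hypothetical names, and the attribute patterns are deliberately simplistic (no single-quoted or unquoted values).

```ruby
# Hypothetical nested rule: matches of `pattern` are re-scanned with `children`.
NestedRule = Struct.new(:type, :pattern, :children)

# Yields every non-overlapping MatchData of pattern in text, left to right.
def each_match(text, pattern)
  pos = 0
  while pos <= text.length && (m = pattern.match(text, pos))
    yield m
    # Step past empty matches so the loop always makes progress.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.begin(0) + 1
  end
end

# Returns [start, stop, type] spans (stop exclusive); each outer span is
# followed by the spans its child rules found inside it.
def apply_rules(text, rules, offset = 0)
  spans = []
  rules.each do |rule|
    each_match(text, rule.pattern) do |m|
      from, to = m.offset(0)
      spans << [offset + from, offset + to, rule.type]
      spans.concat(apply_rules(m[0], rule.children, offset + from))
    end
  end
  spans
end

HTML_RULES = [
  NestedRule.new(:tag, /<[^>]*>/, [
    NestedRule.new(:attribute_name, /\w+(?==)/, []),
    NestedRule.new(:attribute_value, /"[^"]*"/, [])
  ])
].freeze
```

For '<a href="x">hi</a>' this yields the two tag spans, plus href and "x" only inside the first tag: the inner rules never see text outside a tag.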

grosser commented 13 years ago

sounds like a good idea and the syntax DSL seems nice. I think the DSL should not be hard-coded to colors, but rather use names like "string", "tag", etc., so users can say something like "string" -> green, "tag" -> blue. I did not think of writing a new syntax highlighting library, since there are plenty available, some of which must be good :)

There is a colors branch that has some basic color support. The only problem was that once you start using colors, everything needs colors and there is no black in curses, only dark gray which was kind of ugly :<

ajpalkovic commented 13 years ago

It looks like the only commit on the colors branch was adding a todo, is that right?

Did you have other syntax highlighters in mind? I didn't think about using a 3rd party one which is why I started thinking about a dsl.

Yea, we could change that to something like tag :blue, /<.*>/ if that makes sense, instead of blue /<.*>/. Do you get any benefit from naming them like that, though? I used the colors as methods, I think, just to shorten it up.

-AJ

(Replying by email to grosser's comment of Tue, May 17, 2011 at 1:05 AM.)

ajpalkovic commented 13 years ago

Oh, I see, there are two colors branches, I guess? Or at least that's what GitHub thinks.

(Replying by email to his own message of Tue, May 17, 2011 at 1:15 AM.)

grosser commented 13 years ago

damn, I think I lost those commits :D I'll double-check... How about style :tag, /<.*>/, :default => :blue, so we only define styles or sections that exist, and the color-coding can be left to the user or, e.g., a color scheme.
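grosser's style declaration can be sketched as a small Ruby DSL in which a syntax only names its styles, optionally with a default color, and a user's color scheme takes precedence. Only the call shape style :tag, /<.*>/, :default => :blue comes from the thread; SyntaxDefinition and color_for are hypothetical names.

```ruby
# Hypothetical DSL: syntaxes declare named styles; colors are resolved separately.
class SyntaxDefinition
  Style = Struct.new(:name, :pattern, :default)

  attr_reader :styles

  def initialize(&block)
    @styles = []
    instance_eval(&block)
  end

  # Called as: style :tag, /<.*>/, :default => :blue
  def style(name, pattern, default: nil)
    @styles << Style.new(name, pattern, default)
  end

  # The user's color scheme wins; otherwise fall back to the style's default (or nil).
  def color_for(name, scheme = {})
    scheme.fetch(name) { @styles.find { |s| s.name == name }&.default }
  end
end

html = SyntaxDefinition.new do
  style :tag, /<.*>/, :default => :blue
  style :string, /"[^"]*"/
end
```

With this split, html.color_for(:tag) falls back to the syntax's :blue, while html.color_for(:tag, { tag: :magenta }) lets a scheme override it.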

ajpalkovic commented 13 years ago

https://github.com/ajpalkovic/ruco/commit/1e51da06bce70b55240e380cda991e3895df3ea3

(Replying by email to grosser's comment of Tue, May 17, 2011 at 1:23 AM.)

ajpalkovic commented 13 years ago

I definitely like the idea of naming rules, though, so it is easier to create themes. In fact, I think that rather than letting syntaxes specify a default color, they should only be allowed to specify types. We can supply a default theme with lots of types, but I think that's better because it separates the color from the syntax. That's exactly how editors like E and TextMate do it: the syntaxes define types, the themes associate a color with a type, and if there is no color for a type, it is ignored.

-AJ

(Replying by email to his own message of Tue, May 17, 2011 at 1:26 AM.)
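AJ's proposal (syntaxes define only types, themes map types to colors, and types without a color are ignored) needs very little code. A minimal sketch, assuming tokens arrive as [text, type] pairs; DEFAULT_THEME and apply_theme are hypothetical, and a nil color stands for drawing with the terminal's default attributes.

```ruby
# Hypothetical default theme: token types -> colors. Syntaxes never mention colors.
DEFAULT_THEME = { keyword: :blue, string: :green, comment: :cyan }.freeze

# Colors a [[text, type], ...] token list. User entries override the defaults;
# types that neither theme knows get nil, i.e. they are left unstyled.
def apply_theme(tokens, user_theme = {})
  theme = DEFAULT_THEME.merge(user_theme)
  tokens.map { |text, type| [text, theme[type]] }
end
```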

grosser commented 13 years ago

sounds good!

I added a coloring example: rake try_color. Color handling is rather complicated: you have to define all the colors you need first and then use them by id. Maybe we can simplify this, e.g. by adding colors just in time.
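Adding colors just in time could look like the registry below: a curses color-pair id is allocated the first time a foreground/background combination is requested and reused afterwards. ColorPairs is a hypothetical class; the registration block stands in for the real curses call (e.g. Curses.init_pair(id, foreground, background)), which keeps the sketch runnable without a terminal.

```ruby
# Hypothetical just-in-time color-pair registry. The block passed to new is
# called once per new (foreground, background) combination to register it.
class ColorPairs
  def initialize(&register)
    @register = register
    @ids = {}
  end

  def id_for(foreground, background)
    @ids[[foreground, background]] ||= begin
      id = @ids.size + 1 # pair 0 is reserved by curses for the default colors
      @register.call(id, foreground, background)
      id
    end
  end
end
```

In ruco the block would presumably call Curses.init_pair after Curses.start_color, and the returned id would then be passed to Curses.color_pair.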

grosser commented 13 years ago

when choosing a library for syntax coloring, please also look at its load and execution speed; slowing down the initial open-time would not be nice (e.g. compare time ruby -r color_lib -e '' with time ruby -r color_lib -e 'ColorLib.colorize(File.read(big_file))')
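Inside Ruby, the same two measurements can be taken with a monotonic clock. measure is a hypothetical helper, and color_lib/ColorLib are the placeholder names from the comment. Note that require only loads a file once per process, which is one reason to measure load time in a fresh interpreter, as the time ruby -r ... commands do.

```ruby
# Hypothetical timing helper for comparing candidate libraries: returns the
# block's result together with the elapsed wall-clock seconds.
def measure
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Example with placeholder names (color_lib / ColorLib stand in for a real candidate):
#   _, load_seconds = measure { require 'color_lib' }
#   _, run_seconds  = measure { ColorLib.colorize(File.read(big_file)) }
```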

ajpalkovic commented 13 years ago

Did you have any syntax highlighting libraries in mind? I have found a few, but most seem to be fairly old. And what all do we want in a library? These are my features:

-AJ

(Replying by email to grosser's comment of Tue, May 17, 2011 at 2:25 AM.)

grosser commented 13 years ago

looking at http://ruby-toolbox.com/categories/syntax_highlighting.html, I think albino should be the way to go... GitHub also uses it; the only downside is that it requires Pygments :<

grosser commented 13 years ago

this also looks nice: https://github.com/artemk/syntaxer so we could get some syntax verification

grosser commented 13 years ago

the syntax gem only supports Ruby/XML/YAML, which is not enough IMO

grosser commented 13 years ago

this is what nano is using: http://code.google.com/p/nanosyntax/source/browse/trunk/syntax-nanorc/ruby.nanorc

grosser commented 13 years ago

as far as I can see, the best solution would be to write a syntaxer gem that relies on pure regexes and returns a tokenized string

e.g. "if a =~ /bbb/" -> [['if', :keyword], [' ', :whitespace], ['=~', :operator], [' ', :whitespace], ['/bbb/', :regex]]
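A pure-regex tokenizer of that shape can be sketched with Ruby's StringScanner: at each position the first matching rule wins and its text becomes one token. The rule table is a hypothetical, Ruby-only subset. Unlike the hand-written example it also emits the identifier a, and, like any pure-regex approach, it would misread a division such as a / b / c as a regex.

```ruby
require 'strscan'

# Hypothetical rule table for a few Ruby tokens; the first rule that matches
# at the current position wins, and its match is consumed.
RULES = [
  [/\s+/, :whitespace],
  [/\b(?:if|elsif|else|unless|while|def|end)\b/, :keyword],
  [%r{/(?:\\.|[^/\\])*/}, :regex],
  [/=~|==|[-+*<>=!]/, :operator],
  [/\w+/, :identifier],
  [/./m, :other] # fallback: always consumes one character, so the loop ends
].freeze

# Splits one line into [text, type] pairs.
def tokenize(line)
  scanner = StringScanner.new(line)
  tokens = []
  until scanner.eos?
    RULES.each do |pattern, type|
      text = scanner.scan(pattern)
      next unless text
      tokens << [text, type]
      break
    end
  end
  tokens
end
```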

ajpalkovic commented 13 years ago

Yea, looking over the Ruby Toolbox was interesting. Some of them use Pygments, a Python library. It looks like it works well, but it would have a huge downside in that we could never do a partial render: because the Ruby wrappers talk to it through pipes, we would have to re-color the whole document on every key event, which would suck.

Some of them used CFGs (context-free grammars). CFGs are great because they can be very accurate and are more powerful than a single regex, but they would effectively require us to reparse the whole file on every key event.

So, I see two reasonable options. The first is to use TextMate-style grammars. There is a ton of documentation on how these are defined, and there are already grammars for a ton of languages out there, which saves a ton of time. The downside, though, is that these grammars are far more complicated to implement, and that could ultimately be a problem. TextMate grammars are very powerful, almost like a CFG, but if you want to keep ruco a simple editor, it might be too much to do.

The other option is nano-style grammars. I read through more of the nano code today, and it seems that what they do is read the regexes for a syntax and execute them on each line of the file in order. Wherever there is a match, they immediately highlight that region, and if subsequent regexes match, they just overwrite the earlier colors. This is relatively fast and simple, but there is no 'structure' to the regexes, so they are limited and can lead to syntax highlighting errors. Additionally, it would be easiest to add some kind of partial re-rendering to this approach, so that only changed lines are re-colored.
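The nano behaviour described above (run every regex over the line in order, let later matches overwrite earlier colors) can be sketched as follows. The rules are hypothetical examples and the output is a list of [start_column, style] runs. Because a line's styles depend only on that line, results could be cached per line, which is what makes re-coloring only the changed lines easy.

```ruby
# Hypothetical nano-style rules: every rule runs over the whole line in order,
# and later matches simply overwrite the styles of earlier ones.
NANO_RULES = [
  [/\b(?:if|else|end|def)\b/, :keyword],
  [/"[^"]*"/, :string],
  [/#.*/, :comment]
].freeze

def highlight_line(line, rules = NANO_RULES)
  styles = Array.new(line.length, :default)
  rules.each do |pattern, style|
    pos = 0
    while pos <= line.length && (match = pattern.match(line, pos))
      from, to = match.offset(0)
      (from...to).each { |i| styles[i] = style }
      pos = to > from ? to : from + 1 # skip past empty matches
    end
  end
  # Collapse per-character styles into [start_column, style] runs.
  runs = []
  styles.each_with_index do |style, i|
    runs << [i, style] if runs.empty? || runs.last[1] != style
  end
  runs
end
```

In 'x = "if" # end' the keyword matches for if and end are overwritten by the string and comment rules. The missing structure shows up just as quickly in the other direction: a # inside a string literal makes the comment rule overwrite the rest of the line, closing quote included.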

The only thing that concerns me about what you mentioned is: what happens if you put the same line of code inside a string or a comment? Then our parser has to be smart enough not to treat the if as a keyword, but as part of the string or comment.

Idk, I'm in two minds about it. It would be pretty kickass to bring textmate grammars to a command line editor. I haven't found any other editors that do that. We would have fairly complete and accurate syntax highlighting for a ton of languages. But it would certainly be much more complicated to do, and to do it efficiently. The simpler syntax highlighting is nice, but we'd have to create syntaxes for languages again.

-AJ

(Replying by email to grosser's comment of Tue, May 17, 2011 at 1:12 PM.)

grosser commented 13 years ago

I think you can do partial renders with Pygments by submitting only one line, but that would bring some of the problems that regex-based styling has. Using TextMate grammars sounds like a nice idea; maybe get it working on a per-line level, so it's simpler and we instantly get a ton of languages supported (and hopefully it's not too slow...)
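Per-line highlighting still needs a little state carried from one line to the next (otherwise a line inside a multi-line comment looks like code), and partial re-rendering then means re-highlighting only the lines whose text or starting state changed. A minimal sketch, assuming a single boolean state for Ruby's =begin/=end comments; IncrementalHighlighter and LineResult are hypothetical names, and a full TextMate-style parser would carry a stack of open begin/end rules instead of one flag.

```ruby
# Hypothetical per-line highlighter with cached results. The only state carried
# between lines is "inside a =begin/=end block comment", as in Ruby source.
LineResult = Struct.new(:start_state, :text, :tokens, :end_state)

class IncrementalHighlighter
  attr_reader :last_recomputed

  def initialize
    @cache = []
    @last_recomputed = 0
  end

  # Highlights one line given the state it starts in; returns [tokens, end_state].
  def self.highlight(text, in_comment)
    if in_comment
      [[[text, :comment]], !text.start_with?("=end")]
    elsif text.start_with?("=begin")
      [[[text, :comment]], true]
    else
      [[[text, :code]], false]
    end
  end

  # Re-highlights only lines whose text or starting state changed since the last call.
  def render(lines)
    state = false
    @last_recomputed = 0
    output = lines.each_with_index.map do |text, i|
      cached = @cache[i]
      unless cached && cached.text == text && cached.start_state == state
        tokens, end_state = self.class.highlight(text, state)
        cached = @cache[i] = LineResult.new(state, text, tokens, end_state)
        @last_recomputed += 1
      end
      state = cached.end_state
      cached.tokens
    end
    @cache = @cache.first(lines.length)
    output
  end
end
```

The cache is keyed by line index, so inserting or deleting a line invalidates everything below it; a real editor would want to shift the cache along with the edit.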

ajpalkovic commented 13 years ago

These are two Ruby projects that will help:

Textpow is actually a TextMate grammar parser, which is exactly what we want. However, from another project that used it, I saw that it does not work with all of the grammars yet, like C and PHP. We would need to tweak it to work with curses or with escape sequences: https://github.com/spox/textpow

This is a graphical editor in Ruby that uses TextMate grammars: https://github.com/danlucraft

This is a C++ text editor that fully and correctly supports TextMate grammars: https://github.com/etexteditor/e

If you are into more theory stuff, this is a paper on structured regular expressions which is the ultimate building block of textmate grammars: http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.4069

Anyway, I'm gonna read up more on TextMate grammars and how to implement them before doing anything else.

-AJ

(Replying by email to grosser's comment of Tue, May 17, 2011 at 4:39 PM.)

grosser commented 13 years ago

I think the curses mapping should not be that complicated; the only thing we need is a clean token/color output like [['if', :blue]], and then we can map these to curses colors very easily. I'll read the paper soon, sounds interesting :) The current StyleMap supports a [[0, :blue], [5, :red]]-style syntax -> 0..4 = :blue, 5..-1 = :red
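Turning the proposed [['if', :blue]] token output into the offset form described for StyleMap is a short conversion. This sketch does not use ruco's actual StyleMap class; tokens_to_runs and runs_to_ranges are hypothetical helpers that only reproduce the reading [[0, :blue], [5, :red]] -> 0..4 = :blue, 5..-1 = :red given in the comment.

```ruby
# Converts [[text, style], ...] tokens into [[start_column, style], ...] runs,
# merging neighbouring tokens that share a style and skipping empty tokens.
def tokens_to_runs(tokens)
  runs = []
  column = 0
  tokens.each do |text, style|
    next if text.empty?
    runs << [column, style] unless runs.last && runs.last[1] == style
    column += text.length
  end
  runs
end

# Expands runs into explicit column ranges for a line of `length` characters,
# e.g. [[0, :blue], [5, :red]] with length 10 -> 0..4 blue, 5..9 red.
def runs_to_ranges(runs, length)
  runs.each_with_index.map do |(start, style), i|
    stop = i + 1 < runs.length ? runs[i + 1][0] - 1 : length - 1
    [start..stop, style]
  end
end
```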

ajpalkovic commented 13 years ago

I might have given you the wrong article. I took a quick look at it and it didn't seem right; I think this might be it: http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf . I can't remember for sure, though; Alexander Stigsen, the guy who made E for Windows, sent it to me some time ago.

I think if you have that in place already, though, then we should be good to go. I'll start by integrating textpow. It has an API for outputting to custom formats, so that ought to get us started. We will probably have to have our own DSL for themes, because TextMate themes all specify 24-bit hex colors, which I don't think we can use. So last millennium, but I think we're limited to 8 or 256 colors unless some cool new toy has come around.

(Replying by email to grosser's comment of Wed, May 18, 2011 at 1:31 AM.)
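If themes keep TextMate-style 24-bit hex colors, one option is to convert each color to the nearest entry of the 256-color xterm palette when the theme is loaded. This assumes a terminal using the standard xterm palette layout (indices 16 to 231 form a 6x6x6 cube with channel levels 0, 95, 135, 175, 215, 255; indices 232 to 255 are a gray ramp from 8 to 238). hex_to_xterm256 is a hypothetical helper, and 8-color terminals would still need a coarser fallback.

```ruby
# Hypothetical helper: map a 24-bit "#rrggbb" theme color to the nearest entry
# of the standard xterm 256-color palette (6x6x6 cube plus 24 grays).
CUBE_LEVELS = [0, 95, 135, 175, 215, 255].freeze

def nearest_cube_level(value)
  CUBE_LEVELS.each_index.min_by { |i| (CUBE_LEVELS[i] - value).abs }
end

def hex_to_xterm256(hex)
  r, g, b = hex.delete_prefix('#').scan(/../).map { |pair| pair.to_i(16) }

  # Candidate 1: nearest color in the cube (indices 16..231).
  ri, gi, bi = [r, g, b].map { |v| nearest_cube_level(v) }
  cube_rgb = [CUBE_LEVELS[ri], CUBE_LEVELS[gi], CUBE_LEVELS[bi]]
  cube_index = 16 + 36 * ri + 6 * gi + bi

  # Candidate 2: nearest gray on the ramp (indices 232..255, values 8, 18, ..., 238).
  step = (((r + g + b) / 3.0 - 8) / 10.0).round.clamp(0, 23)
  gray_rgb = [8 + 10 * step] * 3
  gray_index = 232 + step

  # Keep whichever candidate is closer in plain RGB distance.
  distance = ->(rgb) { rgb.zip([r, g, b]).sum { |a, c| (a - c)**2 } }
  distance.call(cube_rgb) <= distance.call(gray_rgb) ? cube_index : gray_index
end
```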

grosser commented 13 years ago

the first try is on the colors branch (COLORS)