notepad-plus-plus / userDefinedLanguages

Notepad++ User Defined Languages Collection
Definition of a new UDL on the basis of a predefined language #62

Status: Closed (Zack-83 closed this 2 years ago)

Zack-83 commented 3 years ago

Dear N++ team, I would like to contribute by creating syntax highlighting for some "hybrid" languages, which inherit from several predefined (not user-defined!) languages:

HTML + Python --> Django
R + Markdown --> Rmarkdown

Is there a way to extract the rules from the Scintilla lexers and to convert them somehow into a UDL description? Regards Giacomo Lanza

chcg commented 3 years ago

Regarding Django: see https://sourceforge.net/p/scintilla/bugs/975/ and this entry from https://www.scintilla.org/ScintillaHistory.html:

Fix HTML lexer handling of Django so that nesting a {{ }} or {% %} Django tag inside of a {# #} Django comment does not break highlighting of rest of file

So the HTML lexer may already support Django.

skorpio07 commented 3 years ago

Is there any interest in making a UDL based on VBA? I ask because I live in VBA every day (current job) and would be willing to put in the time to make the UDL just for me (and I really love the new dark theme, it's so perfect), but I would like to hear any other opinions.

pryrt commented 3 years ago

@skorpio07 ,

Since VBA is just Visual Basic with extra known keywords, I would be tempted to do something like adding keywords in the Style Configurator > VB/VBS > WORD and editing the user-defined keywords. I have a feeling that would be easier than getting the main syntax all properly defined.

But if you think the results are nicer using a UDL, or you want to learn the UDL system by doing this, then by all means, create that UDL and submit a PR to add it to the repository!

Yuyiya commented 2 years ago

Zack-83 wrote 'I would like to contribute creating syntax highlighting for some "hybrid" languages, which inherit from (more) predefined - not user defined! - languages ...'. This is a great idea!

I'd like to add the KeyKit language (https://github.com/nosuchtim/keykit) - "an algorithmic MIDI scripting language and GUI system" - which in many ways resembles C, for making algorithmic MIDI music, created years ago by Tim Thompson https://timthompson.com/tjt/. And it would be simplest to start from the C language definition.

The question, more generally, is: How can I create a UDL using an existing language as a template?

(Somewhat simpler than Zack-83's question, which is, more generally: How can I create a UDL using two or more existing languages as templates?)

pryrt commented 2 years ago

@Yuyiya ,

If you mean, How can I create a UDL using an existing [UDL] language as a template?, you use Save As in the User Defined Language dialog, and then edit it.

If you mean, How can I create a UDL using an existing [built-in lexer] language as a template?, the unfortunate answer is "you cannot" -- or rather, "start from scratch". The built-in lexers included in Notepad++ are actually compiled code which is linked into the Notepad++ executable; each built-in lexer has its own code, its own idiosyncrasies, and its own internal parser. The UDL is a central parser which just tries to match the keywords and special symbols for the active language -- you define a UDL by just supplying the lists of keywords and operator symbols.
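To make the contrast concrete, here is a minimal Python sketch of the "central parser plus keyword lists" idea. It is not how Notepad++ implements UDLs; the group names, the word lists, and the `classify` function are all hypothetical and exist only to show that the language is described by data rather than by dedicated lexer code.

```python
import re

# Hypothetical keyword groups; a real definition would list the language's own words.
KEYWORD_GROUPS = {
    "keywords1": {"if", "else", "while", "for", "return"},
    "keywords2": {"function", "class"},
}
# Hypothetical operator symbols.
OPERATORS = {"+", "-", "*", "/", "=", "(", ")", "{", "}"}

# Split text into identifier-like words or single non-space characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


def classify(text):
    """Return (token, style) pairs using nothing but list membership."""
    result = []
    for token in TOKEN_RE.findall(text):
        style = "default"
        for group, words in KEYWORD_GROUPS.items():
            if token in words:
                style = group
                break
        else:
            # No keyword group matched, so check the operator list.
            if token in OPERATORS:
                style = "operator"
        result.append((token, style))
    return result


if __name__ == "__main__":
    for token, style in classify("if (x) { return y }"):
        print(f"{token!r:10} {style}")
```

Switching to another language in this sketch means swapping the two tables, which is roughly why a keyword-list approach cannot borrow the behaviour of a compiled lexer: there is no rule set to copy, only the code itself.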

It really isn't that hard to start "from scratch".

It only took me about 15 minutes to come up with a first approximation based on the first couple of page-downs of https://htmlpreview.github.io/?https://github.com/nosuchtim/keykit/blob/master/doc/language.html -- and that was knowing nothing about the language. If I had read the whole page, I would have been able to add more operators and other fancy keywords that I see when I just glance down the page. Someone who knows the language better would be able to get better groupings of the keyword lists.

If you want to start where I left off, save this attached keykit.xml.txt as keykit.xml in your userDefineLangs\ directory and restart Notepad++; you will have to change the extensions, and re-group keywords, and maybe add more groups of keywords. Once you've got it working, feel free to make a PR to submit it to this repo, assuming you think your attempt is good enough to share with other Notepad++ users.
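For anyone who prefers to script the "change the extensions" step instead of using the dialog, the following Python sketch rewrites the extension list in a UDL file. Treat it as an illustration under assumptions: the `SAMPLE` XML is a stripped-down stand-in and not the attached keykit.xml, the `UserLang` element and its `ext` attribute reflect my understanding of the UDL file layout, the space-separated extension format is likewise assumed, and `set_udl_extensions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def set_udl_extensions(xml_text, extensions):
    """Return UDL XML text with the 'ext' attribute of every UserLang replaced."""
    root = ET.fromstring(xml_text)
    for user_lang in root.iter("UserLang"):
        # Assumption: extensions are stored space-separated, without dots.
        user_lang.set("ext", " ".join(extensions))
    return ET.tostring(root, encoding="unicode")


# Stand-in document for illustration only; a real UDL file has many more elements.
SAMPLE = """<NotepadPlus>
    <UserLang name="KeyKit" ext="" udlVersion="2.1">
        <KeywordLists>
            <Keywords name="Keywords1">if else while for</Keywords>
        </KeywordLists>
    </UserLang>
</NotepadPlus>"""

updated = set_udl_extensions(SAMPLE, ["k"])

if __name__ == "__main__":
    print(updated)
```

Working on a copy of the file and restarting Notepad++ afterwards is the safer route, since a malformed XML file may simply not load.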

Yuyiya commented 2 years ago

@pryrt ,

If you mean, How can I create a UDL using an existing [UDL] language as a template?, you use Save As in the User Defined Language dialog, and then edit it.

___ Yes, I get that. But that's not what I meant.

If you mean, How can I create a UDL using an existing [built-in lexer] language as a template?, the unfortunate answer is "you cannot" -- or rather, "start from scratch". The built-in lexers included in Notepad++ are actually compiled code which is linked into the Notepad++ executable; each built-in lexer has its own code, its own idiosyncrasies, and its own internal parser. The UDL is a central parser which just tries to match the keywords and special symbols for the active language -- you define a UDL by just supplying the lists of keywords and operator symbols.

___ Yes, that's what I meant. And it's a pity there isn't a function to re-use another built-in language as template. Still, if the implementation of the built-in languages is as idiosyncratic as you say, it probably wouldn't be simple to solve the more general problem I posed: namely, to design a unified architecture for processing a definition of any language (comprising a dictionary of its words, with rules for its syntax and grammar) using some standard language specification model (whatever the best choice of standard for that is currently); then to implement that design in Notepad++ for all its in-built languages.

It really isn't that hard to start "from scratch".

___ Yes, I agree, and have done some trivial UDLs for my own niche uses (MP3 metadata, MuseScore 'Score' info).

It only took me about 15 minutes to come up with a first approximation based on the first couple of page-downs of https://htmlpreview.github.io/?https://github.com/nosuchtim/keykit/blob/master/doc/language.html -- and that was knowing nothing about the language. If I had read the whole page, I would have been able to add more operators and other fancy keywords that I see when I just glance down the page. Someone who knows the language better would be able to get better groupings of the keyword lists.

If you want to start where I left off, save this attached keykit.xml.txt as keykit.xml in your userDefineLangs\ directory and restart Notepad++; you will have to change the extensions, and re-group keywords, and maybe add more groups of keywords. Once you've got it working, feel free to make a PR to submit it to this repo, assuming you think your attempt is good enough to share with other Notepad++ users.

___ Why, thanks! I might just do that. I've never yet made a code contribution on GitHub, but doing that's a bridge for another day. Meanwhile, I've saved your 'first approximation' and added the extension "k". That's all I can make time for today, but it's a start!

___ (PS - Is there a way to indent replies here? E.g. Wikipedia uses one or more leading colons ":" to indent text by so many tab spaces.)

pryrt commented 2 years ago

to design a unified architecture for processing a definition of any language (comprising a dictionary of its words, with rules for its syntax and grammar) using some standard language specification model (whatever the best choice of standard for that is currently); then to implement that design in Notepad++ for all its in-built languages.

There is. Instead of using a UDL, you could take the existing code for the C language lexer, adapt it into a lexer plugin specific to your language, and release that as a plugin. That's been done before (like the GEDCOM lexer plugin). But that's a huge amount of work. UDLs are meant to be a simple alternative to doing all that work... but it means that you have to start from scratch when you think you have a language "kind of like XXX". (And note, from what I saw, the language you were talking about is almost nothing like C, other than using braces for blocks.)

to implement that design in Notepad++ for all its in-built languages.

Notepad++ gets its built-in languages from Scintilla, a separate project; the NPP developers aren't interested in writing their own language-definition model and losing all of that existing IP.

___ (PS - Is there a way to indent replies here? E.g. Wikipedia uses one or more leading colons ":" to indent text by so many tab spaces.)

GitHub uses markdown for its comments; markdown uses > as a prefix for a quoted line.

Yuyiya commented 2 years ago

@pryrt - Thanks! Yes, I was aware of the Scintilla connection, but hadn't realised how opaque the language definitions were, as implied by the specific lexers. I guess this also, indirectly, answers Zack-83's other question:

Is there a way to extract the rules from the Scintilla lexers and to convert them somehow into a UDL description?

By implication, the only way to do so would be to consult the source code of the relevant lexers, and hand-build the language definition from that.

I had been hoping that we could use something like the old BNF (Backus-Naur (normal) Form, IIRC) as a unified way of specifying an arbitrary language, and write a general lexer to process that (more abstract) definition. But if there's as much work as you say, nobody's likely to take up the challenge without some strong motivation.
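As a rough illustration of that "general lexer driven by an abstract definition" idea, here is a small Python sketch. It is not BNF: the language is described by an ordered table of token names and regular expressions, which is a much weaker formalism. The names `build_lexer` and `C_LIKE_SPEC`, and the toy C-like token set, are hypothetical and chosen only for the example.

```python
import re


def build_lexer(token_spec):
    """Compile an ordered list of (name, regex) pairs into a tokenizer function."""
    pattern = re.compile(
        "|".join(f"(?P<{name}>{regex})" for name, regex in token_spec)
    )

    def tokenize(text):
        tokens = []
        # Characters that match no rule are silently skipped by finditer.
        for match in pattern.finditer(text):
            if match.lastgroup != "SKIP":
                tokens.append((match.lastgroup, match.group()))
        return tokens

    return tokenize


# Hypothetical specification for a tiny C-like language; earlier rules win.
C_LIKE_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("KEYWORD", r"\b(?:if|else|while|return)\b"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=<>(){};]"),
    ("SKIP", r"\s+"),
]

tokenize = build_lexer(C_LIKE_SPEC)

if __name__ == "__main__":
    for kind, value in tokenize("if (x) return 1; // done"):
        print(kind, repr(value))
```

The same `build_lexer` function serves any language whose tokens fit in such a table, but nesting, context-dependent states, and folding are exactly what this simple model cannot express, which hints at why hand-written lexers exist.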

Thanks again for your replies.