jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions
189 stars 61 forks source link

Contributing new highlightings back to https://invent.kde.org/frameworks/syntax-highlighting/ #106

Closed christoph-cullmann closed 3 years ago

christoph-cullmann commented 4 years ago

Hi,

as one of the developers of Kate & KSyntaxHighlighting I really like that you re-use our definition files.

On the other side, it seems you have some additional ones available we still lack in our repository.

As we normally accept new files without any large hassle (a small test file would be appreciated), could you try to contribute back the files that are at the moment only in this repository?

I diffed a bit and think stuff like llvm.xml, idris.xml and Co. are at the moment still missing in our repo.

I saw some other diffs, but think that is mostly because you lag a bit behind our repo with syncing.

Or are there other patches we could in-cooperate?

Greetings Christoph

jgm commented 4 years ago

I always request that people who submit such definitions submit them upstream. People don't always do it, it seems. But I see no reason why you couldn't incorporate them; they all contain licensing information.

jgm commented 4 years ago

If you can provide me with some instructions about how to contribute xml definitions upstream, I can include this in the README or CONTRIBUTING here, which may help in the future. (The lower the friction the better.)

jgm commented 4 years ago

Note also that I have some local patches, *.patch in the xml directory. Most of these are probably irrelevant for your purposes.

christoph-cullmann commented 4 years ago

Hi ;=)

Thanks for the quick response.

For the README/..., I think it would be nice to point to

https://invent.kde.org/frameworks/syntax-highlighting/-/merge_requests

and tell people to please submit there some merge request for any highlighting fixes/additions.

The general way would be to provide the file (or the changes) + some test file to show the changes/highlighting.

For documentation, we have

https://invent.kde.org/frameworks/syntax-highlighting/-/blob/master/README.md

For some example merge request, one could link e.g.

https://invent.kde.org/frameworks/syntax-highlighting/-/merge_requests/20/diffs

That should contain enough context.

For in-cooperating the current state:

1) You are right, I should just grab all files that are missing in our repo now from here once, given they have licensing info. For the future it would be nice to get some ping for new stuff. It might even be enough to open a issue here

https://invent.kde.org/frameworks/syntax-highlighting/-/issues

that tells us we should try to grab a new file, if submission is too much work.

2) For the patches/diffs, I am not that sure, some look like fixes for issues with your engine, one like https://github.com/jgm/skylighting/blob/master/skylighting-core/xml/haskell.xml.patch looks like some improvement for the hl.

Beside that, it is nice to see our stuff used so widely, I did even use pandoc myself and didn't initially see that the highlighting is provided by a library using our definitions, nice to have the work re-used!

christoph-cullmann commented 4 years ago

Btw., I think it would sense for us to make people aware of projects like this that use the definitions, too.

I guess some "Related projects" section in our README.md would make sense.

Would you have some paragraph (including preferred links) for skylighting to add there?

christoph-cullmann commented 4 years ago

I imported the ats.xml, our static checker run during build did detect several issues with the file (e.g. broken regex ranges, one context that is not used, some wrongly typed attribute names and unused item data).

I fixed that in

https://invent.kde.org/frameworks/syntax-highlighting/commit/8c26233fecd0311a1fe51919d9ba6f9c28ebaf5d

Might make sense to sync that file back to here.

christoph-cullmann commented 4 years ago

I imported now all things. Some more hl files needed fixes to pass our checker.

The syntax-highlighting repo should now contain all languages you have.

It might make sense for you to resync, e.g. fortran.xml got replaced with two extra files in our repo, alert_indent.xml is I guess no longer needed.

dhaumann commented 4 years ago

Hi, I'm another Kate developer and just wanted to give a bit more feedback about licensing. We recently switched the KSyntaxHighlighting framework to the MIT license, and encourage all new xml highlighting files to be MIT licensed as well. My suggestion would be to you to do the same for new highlighting xml files. This allows for a more consistent license in the future. 👍

jgm commented 3 years ago

Thanks for all of this; I've made some documentation changes encouraging submission of changes upstream, and I'll look into resyncing the syntax definitions.

christoph-cullmann commented 3 years ago

Thanks for improving the documentation. I hope more people will be interested to provide things upstream.

jgm commented 3 years ago

I'm getting an error related to this regex in scheme.xml:

  <!ENTITY regex    "(?:[^\\(\[/]++|\\.|\[\^?\]?([^\\\[\]]++|\\.|\[(:[^:]+:\])?)++\]|\((?R)\))+">

Is this regex correct? In particular the ? in (?R) seems fishy, as it doesn't modify anything in the group... I've tried to emulate KDE's regex parser but I'm not sure I have it quite right.

jgm commented 3 years ago

Also I don't understand why there are ++. And, where does the first character group get closed? The \] would seem to be escaped, but I don't see another option.

christoph-cullmann commented 3 years ago

Hi, if I am not wrong, this got introduced in

https://invent.kde.org/frameworks/syntax-highlighting/-/merge_requests/15/diffs

Our static checker should actually test all regex strings for validity using the PCRE wrapper inside Qt.

Therefore I would assume it is valid.

I ping the people in that request now.

christoph-cullmann commented 3 years ago

I just checked again: The QRegularExpression "isValid" check (that translates into the matching PCRE2 call) is happy with this expression.

For the first [ group, it is closed by the first ], that isn't escaped e.g.

(?:[^\\(\[/]++|\\.|\[\^?\]?([^\\\[\]]++|\\.|\[(:[^:]+:\])?)++\]|\((?R)\))+

=> the first group is

(?:[^\\(\[/]

For the (?R), that seems to be a PCRE variant for pattern recursion.

jgm commented 3 years ago

Oh, thanks, my eye somehow read the forward slash as a backward slash in /].

What about the ++? Is that any different from +?

Didn't know about (?R) -- I'll have to add support for that.

jgm commented 3 years ago

Btw, I had been assuming that the regex engine you used was the one documented here: https://docs.kde.org/stable5/en/applications/katepart/regex-patterns.html Is that correct? There is no mention of recursive patterns there, which is why I didn't implement it.

jgm commented 3 years ago

OK, I'm seeing that the extra + "makes the quantifier possessive, matching as many characters as possible and not releasing them to be matched by subsequent material. Another feature I need to implement.

jgm commented 3 years ago

I've added these regex features and integrated the latest syntax definitions (with just three local patches).

christoph-cullmann commented 3 years ago

Thanks for taking care.

The https://docs.kde.org/stable5/en/applications/katepart/regex-patterns.html docs are rather incomplete and more aimed to show people what the could use in e.g. search/replace.

We actually use now everywhere since some time the QRegularExpression wrapper around PCRE2 and support with that more or less all PCRE2 regex extensions.

jonathanpoelen commented 3 years ago

Hi, I just saw the regex notification today. I think the regex101.com site can help to understand them if in the future there are other errors. It offers a visual separation and an explanation of each part.

jgm commented 3 years ago

I didn't know that KDE just used pcre. We used to use a pcre wrapper in this project, but I've replaced it with a pure Haskell regex implementation that covers the features that seem to be used here... Of course, my implementation does not support ALL pcre2 features, so this might bite us later. But it seems good enough for now.