jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

Updating PureBASIC Lang Definition #10

Open tajmone opened 7 years ago

tajmone commented 7 years ago

I would like to update the language definition for PureBASIC:

https://github.com/jgm/skylighting/blob/master/xml/purebasic.xml

Current PureBASIC version is 5.60, and it has new keywords and functions.

I need to ask you a question. Since this file is intended for code beautifying (and not for autocompletion, as in an editor), my idea is to update (and then maintain) the list so that it contains all keywords and functions from PureBASIC 5.00 up to the latest release, i.e. to keep deprecated and renamed tokens too. That way the highlighter will correctly parse old and new code alike, as long as it's written for any PureBASIC 5.x version (but not for older versions, which are no longer in use).

Would this be in line with Skylighting's guidelines?

jgm commented 7 years ago

+++ Tristano Ajmone [Apr 24 17 04:30 ]:

I need to ask you a question. Since this file is intended for code beautifying (and not for autocompletion, as in an editor), my idea is to update (and then maintain) the list so that it contains all keywords and functions from PureBASIC 5.00 up to the latest release, i.e. to keep deprecated and renamed tokens too. That way the highlighter will correctly parse old and new code alike, as long as it's written for any PureBASIC 5.x version (but not for older versions, which are no longer in use).

Would this be in line with Skylighting's guidelines?

I understand why you want to do this. The problem is that if we did this, the xml definition would permanently diverge from KDE's upstream version, and we would no longer be able to take advantage of updates there.

The best approach is to propose an update that works upstream as well, and submit it there (and here too).

tajmone commented 7 years ago

I'm having trouble working out which is the upstream source of this xml file. Where should I propose these changes? I could contact the maintainer of the file and agree on the changes.

But I'm not sure whether these definitions are meant for code editors, and whether this means they should only mirror the syntax of a specific release (for autocompletion, etc.). If that were the case, we'd be facing a general problem that applies to all languages. After all, pandoc's highlighter should be aimed at code beautifying, so it should cover all possible versions of a language; otherwise it would be problematic to handle code from different versions of the same language.

Is this upstream bond so tight that it's worth the price? Or could we consider loosening the bond and having Skylighting maintainers take up the task of integrating syntax updates in a backward-compatible manner? For PureBASIC, I'm currently working on an automation system that would integrate new tokens from each release and autogenerate an updated definition for Highlight, Highlight.js and (I was hoping) Skylighting too.
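As a rough illustration of the cumulative approach, here is a minimal Python sketch that merges per-release token lists into one list and renders it as a Kate-style `<list>` element. It is not the actual automation script: the function names `merge_tokens` and `kate_list`, the token names, and the assumption that tokens can be deduplicated case-insensitively are all hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def version_key(version):
    """Turn a version string such as '5.60' into a sortable tuple (5, 60)."""
    return tuple(int(part) for part in version.split("."))


def merge_tokens(releases):
    """Union the token lists of all releases.

    Tokens are deduplicated case-insensitively (an assumption of this
    sketch), keeping the spelling from the earliest release that has them.
    """
    merged = {}
    for version in sorted(releases, key=version_key):
        for token in releases[version]:
            merged.setdefault(token.lower(), token)
    return sorted(merged.values(), key=str.lower)


def kate_list(name, tokens):
    """Render tokens as a Kate-style <list> element with one <item> each."""
    lines = [f"<list name={quoteattr(name)}>"]
    lines += [f"  <item>{escape(token)}</item>" for token in tokens]
    lines.append("</list>")
    return "\n".join(lines)


# Placeholder token names, not real PureBASIC functions.
releases = {
    "5.00": ["OpenThing", "OldName"],
    "5.60": ["OpenThing", "NewName"],  # OldName renamed to NewName
}

cumulative = merge_tokens(releases)
print(kate_list("functions", cumulative))
```

Because the renamed `OldName` stays in the merged list, code written for either release would still be highlighted.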

In the case of this PureBASIC definition, it misses out some of the new lang features and also seems to break away from previous ones. Since PureBASIC also has various LTS releases, it's quite common for users to work with more than one version of the language on the same machine.

So it seems the issue needs to be addressed. Even if the upcoming pandoc 2.x supports external definitions, the vanilla distribution should offer syntax highlighting that is not bound to a specific release of any given language, especially since we can't ensure that the creators of these third-party xml files keep them updated to mirror the latest release of each language.

It seems more complicated to me to have to push changes to separate repositories, at least as far as pandoc integration is concerned. Maybe these local changes could be made just for pandoc?

How does Skylighting handle these syntax definitions? Does it merge them in from a specific upstream project, or are different definitions taken from different sources? (Unfortunately these xml files don't allow space for comments and links, so they carry little info about themselves.)

jgm commented 7 years ago

+++ Tristano Ajmone [Apr 24 17 07:26 ]:

Is this upstream bond so tight that it's worth the price? Or could we consider loosening the bond and having Skylighting maintainers take up the task of integrating syntax updates in a backward-compatible manner?

Since the skylighting maintainers = me, I'd like to make this as simple as possible.

Of course, I take your point that the needs of a beautifying highlighter and an editor may be slightly different. But I'm reluctant to create additional ongoing maintenance burdens because of this.

Actually, I don't see why upstream maintainers of the xml syntax description would object at all to mixing tokens from different versions -- after all, they only provide one syntax definition for purebasic, and they don't know what version you're using -- so the situation isn't much different.

I fetched the sources from https://github.com/KDE/syntax-highlighting but this may be a mirror of a more official repository.

Having said all this, I'm not completely against having local changes that aren't reflected upstream, so if it's a big deal for you, you can propose something in a PR.

tajmone commented 7 years ago

Thanks for the link (I also noticed different source repositories and wasn't sure which is THE upstream).

I'll do that: I'll contact the maintainer (traced him!) and discuss the issue with him. Hopefully my automation script will be ready in a week or two, and this would make language definition maintenance a matter of a few clicks (possibly not just for PureBASIC). I just want to get it right, adopting a JSON schema flexible enough to be adapted to different use cases (beautifiers and editors alike) and some templating strategy.

I'll post any updates to the issue if we can keep it open for a while.

tajmone commented 7 years ago

I emailed the author/maintainer of the PureBASIC language definition in question on Apr 25th, but haven't received any reply so far; a full month has gone by.

So it looks like keeping the PureBASIC definition up to date upstream might be a problem. Also, I haven't had a chance to find out whether that definition is release-specific (for code editing) or whether it uses a cumulative list of keywords (for highlighting all versions of the language).

As it currently stands, the language definition doesn't cover the latest keywords and functions added to the language. If it's built for a specific version of the language, it might fail on both older and newer code.

I haven't looked into the list of keywords and procedures to check which approach it uses. As a rule, keywords are only ever added to the language, and so far no keyword has been removed; but built-in procedures and commands change a lot with each release, with frequent renamings and removals, so that list would be quite sensitive when it comes to highlighting code from different versions of the language.

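I could look into it and try to work out whether the commands list is cumulative or release-specific, but then the issue would again be how to handle language definitions:

  • Can skylighting accept some user updates diverging from the current upstream for certain languages?

  • How is the cumulative vs release-specific (i.e. highlighter vs editor) question going to be handled?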

Since skylighting is a new improvement on pandoc, and since its main usage is going to be for highlighting code, I think that it's worth ensuring that language definitions take a cumulative keywords approach, so that old and newer code can be highlighted alike.

Could you please tell me more about Kate definitions? I've looked into them, but the documentation is rather large, and different versions have different documentation, so I got a bit lost. My guess is that these definitions are intended for code editing, and that they are meant to be release-specific.

Wouldn't it be good to fork away from the upstream and find maintainers for each language who can ensure that all keywords are updated in a cumulative manner, including older keywords that might have been renamed or removed?

I know that the current list of languages is already quite big, and this might make maintenance more difficult. But in my view, it's better to have a language definition that isn't updated to the latest version but preserves all the previous versions' keywords than to have the latest version covered at the expense of all previous versions.

What's your view on this?

jgm commented 7 years ago

+++ Tristano Ajmone [Jun 01 17 03:48 ]:

I could look into it and try to work out whether the commands list is cumulative or release-specific, but then the issue would again be how to handle language definitions:

  • Can skylighting accept some user updates diverging from the current upstream for certain languages?

I like to avoid this, but yes, it's possible in cases like this.

  • How is the cumulative vs release-specific (i.e. highlighter vs editor) question going to be handled?

Since skylighting is a new improvement on pandoc, and since its main usage is going to be for highlighting code, I think that it's worth ensuring that language definitions take a cumulative keywords approach, so that old and newer code can be highlighted alike.

Yes, that makes sense to me. And I believe that's basically the approach kate takes. They don't have multiple definitions for different versions of languages. However, you could ask on a list for kate developers to see if they have a policy on this.

Could you please tell me more about Kate definitions? I've looked into them, but the documentation is rather large, and different versions have different documentation, so I got a bit lost. My guess is that these definitions are intended for code editing, and that they are meant to be release-specific.

All I can say is to read the documentation and look at some examples.

Wouldn't it be good to fork away from the upstream and find maintainers for each language who can ensure that all keywords are updated in a cumulative manner, including older keywords that might have been renamed or removed?

No. That's too much manual work. Even supposing we could find people, they will inevitably stop maintaining their languages. It ends up being too big a burden on me.