openEHR / adl-antlr

Antrl4 grammars for ADL
Apache License 2.0
1 stars 4 forks source link

Things to do for Bert Verhees #17

Open pieterbos opened 8 years ago

pieterbos commented 8 years ago

Since @BertVerhees asked for things to do, a list of suggestions of things he can do:

In archie (https://github.com/nedap/archie) :

  1. Implement tests for any language construct - you'll test this grammar as well as the parser. the framework is already in place
  2. Implement parsing of CTemporal types. The grammar should parse those already, but the treewalker does not.
ghost commented 8 years ago

Hi Pieter, I appreciate the work you do, and I admire the craftsmanship.

But I test grammars in another way, less integrated, but more on the level of the strings which should fit in the detailed rules.

Working on this detail level, testing every parsing rule in the grammar has some advantages. I have my own boilerplate code which is quite easy to understand. The parser, in the end is tested thoroughly, but also every rule is tested, and because, complicated rules are in fact collections of small rules, there is complicated rule integrating testing also.

The advantage of this approach is that the tests are easy to understand, pinpoint immediately what goes wrong and where and when, which strings, etc. Working in this way also causes a very detailed understanding of the grammar, and I sometimes found hard to find bugs in grammars this way.

Do you think that is useful? I will write these tests anyway for future use, if not for the public, then for private use.

So I will start, writing tests, and publish them in a forked git repository, and you can judge later if that is useful for your way of working.

I will notify you where to find the work. Feel free to criticize it, I am glad to learn, I developed my grammar testing myself, and on my own, and I

am sure there are many other ways to do so.

Your request to fix a regular expression, I will check it. Which expression in the lexer is it? And what is the problem? My teacher used to say, if you solve a problem using regular expressions, you create another problem, but sometimes we have no choice. So I can take

a look, maybe I see something you did not see.

Regarding the CTemporal types, that the parser works ok, but the tree walker does not is very strange. I can check that also, but now I have no clue at all. Where can I find i formstion, How does the problem make itself visible? I saw some emails but did not read them well. Do you think the

problem is well enough described in those emails?

Let me explain my position. I am an independent developer, nobody is paying me, and I must find time to do this. I am also not a student, time is money. It is because it fits in some future plans which I have next year why I want to do this. So don't expect a high production, but I will do what I can.

Thanks for giving the opportunity.

Bert Op 27 okt. 2015 21:14 schreef "Pieter Bos" notifications@github.com:

Since @BertVerhees https://github.com/BertVerhees asked for things to do, a list of suggestions of things he can do:

  • In the adl-anltr grammar (this project):
    1. Fix the regular expression recognition in the ANTLR Lexer. The difficulty is that paths resemble regular expressions.

In archie (https://github.com/nedap/archie) :

  1. Implement tests for any language construct - you'll test this grammar as well as the parser. the framework is already in place
  2. Implement parsing of CTemporal types. The grammar should parse those already, but the treewalker does not.

— Reply to this email directly or view it on GitHub https://github.com/openEHR/adl-antlr/issues/17.

pieterbos commented 8 years ago

Hello Bert,

there are not really emails (although you can receive them this way), but github issues. You can find them on https://github.com/openEHR/adl-antlr/issues .

https://github.com/openEHR/adl-antlr/issues/7 explains the regular expression problem.

The CTemporal types is not strange at all - i just have not implemented them yet, the grammar works fine and it is not an antlr problem - just missing java code.

I think any testing in this case is useful. In any case, it's open source software, you can add anything you want in your own forks :) This grammar is mainly work by Thomas though, not me, I implemented a java library on top of it and tested the grammar that way.

ghost commented 8 years ago

Hi Pieter, thanks for your explanation.

I know that the work is mainly Thomas's work. I appreciate his contribution also very much.

I think testing is useful, because, I know from the previous grammar, which was written in JavaCC, that there were changes. Some even after five years. And it is an easy job to do, more or less, something to listen to classical music while doing, kind of meditation ;-). So I do it in my own fork as a contribution, and as an opportunity to learn the grammar very well.

Which will be very good for the next 10 years, I guess.

In the afternoon, I take a look to the at the regular expression issue.

Best regard Bert

On 28-10-15 10:31, Pieter Bos wrote:

Hello Bert,

there are not really emails (although you can receive them this way), but github issues. You can find them on https://github.com/openEHR/adl-antlr/issues .

7 https://github.com/openEHR/adl-antlr/issues/7 explains the

regular expression problem.

The CTemporal types is not strange at all - i just have not implemented them yet, the grammar works fine and it is not an antlr problem - just missing java code.

I think any testing in this case is useful. In any case, it's open source software, you can add anything you want in your own forks :) This grammar is mainly work by Thomas though, not me, I implemented a java library on top of it and tested the grammar that way.

— Reply to this email directly or view it on GitHub https://github.com/openEHR/adl-antlr/issues/17#issuecomment-151779376.

wolandscat commented 8 years ago

Re: testing code / test cases - if we can produce a nice library of test cases to add to this (adl-antlr) project it would be nice - I don't want it to just be the raw grammar - it should become a resource that a developer can use directly. To do that, I'll need help ;-)

ghost commented 8 years ago

On 28-10-15 14:31, Thomas Beale wrote:

Re: testing code / test cases - if we can produce a nice library of test cases to add to this (adl-antlr) project it would be nice - I don't want it to just be the raw grammar - it should become a resource that a developer can use directly. To do that, I'll need help ;-)

Can you give an example of what you intend and what the purpose is?

I remember in the old grammar, there was an error, there was a 13606 class-property as keyword, and that made it impossible to specify that particular class in adl. I have changed that, and I was happy to find out, everything still worked out, (no one complained until now) because there was no way to test the grammar, except the test-files Rong had written, which are not sure to cover the complete grammar.

But I believe you want a solution like that? I could write that too, all in good time.

I agree that testing the raw grammar is not used very often, it is mainly to be sure that there are no unintended side-paths in the grammar, and for those two or three times the grammar needs to be changed.

When googling, I found some test-tools on Antlr, until now, I tested my grammars only on raw-grammar, because, in fact, that tests the parser and the lexer also, but is quite some work. However, for a DSL, it is maybe two days, but for the OpenEHR grammars, I am afraid it is more then a week.

So, concluding, I don't know what is wise to do. I did not find any wise man having an opinion on this.

Maybe Terence Parr is the right person to ask. As main architect of OpenEHR, and also having English as first language, it could be better if you ask him.

If you can come to a wise decision, I am happy to help you create it

wolandscat commented 8 years ago

At the moment the test resource I am using is the adl-archetypes reference archetype set - what we use to test the parser and validator in the ADL Workbench. Each of these archetypes is intended to either illustrate some AOM/ADL feature in its working form, or else demonstrate a single kind of failure corresponding to a 'V' code, e.g. VCACA etc, which are the validity codes defined in the AOM, and that ADL WB emits.

Testing Antlr rules seems easier by testing each rule individually, and therefore using small fragments of archetypes, e.g. just a single C_COMPLEX_OBJECT structure etc. A test set based on this approach would presumably consist of many small files like this, and an automated test rig to run them all against specific rules; then we would also want to test full arcetypes and generate the same V-codes out from the reference archetypes.

Doing this probably means upgrading the current Antlr rules to include the V-codes due to specific error conditions.

ghost commented 8 years ago

On 28-10-15 16:20, Thomas Beale wrote:

At the moment the test resource I am using is the adl-archetypes reference archetype set - what we use to test the parser and validator in the ADL Workbench. Each of these archetypes is intended to either illustrate some AOM/ADL feature in its working form, or else demonstrate a single kind of failure corresponding to a 'V' code, e.g. VCACA etc, which are the validity codes defined in the AOM, and that ADL WB emits.

Testing Antlr rules seems easier by testing each rule individually, and therefore using small fragments of archetypes, e.g. just a single C_COMPLEX_OBJECT structure etc. A test set based on this approach would presumably consist of many small files like this, and an automated test rig to run them all against specific rules; then we would also want to test full arcetypes and generate the same V-codes out from the reference archetypes.

Doing this probably means upgrading the current Antlr rules to include the V-codes due to specific error conditions.

— Reply to this email directly or view it on GitHub https://github.com/openEHR/adl-antlr/issues/17#issuecomment-151879485.

Sounds good, only the last sentence, anticipating op probably to make errors, I have no idea if it is possible in a grammar. I think that part must be reserved to a class derived from the parser-class, which catches ANTLR errors and translates them, and will be used to process the parsing information. Probably there are also possibilities in the listener class, but I must study that.

ghost commented 8 years ago

Hi Thomas, in the ANTLR Reference is a complete chapter (9 Error Reporting and Recovery) about error-handling/reporting, and explanation and example how to change error messages. It is just a matter of adding your own errorhandling which translates the errors.

You can find the predefined errors in org.antlr4.v4.tool.ErrorType and ErrorSeverity

wolandscat commented 8 years ago

Indeed (I have the book) - I'll get onto this at some point, but if anyone wants to work out a few patterns, I'll be happy to incorporate them. Since I'm not normally a Java programmer, I'll need to make sure that what we do in this project is the 'normal' way to do it.

ghost commented 8 years ago

I can help you with that, we have the test-pattern Rong used, that could be a good starting point. I must see how to create the necessary boilerplate. Maybe you have already some archetypes for test-purposes, and according error messages/codes from the ADL Workbench, which you want the parser to produce.

And I must say, I must study the syntax from ADL2.0 (I postponed that ;-) But I must study that anyway, so this is a good occasion. So, if it is OK for you, I start this afternoon, and have at least the boilerplate code ready, and a first test. After that, things can go fast.

ghost commented 8 years ago

Hi Thomas, I checked the boilerplate code, it is in fact very simple, create a parser, add your errorlistener and parse the file, the errorlistener will then contain the errors that occured.

And for checking the AOM-classes, in fact Pieter Bos has build a good fundament. I think, this is ready, except for ADL-Workbench conforming errorhandling

Maybe you have test archetypes, they are easy to implement in the parser, with expected errors.

pieterbos commented 8 years ago

The error handling in Archie is still very primitive. The basic structures are in place (an ANTLR ErrorListener and a place to put errors and warnings), but that's it - the error messages are really primitive. I suggest something like the Bean Validation to handle archetype model validation and errors. If there is a list of standard errors - great, let's use it.

This is however not the first thing on my todo-list.

ghost commented 8 years ago

Pieter, error handling in ANTLR is, IMHO, good enough, it breaks at severe errors, and runs further at non severe errors. I believe there are five levels of severity. It must be easy to map some error-coding to antlr-errors, as long as the error-coding is parser-technical, the infrastructure is ready for it, it will be difficult to map semantic errors to parser errors.

I don't know what the adl-workbench has, Thomas knows, so it is for him to decide if it is possible to use that error-system, and maybe partly.

So, best thing to do is to wait for his opinion on this, and then, if appropriate, make a plan.