Closed vsch closed 9 years ago
Done. It was like pulling on a string of a sweater.
Sorry, Vladimir, for not being able to jump in and support you in your endeavour here. I was (and still am) on family vacation this month.
It's amazing to see you pull through nevertheless. Hats off to you!
I have issues with ListItem parsing that I am addressing. However, the fix causes regression tests to fail, and those tests' expectations are wrong: in some cases the expected HTML does not follow from the md source; in others the expected Node source range is based on the current parser's incorrect implementation.
Wrong expectation from markdown source
Here is one that expects wrong HTML generation, based on existing errors in list parsing, from `resources/pegdown/Bug_in_1.0.0.md`. It expects the

* another list item

line to be tightly spaced, but this is wrong because a blank line precedes it, which makes the list loose (so the item's text is wrapped in `<p>` tags). To match the expected HTML, the md needs to have no blank line before the second list item.

Wrong expectation of Node source position range
The current parser erroneously appends "\n\n" to a loose block before recursively parsing it. This extends the item's node range 2 characters into the next line. The block should contain only text actually recognized from the source; otherwise the node source ranges fall out of sync with it. It gets worse for nested lists, which are recursively parsed again, adding another 2 characters per level of nesting.
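The cumulative drift can be sketched with plain string arithmetic. This is a simplified model of the behaviour described above, not pegdown's actual code; the class and method names are invented for illustration:

```java
// Sketch: if each recursion level appends "\n\n" to the block text before
// parsing, an end offset computed from the padded text drifts 2 characters
// per nesting level relative to the real source.
public class OffsetDrift {
    static int parsedEnd(String block, int nestingDepth) {
        String text = block;
        for (int i = 0; i < nestingDepth; i++) {
            text = text + "\n\n"; // what the parser appends before each recursive parse
        }
        return text.length();     // end offset as seen by the recursive parse
    }

    public static void main(String[] args) {
        String block = "* item"; // 6 characters of real source
        System.out.println(parsedEnd(block, 0)); // 6: correct
        System.out.println(parsedEnd(block, 1)); // 8: off by 2
        System.out.println(parsedEnd(block, 2)); // 10: off by 4 for a nested list
    }
}
```

The point is that the error compounds: every level of recursion that pads the text before parsing pushes the reported end offset further past the text that was actually recognized.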
Manual checking is a pain, but in one case, counting character offsets from the beginning of the source text confirmed that the node is erroneously extended by 2 characters. The AST tests, however, expect this erroneous behaviour. The same problem will appear in Definitions, because there the parser rule also appends "\n\n" to the block before recursively parsing it.

Here is one that is off in the node's source position, from `resources/pegdown/AstText.md`:

The expectation for the

* nothing more

list item is:

The corrected expectation for the above should read:
Highlighting the md source from 144-159 produces the following source highlight:

With the parser fixed not to append characters that were never parsed from the source, the node's source position changes to 144-157, which is correct and corresponds to the highlighted text:
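The 2-character discrepancy can be verified with a plain substring check. In this sketch, the source string and offsets are invented for illustration; they are not the actual `AstText.md` content or its 144-159 offsets:

```java
// Sketch: compare a node range inflated by an appended "\n\n" against the
// range covering only the text actually recognized from the source.
public class RangeCheck {
    public static void main(String[] args) {
        String source = "para\n\n* nothing more\n\nnext";
        int start = 6;        // offset of '*' in this made-up source
        int reportedEnd = 22; // what a parser that appended "\n\n" would report
        int actualEnd = 20;   // where the item's text really ends

        // the reported range drags the trailing blank line into the node
        System.out.println("[" + source.substring(start, reportedEnd) + "]");
        // the corrected range covers exactly the recognized text
        System.out.println("[" + source.substring(start, actualEnd) + "]");
    }
}
```

Highlighting `source.substring(start, reportedEnd)` in an editor would visibly spill past the list item onto the next line, which is exactly the symptom described above.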
Automated testing is not a magic wand
These are examples of automated testing's Achilles' heel. Just because there are lots of tests and the product passes them does not mean it works correctly; it only means it works consistently. If tests are never validated for correctness, they get in the way of correcting the code, because the burden of making the erroneous tests pass falls on whoever rocks the boat.
Now I have to weigh the benefit of correct parser operation against the extra work of validating the existing failing tests and correcting their flawed expectations.
I did say that doing detailed validation of parsing and the tests looked like "A career decision". It certainly is starting to pan out that way.
Does anyone have a fast way of tracing test results back to test source/expectations?
Anyone with more experience working with sbt and the pegdown source, and with validating its tests, please advise on what is available to make this a bit more efficient. What I am looking for:
Otherwise, it will take a while to get this fix in as I muddle through: finding the failed tests, validating them manually, and correcting the flawed ones so that the fix passes the tests.
Cheers.