I have a lark grammar:
And a test file:
The problem is the rule `item.2: header item_content_line* dashed_line_end?`, specifically the `item_content_line*` part, which ends up consuming more lines than it should, such as the dashed lines that are supposed to end the item.
I am using a negative lookahead assertion in `ITEM_CONTENT_LINE` to stop further `item_content_line` rules from being matched. This works.
Is there another way to do it with rules or terminals?
The `item_content_line` sequence is supposed to end at either a dashed line or a header followed by a dashed line. An item consists of a header like 'WINGET' at the start of a line followed by a dashed line, then the content, and then a closing dashed line; or, if the next header comes first, that header starts a new item before the closing dashed line is reached. Other lines can exist in the file in between the items that start with the headers.
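To make the intended boundaries concrete, here is a rough plain-Python sketch of the segmentation described above. It does not use Lark. The `DASHED` and `HEADER` patterns and the `split_items` helper are hypothetical names made up for illustration; the real terminals may be defined differently.

```python
import re

# Hypothetical line shapes, assumed for illustration only.
DASHED = re.compile(r"-{3,}")   # a line made only of dashes
HEADER = re.compile(r"[A-Z]+")  # a header such as WINGET


def is_header_at(lines, i):
    """True if line i is a header that is immediately followed by a dashed line."""
    return (
        HEADER.fullmatch(lines[i]) is not None
        and i + 1 < len(lines)
        and DASHED.fullmatch(lines[i + 1]) is not None
    )


def split_items(text):
    """Return (items, other_lines) following the description above."""
    lines = text.splitlines()
    items, other = [], []
    i = 0
    while i < len(lines):
        if not is_header_at(lines, i):
            other.append(lines[i])  # anything outside an item
            i += 1
            continue
        name, content = lines[i], []
        i += 2  # skip the header and the dashed line under it
        while i < len(lines):
            if DASHED.fullmatch(lines[i]):
                i += 1  # the closing dashed line belongs to the item
                break
            if is_header_at(lines, i):
                break  # the next header starts a new item first
            content.append(lines[i])
            i += 1
        items.append((name, content))
    return items, other
```

With this sketch, a dashed line that is not under a header and does not close an item ends up in `other_lines`, which is the behaviour wanted from the grammar as well.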
However, I can't get it to work without the negative lookahead at the beginning of `ITEM_CONTENT_LINE`.
The `item_content_line*` repetition should end as soon as it hits a dashed line, or a header followed by a dashed line. Then parsing should continue with `other_line`, `dashed_line_end`, or another new `item`.
But if I replace the `ITEM_CONTENT_LINE` terminal with just the plain `LINE` terminal, `item_content_line*` starts consuming the dashed lines that are supposed to end the item straight away. Like this buggy output:
If I do use the negative lookahead regex in `ITEM_CONTENT_LINE`, it works with Lark. But I'd prefer to do it with Lark rules or terminals.
The `other_line` rule can also match a dashed line in the file, just not one that is part of an item header or an item's `dashed_line_end`.
I have also tried playing around with the rule priorities, but I can't seem to get it to work as well as the negative lookahead does.
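As a minimal sketch of what the lookahead is doing, here is the idea reduced to Python's `re` module alone, without Lark. The patterns are hypothetical stand-ins (the actual `ITEM_CONTENT_LINE` terminal may be written differently), and `count_content_lines` is a helper invented for this illustration.

```python
import re

# Hypothetical stand-ins for the grammar's terminals.
DASHED = r"-{3,}"   # a line made only of dashes
HEADER = r"[A-Z]+"  # a header such as WINGET

# A content line is any line that is NOT a dashed line and NOT a header
# immediately followed by a dashed line. Both checks are negative lookaheads
# placed at the very beginning of the pattern.
ITEM_CONTENT_LINE = re.compile(
    rf"(?!{DASHED}\n)(?!{HEADER}\n{DASHED}\n)[^\n]*\n"
)


def count_content_lines(text):
    """Count how many consecutive content lines match from the start of text."""
    pos = 0
    count = 0
    while pos < len(text):
        m = ITEM_CONTENT_LINE.match(text, pos)
        if m is None:
            break  # a dashed line or a header plus dashed line stops the run
        pos = m.end()
        count += 1
    return count


print(count_content_lines("package one\npackage two\n-----\nleftover\n"))  # 2
print(count_content_lines("package one\nWINGET\n-----\nmore\n"))           # 1
```

Without the two lookaheads the pattern is just `[^\n]*\n`, which happily matches the dashed line too. That is the same over-consumption seen when `ITEM_CONTENT_LINE` is swapped for the plain `LINE` terminal.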
Full test code for reference or running: