xsawyerx closed this 4 years ago
I thought about this a lot. Here's where I'm at:
__END__ and __DATA__
This means we have two options:
^__END__$
^__END__$, we stop parsing entirely.
@gonzus I would especially appreciate your thoughts on this.
This is a difficult problem for which I have never seen a comprehensive solution. It basically boils down to parsing (or at least recognizing) more than one syntax in the same file. The same thing happens (perhaps to a more painful extreme) in HTML files that embed JS and CSS, and sometimes a server-side language such as PHP or Perl as well.
Last time I looked into this in detail, the standard tools (yacc / bison and lex / flex) were starting to add basic support for this, but I stopped looking and I am sad to say I have no idea what the current level of support is for this.
The approach you propose is reasonable, but in the same way that "discarding white space while lexing" might be problematic for some tools (for example, those that need the exact location of a comment), maybe you need more smarts when discarding anything between __DATA__ and __END__, or after __END__ altogether. So I would go with your proposal, and see how it goes.
Thank you for your thoughts. I prepared an MR that implements this approach. It seems to not mess up the location of elements in the file because it simply comments POD out.
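To illustrate why commenting POD out keeps locations intact, here is a minimal sketch in Python. It is not the code from the MR. The function name `comment_out_pod` and the two regular expressions are assumptions made for illustration, and the POD detection is deliberately simplified (a block starts at a line beginning with `=` plus a word and ends at a line beginning with `=cut`).

```python
import re

# Simplified assumption: a POD block starts at a line beginning with "=" followed
# by a word (e.g. =pod, =head1) and ends at a line beginning with "=cut".
POD_START = re.compile(r"^=[A-Za-z]\w*")
POD_END = re.compile(r"^=cut\b")


def comment_out_pod(source: str) -> str:
    """Prefix every POD line with '#' so the line count stays unchanged."""
    out = []
    in_pod = False
    for line in source.split("\n"):
        # A stray "=cut" outside of POD is not treated as the start of a block.
        if not in_pod and POD_START.match(line) and not POD_END.match(line):
            in_pod = True
        if in_pod:
            out.append("#" + line)
            if POD_END.match(line):
                in_pod = False
        else:
            out.append(line)
    return "\n".join(out)
```

Because each POD line is replaced by exactly one comment line, every non-POD line keeps its original line number and column, which is the property the location tracking relies on.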
When it comes to __DATA__ and __END__:
- __DATA__ can be opened as a bareword filehandle, but otherwise, it's not read by anything.
- __END__ is fully ignored by Perl. It stops reading the file entirely when it encounters it.
For these reasons, I imagine they wouldn't need their own parser, just to be removed prior to parsing.
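A rough sketch of the "remove prior to parsing" idea, again in Python and not taken from the project: `split_at_trailer` is a hypothetical helper that cuts the source at the first line consisting solely of __END__ or __DATA__. It is purely line-based, so it assumes the marker never appears alone on a line inside a heredoc, a multi-line string, or a POD block. A real implementation would have to handle those cases.

```python
import re

# Assumption: the marker must be the only text on its line, mirroring the
# ^__END__$ pattern discussed in this thread.
TRAILER_MARKER = re.compile(r"__(?:END|DATA)__")


def split_at_trailer(source: str):
    """Return (code, marker, trailer); marker is None when no marker line exists."""
    lines = source.split("\n")
    for index, line in enumerate(lines):
        if TRAILER_MARKER.fullmatch(line):
            code = "\n".join(lines[:index])
            trailer = "\n".join(lines[index + 1:])
            return code, line, trailer
    return source, None, ""
```

Only the `code` part would then be handed to the parser. Returning the trailer separately, instead of discarding it, leaves room for tools that want to inspect the data section later.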
It would be great if we could detect __END__ and __DATA__ and stop parsing then and there.