#region erases newlines

qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.

http://ecsharp.net

Other

172 stars 25 forks source link

#region erases newlines #58

Closed j4m3z0r closed 5 years ago

j4m3z0r commented 5 years ago

I posed this question over on Stackoverflow but in the course of investigating it, I think it might be a bug. Here's a minimal test case that shows the issue:

#ecs;

namespace Test
{
    internal class Class
    {
        #region test region
        [DllImport(Constants.Lib, EntryPoint = "testfunction")]
        public static extern int TestFunction(int i);
        #endregion
    }
}

When the region markers are in place, the DllImport + method prototype are emitted like so:

        [DllImport(Constants.Lib, EntryPoint = "testfunction")] public static extern int TestFunction(int i);

(note: no newline, and #region, #endregion lines are omitted)

Whereas if you comment out the region markers, the newline is preserved:

        [DllImport(Constants.Lib, EntryPoint = "testfunction")] 
        public static extern int TestFunction(int i);

Though the emitted code still compiles and does what it's supposed to, region markers are super useful in classes large enough to be generated by a macro system. It would be great to have them emitted, and also not affect the whitespace of the enclosed code.

qwertie commented 5 years ago

I'm looking into it. To my own surprise it appears there is no support for preserving #region/#endregion in the output (like, gee, I don't remember not writing the code for that), but the deletion of newlines is certainly a bug.

j4m3z0r commented 5 years ago

Awesome. Thank you! Let me know when there's something to test and I'll give it a go.

qwertie commented 5 years ago

Fixed in v2.6.2.3.

The problem was that code in EcsPreprocessor intended to ignore one newline at the end of a preprocessor directive could actually ignore many. I decided the proper fix was to add support for preserving #region/#endregion and to preserve newlines before and after preprocessor directives. (I found it hard to decide because I thought it might be hard to "deduplicate" newlines in the printer, which for correct output must always print a newline before and after any preprocessor directive even if the syntax tree doesn't have one. However the deduplication turned out to be pretty easy.)

j4m3z0r commented 5 years ago

I think there's still some problems here. I just pulled down trunk and compiled (msbuild Loyc-Slim.sln /p:Configuration=Release) and still found some #region directives missing from the output. Here's a test-case that shows the issue:

#ecs;
internal class TestCase {
    #region constructors
    replacePP(ConstructorName => Method) {
        [DllImport(Constants.LibName)] public static extern IntPtr ConstructorName();
    }
    #endregion
}

The original version used an unroll outside the replacePP -- this was just the smallest thing I found that triggered it.

It seems like it might be triggered when I use LeMP outside of a method definition -- the invocations inside methods seem to work ok.

I'm also seeing behavior where white-space will affect the output here too -- much like #59. For instance, adding a newline after the #region line will cause the region line to be emitted, but not the #endregion line, and I can't seem to get the #endregion emitted, no matter how much whitespace I add. This also seems to be invariant to whether or not the #ecs; directive is included at the top of the file.

qwertie commented 5 years ago

The moment you said it, I realized what the problem is.

EC# handles a #region or #endregion directive the same way as a comment. Comments are associated with the node that they "seem" related to, so in this case both directives will be associated with the replacePP node. But the replacePP macro deletes itself from the output (replacing itself with its modified children) and in the process any associated "comments" are lost. If we dangle another node in front of the directives then they don't disappear:

#region constructors
#rawText("");
replacePP(ConstructorName => Method) {
    [DllImport(Constants.LibName)] public static extern IntPtr ConstructorName();
}
#rawText("");
#endregion

(#rawText(""); is a special command that emits arbitrary output to the output file, and in this case I'm emitting an empty string - nothing.)

It could be argued that this behavior makes sense for real comments but not so much for #region.

j4m3z0r commented 5 years ago

I wasn't aware of rawText -- I was looking for something similar in the docs but must have missed it.

Regarding treating region the same way as comments, I'd submit that that's probably not quite right: regions have a start and end and enclose an arbitrary sequence of intermediate elements. You'll also get a compiler error if they're mismatched (which is how I found this). They seem to me like they'd be better modeled in the same way as code between curly braces, though perhaps without increasing the indentation level in the output.

I also wanted to understand the behavior of comments: it seems like you're saying that there's two kinds of comments: those that should be emitted to the output, and those that are considered comments on the LeMP code and should be stripped for output, but it also seems like there's no explicit mechanism for expressing which is which -- do I have that right?

For my use case, comments are ultimately a nice to have, so I don't know what priority this deserves, however FWIW, it seems like it would be a nice addition to have some kind of syntax to denote comments that are intended to be written to output vs comments on the LeMP code that should be suppressed. Perhaps the right way to do it would be to assume that C# comment syntax should always be emitted, but add some other piece of syntax for EC# comments that are not intended to be written to the output. I'm unsure if this is how it currently works, but it would be awesome if comments were subject to substitutions the same as any other code when written to the output -- this would facilitate generating docstrings along with the generated code.

qwertie commented 5 years ago

Well, #rawText isn't in the LeMP docs because it is not part of LeMP, it's an intrinsic recognized by the EC# printer (not an important distinction to end users, admittedly)... I'm not sure if it's in any of the docs.

I'm not saying there are two kinds of comments. All comments given to the printer are emitted, but if a node is deleted from the syntax tree then any comments attached to it cannot be printed as they no longer exist. The replacePP macro itself doesn't appear in the output, so unless replacePP is specifically programmed to copy its comments into its output, they vanish.

I will think about changing how #region works.