So... what's the deal with D#?

jonathanvdc / Flame

A compiler framework for managed code.

https://jonathanvdc.github.io/Flame

GNU General Public License v3.0

So... what's the deal with D#? #26

Closed qwertie closed 5 years ago

qwertie commented 8 years ago

I mentioned elsewhere that I didn't think there was much to be gained from making a brand-new .NET programming language because there are already nice alternative .NET languages out there - Nemerle, Boo, and so on. For that reason (and because of the lack of documentation) I have to admit that I haven't been paying much attention to D#.

Your readme says it is "roughly speaking - a dialect of C#". I'm not sure what that means. Do you intend backward compatibility with C#? If not, why not? What's the difference? It sure looks like C#...

jonathanvdc commented 8 years ago

Maybe that readme is a little sparse on information when it comes to D#. And you're right, there's hardly any documentation on D# itself.

Look, I initially designed D# as a "test" of Flame's capabilities. I needed a front-end to ensure that Flame can handle real programs, so I built one. Initially, I started off with a subset of C#, because that was Flame's original implementation language. That subset slowly expanded to include most C# features, with minor syntactical differences (which broke syntactic backward compatibility from the get-go) and various flavors of syntactic sugar to make my life easier, such as

public this(set int X, set int Y);

instead of C#'s

public Vector(int X, int Y)
{
    this = default(Vector);
    this.X = X;
    this.Y = Y;
}

Eventually, I decided that D# was "ready" for its trial by fire: to become the implementation language of the core Flame libraries.

Then, as I was rewriting those libraries in D#, I noticed some things in C# that bothered me. So I decided to break semantic backward compatibility, as well. Here are some of the highlights:

- Static classes can't implement interfaces, which is an annoyance when implementing compiler passes: most passes fit the bill of a static class, but they must implement an interface. The typical workaround involves writing a lot of unsatisfying boilerplate code. So I figured I could make all static class entities singletons, and that's worked out well so far. As a bonus, I can now store references to singleton instances in variables. For example: var stmt = EmptyStatement;
- csc and mcs always seem to get their value type total initialization analysis wrong whenever I use auto-properties. So the D# compiler avoids the issue entirely by automatically inserting a this = default(T); in value type constructors for you.
- I often found myself iterating over multiple collections at the same time, so I created "multiple foreach" statements, which function more or less like a "zip" followed by a foreach. Additionally, collection elements of foreach statements on arrays are mutable, so ArraysExample.ds is a thing.
- The const attribute can be applied to methods and constructors to mark them as pure. I'd like to have the compiler verify function purity in the future, but for now it's more of a hint to the optimizer.
- Delegates have a well-defined type, so var f = DoSomething; (where DoSomething is a function) is perfectly legal in D#. The back-end is responsible for converting function types, such as int(int, int), to Func<int, int, int>.
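
For comparison, here is a rough plain-C# sketch of what two of these features correspond to: a "multiple foreach" is approximated with `Zip` followed by `foreach`, and the function type `int(int, int)` is spelled out as `Func<int, int, int>`. The names `DSharpAnalogues`, `Add` and `PairwiseSums` are made up for illustration, and this is not necessarily how the D# compiler lowers either construct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DSharpAnalogues
{
    public static int Add(int a, int b) { return a + b; }

    // Approximates a D# "multiple foreach" over xs and ys:
    // zip the two collections pairwise, then iterate over the pairs.
    public static List<int> PairwiseSums(IEnumerable<int> xs, IEnumerable<int> ys)
    {
        // Plain C# needs an explicit delegate type for a method group here,
        // whereas D# is described as accepting `var f = Add;`.
        Func<int, int, int> f = Add;

        var sums = new List<int>();
        foreach (var pair in xs.Zip(ys, (x, y) => new { X = x, Y = y }))
        {
            sums.Add(f(pair.X, pair.Y));
        }
        return sums;
    }
}
```

Note that `Zip` stops at the end of the shorter collection; whether D#'s multiple foreach behaves the same way is not stated here.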

Now, I don't claim that D# is the programming language of the future. It's not revolutionary: I haven't implemented any shiny new programming paradigms like EC#, Boo or Nemerle have. D# is the result of a number of incremental tweaks to C#. Frankly, it's just a useful tool that I made and still use every day. I'm just a hobbyist who built their own programming language, and I'm really not aiming for world domination with D# here.

My main project is Flame, and I have a Flame front-end for D#. Ergo, I program in D#. But I'm not fundamentally opposed to switching to some other programming language, say EC#, as the main implementation language for Flame, as long as I can bootstrap the core Flame libraries with a Flame-based compiler for that language (such as fecs), and it gives me the tools I need, like singleton objects.

Most, if not all, of D#'s distinguishing features compared to .NET languages such as Boo and Nemerle can be attributed to its use of Flame as a back-end. These include:

- Speed and program size. Flame's -O3 optimization level typically improves both execution times and the size of the output program. It constructs an SSA control-flow graph to implement various intraprocedural optimizations, and uses inlining and scalar replacement of aggregates to reduce the overhead of abstraction mechanisms. Whenever a function becomes unreachable due to inlining (or some other optimization), it is removed from the executable, thus reducing code size. (I'm actually working on implementing exception handling support for -O3 right now.)
- Static linking. Flame can compile library projects down to IR, and then statically link the IR files with an executable project. The result is a single *.exe that contains the minimal set of required entities, and can be optimized interprocedurally.
- Multiple target platforms. Architecturally, there's nothing tying D# to .NET. The CLR back-end may be the only reliable back-end at this time, but there are a number of experimental Flame back-ends, which could mature and become stable. I'm mostly thinking of -platform wasm.

I understand that these pros are secondary concerns: picking a productive language should be the primary concern. But they are relevant to compiler writers, and the compiler framework is my main project.

qwertie commented 8 years ago

Could you give an example of "total initialization analysis wrong" with autoproperties?

It strikes me that D# could be backward compatible with C#, or nearly so. How close is your D# (Flame) compiler to running arbitrary C# code?

I'd like to make a proposal: combine D# and EC# into one language, with EC# and LeMP as the front-end, and Flame as the backend.

Currently EC# compiles to C# - I don't know if Flame's architecture could allow that, but I can imagine a "frankenstein" might still be useful, where Flame is used in the LeMP single-file generator for the sole purpose of semantic analysis, so that semantic errors can be reported in the original source file. I wanted to use Roslyn for that purpose, so that the C# error messages would match the EC# error messages, but ... whatever. Anything that works is good.

Whether it's worthwhile to have a version of EC# that cannot compile to C# - or even a version that does compile to C# but does not preserve the high-level code structure - I'm not sure; the fact that users aren't "married" to EC# is one of its main selling points.

In the long run, though, it's attractive to be able to compile EC# to Wasm and C++. It might even be useful in the short run: I could adapt my LLLPG-based LES parser so it can be used by C++ programs, assuming the C++ back-end produces predictable and "consumable" code (i.e. code that is easily used by existing C++ programs).

Obviously, you're not seeking world domination, but you'd like some users, wouldn't you?

Have you ever tried to do Visual Studio integration? To do a D# project type?

qwertie commented 8 years ago

Sorry, did I say "sole purpose of semantic analysis"? That was wrong. A single-file generator can't do complete semantic analysis because it doesn't have access to the whole program. But it could offer limited error detection and limited code completion / intellisense, which Flame should be able to provide.

jonathanvdc commented 8 years ago

About that total initialization analysis: I vaguely recall this not compiling under good old csc, because the compiler fails to understand that X and Y are backed by fields, and should thus count toward total initialization. Turning X and Y into fields makes the program compile, and inserting a this = default(Vector); works as well.

public struct Vector
{
    public Vector(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; private set; }
    public int Y { get; private set; }
}

EDIT: this example seems to compile fine under mcs, though.

I'm open to the idea of building a front-end for a unified D#/EC#. Right now, Flame doesn't offer built-in support for async/await, and the D# front-end doesn't do type parameter inference/lambda type inference. These are hard-to-implement features, but there are no architectural roadblocks here that I can see.

I've ported my fair share of C# classes to D#. In my experience, porting a program from D# to C# - or vice-versa - can be done with a few minor tweaks. Perhaps a small number of EC# macros can be used to "lower" D# to EC# directly?

You can already get errors and warnings without writing the output to disk with the -fsyntax-only option. So fecs file.ecs -fsyntax-only -Weverything gets you the diagnostics you want if file.ecs is the only input file. I must admit that I haven't tried doing code completion or Visual Studio plugins yet - and given that I don't use Visual Studio anymore, my interest in the latter has waned somewhat. Would it be entirely unreasonable to just pass the other source files (they're C#, and therefore we can handle them, right?) to the compiler and then have it analyze everything?

Really, I don't know if Flame is suitable as a code analysis tool for IDEs. Firstly, I don't know about performance: Flame was designed as a compiler framework, not as a high-performance semantic analysis framework. On the other hand, it does try to cache as much information as possible, so perhaps it really is fast enough once everything has been analyzed. But my greatest concern is that Flame doesn't expect entities to go away. Frankly, I haven't the faintest idea of how to elegantly handle the scenario where the programmer deletes a method or a type. Properly writing a tool that analyzes your code as you type sounds like a lot of work, and I'm mostly interested in compiling things.

So my answer boils down to this:

- if you want to use Flame to compile something, then that's super easy and all you'll need is a front-end.
- if you want to use Flame in a command-line tool that figures out what's wrong with your code, then that's fairly easy. A front-end and some driver program code should suffice.
- if you want to use Flame as a code analysis engine in an IDE, then all bets are off. It'd probably take a fair amount of work to get this working properly.

It's been a while since I've used the C++ back-end, so bitrot may have set in already, but I designed the C++ output to be as close to hand-written code as possible. The generated code should be fairly straightforward to use. Here be dragons, however, as you'll only have a fairly limited set of C++ standard library plugs at your disposal (fortunately, you can always just write your own), instead of the .NET framework's rich set of assemblies.

jonathanvdc commented 8 years ago

I'm sorry if my previous response scared you off. I really didn't mean to. Re-reading it now, it dawns upon me that my response was actually fairly ambiguous. I'll try to succinctly re-phrase it point-by-point here:

- I'd like to unify D# and EC#.
- Creating a Flame front-end for a unified D#/EC# should be easy for the feature set that is already supported by the D# front-end, and doable for "fancy" C# features such as async/await, lambda type inference, and generic parameter inference.
- Flame decouples front-ends from back-ends, so a D#/EC# compiler will be able to produce CLR assemblies, WebAssembly and C++ output.
- Flame-based compilers can be told to generate a bunch of useful errors and warnings, which can easily be intercepted. Displaying these diagnostics in a GUI is just as feasible as printing them to the command-line.
- Building a linter/static analysis tool - like Clang's static analyzer - for D#/EC# should be fairly easy. Flame's IR can be used to analyze the program and look for bugs, like null pointer dereferences. Perhaps these analyses can be simplified significantly by converting the entire program to SSA form, which Flame can do out-of-the-box. Additionally, the IR retains enough information about the source code to accurately highlight the location of any potential bugs that the static analysis tool finds.
- I've never really done IDE integration. I don't know what the performance requirements are, and I don't know what information the IDE feeds to the language plug-in. Right now, I use Atom to write D# code, and rely on dsc for error and warning messages. I'd welcome IDE integration for D#/EC# and wouldn't mind adding features to Flame in order to make that possible, but I don't use Visual Studio myself (because my main OS is not Windows), so I probably won't personally be developing a Visual Studio plug-in any time soon.

Does that work for you?

qwertie commented 8 years ago

Well, let me respond to your first message first. To me, a basic IDE experience is critical. The two most annoying parts of using EC# right now are the lack of code completion (symbol lookup isn't too bad, I can press Ctrl+Comma to look up the symbol under the cursor, although it will take me to the plain C# version), and the fact that errors are only shown in the C# output (several times I've forgotten I was editing the output rather than the input, and then of course whatever I fixed breaks again next time I save the EC# file). I'm not quite sure what to do about those problems - it's a design flaw that VS allows users to edit the output file (with no warning) and then overwrite it accidentally. And even if Flame were suitable as a code completion engine, I would have to rewrite the Visual Studio extension - changing it from a COM component into a MEF component - in order to be able to add code completion features and red squiggly underlines. A royal pain in the butt, but ... worthwhile, if Flame were ready.

But then I realized, if you haven't implemented lambdas yet, or async-await, then certainly it wouldn't support dynamic either. Pretty much all real-life software uses lambdas, most software relies on generic type inference, and I'd guess that at least half of software uses dynamic or async... and lots of people use LINQ. So we're quite far from having something that real devs would consider using. Certainly I wouldn't consider trying to "sell" a standalone EC# compiler until it can at least compile itself (Loyc..dll + .exe). Clearly, right now, the easiest path to doing that is to use Roslyn as the back-end. Roslyn is "bulletproof", and also well-known, so to announce that EC# is built on Roslyn might garner far more interest.

Of course, eventually I want to do the "Loyc" thing and have multiple backends like C++ and Wasm, but maybe it's not the thing I should do right now.

I think many people that use C# use it for its excellent IDE features. That's true for me - if I didn't care about the IDE experience, possibly that would have tipped the scales three years ago, when I decided to continue working on Enhanced C# instead of joining the D camp. Remember that language we were talking about doing? The idea of doing a good IDE plus a "Learnable Programming" debugger for it makes me salivate.

qwertie commented 8 years ago

Let's see, what do you need for an IDE? I haven't studied how Roslyn works, but here are my thoughts.

It isn't really required to "delete" members in the IDE. You don't need to respond instantaneously to changes to the program, so I think I'd do this:
1. Pre-parse all source files, run LeMP on them, and run any initial information-gathering that does not rely on other source files. This step is "embarrassingly" parallel and easily done on multiple cores.
2. Finish type resolution for the whole assembly (figure out what types are referred to by method signatures, fields, etc.). This is probably parallelizable too.
3. In the current source file only, gather information about local variables. Once this is done, you're ready to answer code completion queries.
4. When the user makes any changes to a file, start a two-second timer. Each time a key is pressed, reset the timer to 2 seconds. When it expires, reprocess the current file (reparse, run LeMP, etc.) and if the user hasn't modified the file in the meantime, discard the resolved type information of the entire assembly and redo type resolution from scratch. It's somewhat expensive, but it's only a fraction of the entire compilation process. NOTE: you don't really have to discard the old information until the new information is ready. That way, code completion will keep working during the rebuild process. A simple way to save CPU time is to detect when the signatures in the current file have not changed - i.e. when the user is changing a method body and not any method signatures, you can skip the assembly-wide process and only reprocess detailed info about local variables in the current file.
5. If there are multiple projects that depend on each other, they should keep largely independent state. A sequence of two-second timers could be used to allow new information to cascade down through all the dependent projects, without stressing the CPU too badly.
6. In the presence of LeMP, the source file doesn't quite correspond to the file you actually have to analyze - Flame would be analyzing an "expanded" version of the file. Actually I think a "fixup" step may be needed, because LeMP macros will generally produce a mixture of synthetic nodes (with no source location) and real nodes (with source locations). The fixup step would scan the syntax tree looking for synthetic nodes, and "guess" a source code location for each one based on source code locations associated with parents and children of the synthetic node. But maybe this process could be deferred until an error occurs.
- The IDE responds to things like Foo.Bar(x, y). with a pop-up list. To implement this, Flame would need to be able to answer three questions: (1) Where are we in the source code? (i.e. give me an object that we can use to make Code Completion queries); (2) What is the type of the expression Foo.Bar(x, y) at this location; and (3) given this type, what should the code completion list look like?
- Similarly, when typing Foo.Bar( the question is "what signature(s) are associated with Foo.Bar at this location in the source code?"
- The next most common request is "go to the definition of Bar in Foo.Bar". For code completion you only need to know what members exist; for "go to definition" you also need to know where all the definitions are.
- Member search (Ctrl+Comma) is an easier version of the above, in which the current context doesn't matter.
- "Find all references" and "find callers of this method" require all the method bodies and their full type resolution - or at least you need to find all method bodies that might contain the relevant symbol. That's okay, speed is not as important for this feature as for the "." popup list.
- The most common "refactor" (if you can even call it that) is "insert using statement". That one is pretty easy so I don't think I need to say more.
- The most common and important real refactor is a rename. With LeMP this is a bit tricky, since it's necessary to change the original source file, not the postprocessed form. The postprocessed form will usually contain the correct source locations to change, though. So you can change those original locations, then run LeMP again and compare the output to the expected output. If they don't match, we could show a "diff" window that shows how the rename went wrong and asks the user whether to keep the new version or abort the process.
- The second most important refactor is "generate method" and "generate constructor" (or "generate type" if the specified type does not exist). Sounds pretty straightforward except for the need to modify the original file rather than the postprocessed form. So again, I think we can take a "guess and check" approach, asking the user to confirm/reject the operation if it didn't produce the desired result.
- Differences between full compilation and partial compilation for IDE support:
  - No codegen is needed.
  - Error tolerance is important. No source code error can be allowed to prevent code analysis from running to completion, and errors in one file must not cause any trouble in an unrelated file.
  - Duplicate members (including duplicate methods, fields and entire classes) should be considered perfectly acceptable; "go to definition" should find all duplicates.
  - Except when doing a "find all references", you could discard the method bodies of all methods outside the current source file to save memory. Or cache X files. It's not worth managing a cache in the initial implementation, though. Note that LNodes are designed to use memory efficiently.
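
The two-second timer from step 4, and the shortcut of skipping the assembly-wide pass when no signatures changed, can be sketched roughly as follows. This is only an illustration, not Flame or LeMP code: `Debouncer`, `RebuildPlanner` and `RebuildScope` are hypothetical names, and method signatures are represented as plain strings to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Runs a callback once the user has been idle for `delay`.
public sealed class Debouncer : IDisposable
{
    private readonly Timer timer;
    private readonly TimeSpan delay;

    public Debouncer(TimeSpan delay, Action onQuiet)
    {
        this.delay = delay;
        // Created disarmed; it only fires after NotifyEdit has armed it.
        this.timer = new Timer(_ => onQuiet(), null,
            Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Call on every keystroke: (re)starts the countdown from scratch.
    public void NotifyEdit()
    {
        timer.Change(delay, Timeout.InfiniteTimeSpan);
    }

    public void Dispose()
    {
        timer.Dispose();
    }
}

public enum RebuildScope { CurrentFileLocals, WholeAssembly }

public static class RebuildPlanner
{
    // If the file's signatures are unchanged, only local-variable info
    // for the current file needs to be recomputed.
    public static RebuildScope Plan(ISet<string> oldSignatures, ISet<string> newSignatures)
    {
        return oldSignatures.SetEquals(newSignatures)
            ? RebuildScope.CurrentFileLocals
            : RebuildScope.WholeAssembly;
    }
}
```

An editor integration would call `NotifyEdit()` on each key press and start the reparse from the callback, which `System.Threading.Timer` invokes on a thread-pool thread; the old analysis results can stay in place until the new ones are ready.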

qwertie commented 8 years ago

So, how much does working on an IDE interest you? I can think of three IDEs that run on both Windows and Linux: Geany (written in C), Eclipse (Java) and Xamarin Studio (C#). Since it's C#, Xamarin Studio seems like the obvious thing to add code completion to - I don't know if it's designed to support "third-party" languages but ... well, it probably is.

jonathanvdc commented 8 years ago

Actually, Flame does support lambdas. It just doesn't do lambda type inference, because that's the front-end's responsibility. The details of the type inference algorithm are specific to the source language, so the middle-end can't - and shouldn't - do that.

I chose not to implement type-inferred lambdas in the D# front-end because they are an ugly exception to the way expressions are typed in (most) imperative languages, and at the time I decided that doing so anyway would be more trouble than it's worth, at least for my own use-cases. Conversely, the micron front-end infers all types, so lambdas (i.e. local let-bindings that take at least one parameter) are always type-inferred there.

My point is, though, that there's absolutely nothing stopping us from doing just that, and I do plan on implementing lambda type inference/generic parameter inference in a unified D#/EC# front-end. Likewise, I don't think implementing async/await will be fun, but Roslyn is open-source, so we can always just look at how they do things. So I'm optimistic that we'll attain feature parity with Roslyn sooner rather than later.

Besides, once I get started on building a D#/EC# front-end, then I at least want to be able to compile both Flame and Loyc, and in doing so move from a partially bootstrapping compiler to a fully bootstrapping compiler.

Which brings us to IDE support. I currently use MonoDevelop as my go-to C# IDE, which is basically the same as Xamarin Studio. I think. I once tried to create a D# project plug-in for MonoDevelop, but documentation was somewhat sparse, and I encountered mad girlfriend bugs: the plug-in obviously didn't do what it was supposed to do, but the IDE said that everything was fine. I lacked the patience to debug the whole thing, and eventually settled on my current work-flow.

Now, most of the features you've described really don't sound like things that will be handled by Flame per se. After all, Flame provides a common middle-end and ships with a number of back-ends, but neither of those is a super important feature for IDE support. Flame's IR may prove useful when looking things up in method bodies, but most IDE-related functionality is source language, and therefore front-end, specific.

That being said, supporting the scenarios you described would shape the front-end in a certain way, which I am more than willing to do. I also wouldn't mind implementing certain IDE-specific features such as figuring out what the type of the expression at the cursor's location is, or retrieving a type's location in the source code. On the other hand, I'm not sure if I'm up for coding UI-related things from scratch. That has always proven to be a debugging nightmare in my experience.

For something completely different, I was pondering what the implementation details of a unified D#/EC# front-end would be. At first, I considered updating an existing front-end (either the dsc or fecs front-end), but in my opinion there are some valid technical objections against doing that:

- The D# front-end is kind of buggy and doesn't use a number of Flame features.
- Flame's current EC# front-end is an F# project, which precludes bootstrapping, and, from my understanding, F# is not your favorite programming language.

So I was thinking that maybe we should just create a new front-end from scratch in C#, and then rely on EC#'s backward compatibility with C# to compile said new front-end, once it matures. Thoughts?

qwertie commented 8 years ago

Yes. Maybe I just have a learning disability or what, but I found F# to have poor usability - unintuitive syntax plus unintuitive error messages. Anyway we should have a "dogfood" compiler (that "eats" itself) written in EC#. Would you agree with me that EC# should be the official name, because in the long run Google will find it more easily? 'EC#' isn't ambiguous (it's not used by anything else) and insofar as Google ignores the '#', EC and "Enhanced C" are both more unique than "D" which Google matches with words like "I'd".

For getting actual users I think we'll need VS integration, and for Linux/OSX we need Monodevelop. So, how about I be in charge of VS integration and you'll be in charge of Monodevelop integration?

I can agree in principle about most of the features you've added to D# compared to C#.

- I am most skeptical about the auto-properties thing - C# definite assignment analysis doesn't fail in trivial cases so you should figure out exactly what problem you were having.
- I agree with automatically choosing a Func<> type in var declarations.
- Marking functions as pure shouldn't be officially supported until the feature is properly done and carefully thought out... but EC# supports arbitrary non-keyword attributes so you can use pure rather than const.
- Static classes implementing interfaces sounds useful; and more generally I think any static member of a class should be eligible to be part of the implementation of an interface for that class. Could you give more details on the exact semantics you want / have implemented?
- A foreach with a mutable variable is useful but potentially breaks backward compatibility - perhaps we should define a new for loop instead; this is what I have in mind:
```
for ($x in list) {
  Console.WriteLine($"list[$(x#)] = $x");
}
```
The $ would be for consistency with pattern matching (match) which uses $ already; it has the advantage of highlighting places where variables are created in a more lightweight manner than var. Generally if you write for ($name in list) you'd also be defining name# which holds the current list index, as well as name which is an actual variable that caches list[name#], and if you write name = value it would be implemented as name = list[name#] = value. I don't think "zipping" is worthy of a whole language feature; instead we could support tuple deconstruction so you can write:
```
for (($x, $y) in list1.Zip(list2)) {
  Console.WriteLine($"list1[$(##)] = $x and list2[$(##)] = $y");
}
```
and also general pattern deconstruction:
```
List<Point> points = ...;
for ((X: $x, Y: $y) in points) {
  Console.WriteLine($"points[$(##)] = ($x, $y)");
}
```
It would be harder to support mutable loop variables in this case, but possible. Zip is already defined in Loyc.Essentials, but it could be enhanced to return a mutable list struct in case the two inputs both implement IList<T>. In this case the list item itself could be called # - this is consistent with how three different LeMP macros already work; # generally means "the current thing". Logically, then, ## would be the index of the current list item.

for could revoke mutability in case it is used on a type that has GetEnumerator and no mutable indexer. That way foreach would never be needed, and would exist mainly for backward compatibility.
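
To make the proposed semantics concrete, here is a hand-written plain-C# lowering of a loop like `for ($x in list) { x = x * 2; }`, assuming the rules described above: `x#` is the index, `x` caches `list[x#]`, and assigning to `x` writes through to the list. `ForLoopLowering`, `DoubleAll` and `x_index` are made-up names, and no compiler is claimed to emit exactly this.

```csharp
using System.Collections.Generic;

public static class ForLoopLowering
{
    // Hypothetical lowering of: for ($x in list) { x = x * 2; }
    public static void DoubleAll(IList<int> list)
    {
        // `x_index` plays the role of `x#`, the current list index.
        for (int x_index = 0; x_index < list.Count; x_index++)
        {
            // `x` is a real local variable that caches list[x#].
            int x = list[x_index];

            // `x = value` becomes `x = list[x#] = value`,
            // so the element in the list is updated as well.
            x = list[x_index] = x * 2;
        }
    }
}
```

Because the write goes through the indexer, this form only works for types with a settable indexer, which is why mutability would have to be revoked for types that only offer GetEnumerator.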

Oh, I must ask - if you start changing D# into EC#, can you architect it to lower EC# to plain C# with perfect preservation of semantics? This is a key feature that sets EC# apart from competing languages like Nemerle and F#. (Addendum) The implication here, I think, is that we need to keep the method bodies in the form of Loyc trees for a long time, and keep the "type tree" (types, namespaces and method signatures) linked back to the Loyc tree, so that it will be straightforward to output plain C#.

qwertie commented 8 years ago

By the way, have you tried Enhanced C#'s matchCode and quote? It's fantastic for manipulating Loyc trees. For me they have reduced the cognitive burden of writing macros substantially. I haven't studied your Flame IR but you could consider making a matchFlame macro modeled after matchCode for manipulating your own IR, and a quoteFlame for generating your IR.

qwertie commented 8 years ago

Addendum - I didn't fully digest everything you said in your previous message, so let me add some comments in response.

> So I was thinking that maybe we should just create a new front-end from scratch in C#, and then rely on EC#'s backward compatibility with C# to compile said new front-end, once it matures. Thoughts?

Yes, I agree. At the same time as we write a new front-end, we could start laying some of the foundation for a multi-language front-end - by avoiding any C#-specific features in the low levels, and by implementing a "parameterized space tree" - the "file system" concept I was telling you about earlier.

> most IDE-related functionality is source language, and therefore front-end, specific.

That's not necessarily true. There is already an engine called ctags designed to support certain IDE features for many languages (sadly I've never had occasion to use ctags). IDE functionality may not be part of what you think of as Flame, but certainly it can (and should) be generalized across languages. Perhaps a small language-specific module is needed to understand incomplete code like Foo (x, y => y. - but that's potentially a very small amount of code, especially if the front-end parser is designed to parse incomplete statements intelligently, and if multiple front ends implement a common interface for IDE features.
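
As a rough idea of what such a common interface might look like, the sketch below covers two of the code completion questions from earlier in the thread (the type of an expression at a location, and the members to list for a type). Every name in it is hypothetical; neither Flame nor Loyc is claimed to define these types, and the table-backed class merely stands in for a real front-end.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical language-neutral interface that each front-end could implement.
public interface ICompletionService
{
    // Type of the expression that ends at `offset` in `file`, or null if unknown.
    string TypeOfExpressionAt(string file, int offset);

    // Member names to show in the pop-up list for `typeName`.
    IReadOnlyList<string> MembersOf(string typeName);
}

// Toy implementation backed by precomputed tables.
public sealed class TableCompletionService : ICompletionService
{
    private readonly Dictionary<string, string> typeByLocation =
        new Dictionary<string, string>();
    private readonly Dictionary<string, List<string>> membersByType =
        new Dictionary<string, List<string>>();

    public void RecordExpression(string file, int offset, string typeName)
    {
        typeByLocation[file + ":" + offset] = typeName;
    }

    public void RecordMember(string typeName, string memberName)
    {
        List<string> members;
        if (!membersByType.TryGetValue(typeName, out members))
        {
            members = new List<string>();
            membersByType[typeName] = members;
        }
        members.Add(memberName);
    }

    public string TypeOfExpressionAt(string file, int offset)
    {
        string typeName;
        return typeByLocation.TryGetValue(file + ":" + offset, out typeName)
            ? typeName
            : null;
    }

    public IReadOnlyList<string> MembersOf(string typeName)
    {
        List<string> members;
        // Unknown types yield an empty list rather than an error,
        // in keeping with the error-tolerance requirement for IDE use.
        return membersByType.TryGetValue(typeName, out members)
            ? members.OrderBy(m => m).ToList()
            : new List<string>();
    }
}
```

An IDE plug-in written against `ICompletionService` would not need to know which source language produced the tables.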

> coding UI-related things from scratch

We'll be modifying existing IDEs to avoid doing anything from scratch. IDEs have built-in UIs for code completion; the challenge is just figuring out how to invoke them and to install various event handlers in the editor. For syntax highlighting, for VS that's already done, and for Monodevelop you can probably just tell it to use C# highlighting.

I'm sorry your effort to make a D# plug-in didn't work out, but could we try a different tactic? To begin with, EC# is currently used in Visual Studio as a single-file generator. So, could you investigate adding a LeMP SFG to MonoDevelop? Xamarin Studio already supports T4 templates (TextTemplatingFileGenerator) in a VS-compatible way, and Google found this file which might be the implementation of that. Perhaps you can copy & modify this code to make one for LeMP and LLLPG. This would let VS solutions that use LeMP work seamlessly in MonoDevelop/XS ... except Loyc.sln, apparently, which (in XS 5.9.4) just seems to build the solution forever without producing any errors (Build|Stop is greyed out) and for some reason LeMP.StdMacros is labeled "Invalid Configuration Mapping". Bleh.

jonathanvdc commented 8 years ago

> Would you agree with me that EC# should be the official name, because in the long run Google will find it more easily?

Sure. Though I may use 'D#/EC#' in the future to differentiate fecs from the new EC# front-end.

> I am most skeptical about the auto-properties thing - C# definite assignment analysis doesn't fail in trivial cases so you should figure out exactly what problem you were having.

Allow me to move the goalpost here just a little bit and manually desugar those auto-properties. This doesn't compile under mcs:

public struct Vector2
{
    public Vector2(double X, double Y)
    {
        this.X = X;
        this.Y = Y;
    }

    private double x;
    private double y;

    public double X { get { return x; } set { x = value; } }
    public double Y { get { return y; } set { y = value; } }
}

Besides, why is total initialization of value types mandatory while total initialization of reference types is optional? Having the compiler insert initialization code makes replacing class by struct a painless transition.
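For comparison, the usual plain C# workaround is to chain to the implicit parameterless constructor, which zero-initializes the struct before any property setter runs. A minimal sketch, reusing the Vector2 struct from above (nothing here is D#-specific):

```csharp
using System;

public struct Vector2
{
    // Chaining to the implicit parameterless constructor zero-initializes
    // every field, so definite assignment analysis is satisfied before the
    // property setters below are called.
    public Vector2(double X, double Y)
        : this()
    {
        this.X = X;
        this.Y = Y;
    }

    private double x;
    private double y;

    public double X { get { return x; } set { x = value; } }
    public double Y { get { return y; } set { y = value; } }
}
```

This is the boilerplate that automatic initialization would make unnecessary.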

Marking functions as pure shouldn't be officially supported until the feature is properly done and carefully thought out... but EC# supports arbitrary non-keyword attributes so you can use pure rather than const.

Agreed. D# also supports a special syntax for attributes that are compiler intrinsics, which are understood by the middle-end linker and the back-ends. Any chance of getting support for that in EC#? Here's an example of a function that is implemented as a WebAssembly import. (ignore the module thing for now, more on that later)

public module spectest
{
    /// <summary>
    /// Prints an integer to standard output.
    /// </summary>
    [[import]]
    public void print(int Value);
}

Static classes implementing interfaces sounds useful; and more generally I think any static member of a class should be eligible to be part of the implementation of an interface for that class. Could you give more details on the exact semantics you want / have implemented?

Right now, the following:

public static class Foo : IFoo
{
    public static int Bar()
    {
        return 4;
    }
}

desugars to:

public class Foo : IFoo
{
    private Foo() { }

    // Backing field for the lazily created singleton instance.
    private static Foo instance_value;

    // Actual, real-deal static member
    public static Foo Instance
    {
        get
        {
            // Not thread-safe, I know. This could easily be done in 
            // a static constructor, and I plan on re-implementing it
            // like that in the EC# front-end.
            if (instance_value == null) instance_value = new Foo();
            return instance_value;
        }
    }

    public int Bar()
    {
        return 4;
    }
}

Member static methods/properties of instance types are implemented as members of a nested Static_Singleton singleton class. I initially planned on implementing 'static inheritance', where public class BigNum : IComparable<BigNum>, static IComparer<BigNum> would be legal, but I've kind of changed my mind on that lately, because I'm not sure if it adds much value compared to implementing IComparer<BigNum> in a separate singleton class.
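As a rough sketch of the thread-safe variant mentioned in the comment inside the desugared code, the instance could be created by a static field initializer instead of lazily in the getter. The shape of IFoo (a single Bar method) is an assumption, since the interface is never shown here:

```csharp
using System;

// Assumed shape of the interface; the thread never shows its definition.
public interface IFoo
{
    int Bar();
}

public sealed class Foo : IFoo
{
    private Foo() { }

    // The CLR runs a type's static initializers at most once and
    // synchronizes that run, so no explicit locking is needed here.
    private static readonly Foo instance_value = new Foo();

    public static Foo Instance
    {
        get { return instance_value; }
    }

    public int Bar()
    {
        return 4;
    }
}
```

With this shape, IFoo foo = Foo.Instance; works as it does with the lazy version, and every access returns the same object.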

Ironically, I eventually ended up re-implementing the old C# static class behavior, as module, because I needed actual static classes for (1) extension methods and (2) certain back-ends which don't have reference types yet (like the WebAssembly back-end). So maybe we should really just keep the old static class semantics, and just use something like public class object Foo to create singletons.

I'm also not sure if a lexical macro can reliably lower singleton entities to regular classes. We can't just do a find-and-replace, because method and variable names take precedence over class names.

I don't think "zipping" is worthy of a whole language feature; instead we could support tuple deconstruction

I actually find multiple foreach to be an elegant and useful construct most of the time. It gets rid of the cruft that is indexing when applying some kind of operation to an entire array. I also really don't see why it should be replaced by an explicit 'zip' call followed by tuple deconstruction, because:

Multiple foreach is implemented efficiently. Generating equivalent code for an explicit 'zip' followed by tuple deconstruction requires zealous function inlining and scalar replacement of aggregates. The only upside to this is that it makes Flame's -O3 look good on benchmarks.
Multiple foreach was easy to implement, as it is really just a generalization of regular foreach.
I don't think multiple foreach interferes with existing syntax.
It reduces the cognitive burden that is associated with looping over multiple arrays, both for the code's author, and for any readers.

Here's an example of a multiple foreach in Flame:

public static IExpression[] VisitAll(INodeVisitor Visitor, IExpression[] Values)
{
    var results = new IExpression[Values.Length];
    foreach (var output in results, var input in Values)
    {
        output = Visitor.Visit(input);
    }
    return results;
}
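For readers who only know C#, here is a sketch of what such a loop plausibly lowers to when both operands are arrays. MapAll and visit are hypothetical names, int stands in for IExpression to keep the snippet self-contained, and stopping at the shorter array is an assumption rather than documented D# behavior:

```csharp
using System;

public static class MultiForeachSketch
{
    // Hypothetical stand-in for VisitAll: applies `visit` element-wise.
    public static int[] MapAll(Func<int, int> visit, int[] values)
    {
        var results = new int[values.Length];
        // Roughly what `foreach (var output in results, var input in values)`
        // could become: one shared index, a read from `values`, and a
        // write back into `results`.
        for (int i = 0; i < results.Length && i < values.Length; i++)
        {
            results[i] = visit(values[i]);
        }
        return results;
    }
}
```

No enumerator, tuple or intermediate sequence is allocated, which is the efficiency argument made in the list above.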

Can you provide an example where mutable loop variables are backwards incompatible? The usual suspect - assigning a value to an iteration variable - makes mcs report a compiler error, so that won't get us in trouble:

test.cs(10,13): error CS1656: Cannot assign to `item' because it is a `foreach iteration variable'

you could consider making a matchFlame macro modeled after matchCode for manipulating your own IR, and a quoteFlame for generating your IR.

Mmmhh. quoteFlame sounds a whole lot like an embedded __asm statement, which shouldn't be too hard to implement, and may also be quite useful. I'm not sure if matchFlame is useful for macros, because the order of optimizations is important, and macros get evaluated before the pass pipeline is invoked. It certainly is worth looking into, though. A macro that generates Flame API code which builds an IR tree at run-time for a given snippet of code may also be useful.

There is already an engine called ctags designed to support certain IDE features for many languages

Sure, but that all depends on which features you want. As far as I can tell, ctags implements one thing: "go to definition". That's a tremendously helpful feature that's more or less language independent, but more advanced features, such as refactors and accurate code completion, simply can't be done by the middle-end. Furthermore, it's entirely possible to have Flame search a parsed (and analyzed) project and map type/method/field/property names to their point of definition, because that information is - as you have suggested - clearly language-independent.

and if multiple front ends implement a common interface for IDE features.

A common interface for IDE features would certainly be useful, and could also decouple source language-specific logic from the IDE itself. I am in favor of that.

So, could you investigate adding a LeMP SFG to MonoDevelop?

All right, but I want to get the EC# front-end working first.

jonathanvdc commented 8 years ago

I think I'll create a separate repository for the (new) EC# Flame front-end. What do you want to call it? We could just re-use the 'fecs' name, which has already been taken, but is conveniently pronounceable, or we could go with 'ecsc'.

qwertie commented 8 years ago

Yes, I knew that normal properties couldn't be initialized that way, only autoproperties (the latter is supported because there's no way to refer to the backing field.)

why is total initialization of value types mandatory while total initialization of reference types is optional

From a theoretical perspective it may not make sense, but from an implementation perspective it does, because value types (which often exist on the stack) actually need to have every member assigned, but reference types do not because the heap is zeroed before it is allocated to specific objects.

I originally planned to support this exact syntax as a way to invoke lexical macros:

[[import]]
public void print(int Value);

But I punted on it, realizing I could use normal attributes by writing a macro for #fn rather than the attribute itself. This, however, is not a good long-term solution... (edited) it doesn't scale well, because each new macro separately scans for attributes, and more importantly, it's hard to support different macros modifying the same construct (if two macros change a method in response to two different attributes, LeMP has to report an ambiguity error, as it doesn't know how to combine the changes). I don't know what the right solution is.

C# already supports several back-end-specific attributes, such as [Conditional] and [Serializable] (the back end has to specially recognize it, since it's converted from an attribute into a one-bit flag in the assembly). Why don't you want to continue this pattern for other "intrinsics"?

See #27 re: static interfaces.

I'm not sure if matchFlame is useful for macros, because the order of optimizations is important, and macros get evaluated before the pass pipeline is invoked

I don't think you understand - matchFlame would not be used for macros, rather it would be a macro that is used for pattern-matching your IR inside your backends. You could just use match for that purpose, but a special-purpose macro might work better. Likewise quoteFlame would not be like an __asm statement, more a way to quote an __asm statement. Both of these would require you to define a compact DSL to represent your IR. Anyway, I don't really know what I'm talking about since I haven't worked with Flame.

I thought ctags also provided (inaccurate) code completion but I never used it, so, not sure. I'm glad we're in agreement about trying to do IDE features in a way that is as language-independent as possible.

I want to get the EC# front-end working first.

I'm confused. In my mind, LeMP is literally an EC# front-end that works already, and a single-file generator already exists, so it should be easy to port to MonoDevelop. But maybe in your mind, D# is the closest thing to an EC# front end. Which reminds me, you didn't address my earlier question:

if you start changing D# into EC#, can you architect it to lower EC# to plain C# with perfect preservation of semantics?

qwertie commented 8 years ago

Can you provide an example where mutable loop variables are backwards incompatible?

Yes. In order to support a mutable loop variable in foreach, you wouldn't be able to use IEnumerator anymore. I was assuming that instead you would use the indexer (list[index]) and a potentially hidden index variable. There is an alternative - you could implement mutation by setting the Current property of the enumerator - but most enumerators don't support that. It doesn't feel right to use MoveNext/Current and then suddenly switch to a completely different approach if the user mutates the loop variable. I think that would surprise users. So, assuming your foreach always uses an indexer and Count (where available), there is a theoretical possibility that it is not backward compatible, since Count and the indexer are not guaranteed to behave the same way as the enumerator. Also, performance will change; the indexer may be faster or slower than the enumerator depending on the circumstances. In case of AList, the enumerator is faster in theory (O(1), whereas the indexer is O(log N)) although I don't know about practice.
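To make the theoretical incompatibility concrete, here is a contrived sketch of a collection whose enumerator and indexer disagree. ReversedView and LoopOrderSketch are hypothetical names invented for this illustration; nothing like them exists in Loyc or Flame:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: enumeration yields the items in reverse,
// while the indexer returns them in storage order.
public sealed class ReversedView : IEnumerable<int>
{
    private readonly int[] items;

    public ReversedView(params int[] items) { this.items = items; }

    public int Count { get { return items.Length; } }

    public int this[int index] { get { return items[index]; } }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = items.Length - 1; i >= 0; i--)
            yield return items[i];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class LoopOrderSketch
{
    // What standard C# foreach does: MoveNext/Current.
    public static List<int> ViaEnumerator(ReversedView view)
    {
        var seen = new List<int>();
        foreach (var item in view) seen.Add(item);
        return seen;
    }

    // What an indexer-based lowering would do instead.
    public static List<int> ViaIndexer(ReversedView view)
    {
        var seen = new List<int>();
        for (int i = 0; i < view.Count; i++) seen.Add(view[i]);
        return seen;
    }
}
```

For new ReversedView(1, 2, 3), the enumerator path visits 3, 2, 1 while the indexer path visits 1, 2, 3, so a foreach that silently switched strategies would change the behavior of an existing program.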

I want EC# to be strict about backward compatibility - if a given C# program compiles under EC#, it should behave identically, so to me it seems better if foreach doesn't support mutability at all, or if it supports mutability in a conservative way, by setting the Current property. That's why I propose a new for loop for the new functionality.

I'm kind of on the fence about multi-foreach since it's neither something I would use a lot, nor is it needed (given Zip) but on the other hand, it's not hard to implement. By nature, EC# can't be one of those "small core" languages where the functionality is in libraries (including macro libraries). Still, I do like to be careful about adding features to the core language. So I feel a tension. But ... I guess I can ... accept ... having it in the language.

jonathanvdc commented 8 years ago

From a theoretical perspective it may not make sense, but from an implementation perspective it does, because value types (which often exist on the stack) actually need to have every member assigned, but reference types do not because the heap is zeroed before it is allocated to specific objects.

Actually, if I'm not mistaken, stack objects are zeroed out in verifiable code, which is what C# compilers produce. That's exactly what init means in IL declaration .locals init (int32 V_0). Total initialization is required because C# compilers routinely optimize x = new X(...); to X..ctor(ref x, ...);. This optimization is fairly fragile, by the way. For example, I can trick mcs into doing the following:

using System;

public struct Vector2
{
    public Vector2(ref Vector2 Other)
    {
        this = default(Vector2);
        this.X = Other.X;
        this.Y = Other.Y;
    }

    public double X;
    public double Y;
}

public static class Program
{
    public static void Main()
    {
        Vector2 vec;
        vec.X = 4;
        vec.Y = 5;
        // Insert unsafe optimization here, by emitting
        // a direct `call` to `Vector2::.ctor` instead of
        // a `newobj` for `Vector2::.ctor`. 
        vec = new Vector2(ref vec);
        // Actually manages to print "0\n0\n". ಠ_ಠ
        Console.WriteLine(vec.X);
        Console.WriteLine(vec.Y);
    }
}

C# already supports several back-end-specific attributes, such as [Conditional] and [Serializable] (the back end has to specially recognize it, since it's converted from an attribute into a one-bit flag in the assembly). Why don't you want to continue this pattern for other "intrinsics"?

That's a really good point. I actually did that in the past (and still do for some intrinsics). At some point, however, I realized that this whole approach was actually a huge hack. These downsides quickly started to outweigh whatever upsides remained:

The attribute class has to be defined in an external library, for it to be usable by both the compiler and the program that is being compiled. The C# compiler circumvents that problem by putting the attribute class in the .NET framework class library, but we don't have that luxury. Also, developers shouldn't have to juggle libraries for things that the compiler understands completely.
It's really hard to tell what is an intrinsic attribute and what isn't. Comparing the attribute class' name to a known string is not very reliable, because any EC# programmer can unwittingly define their own hypothetical CompilerRuntime.ImportAttribute, which will then be interpreted as a compiler intrinsic, even though that was not the programmer's intention.
Encoding compiler intrinsics as attributes ties the compiler's version to the runtime library's version. Changes to the runtime library will require changes to the compiler. That's a versioning disaster waiting to happen.
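The name-collision problem from the second point can be sketched in a few lines. CompilerRuntime.ImportAttribute is the hypothetical name used above, and LooksLikeImportIntrinsic is an invented helper that stands in for whatever name-based check a compiler might perform; it is not Flame's actual lookup code:

```csharp
using System;

namespace CompilerRuntime
{
    // Defined by an unsuspecting user, not by any compiler runtime library.
    [AttributeUsage(AttributeTargets.Method)]
    public sealed class ImportAttribute : Attribute { }
}

public static class IntrinsicLookupSketch
{
    // Hypothetical name-based check of the kind a compiler might use
    // to decide whether an attribute is an intrinsic.
    public static bool LooksLikeImportIntrinsic(Type attributeType)
    {
        return attributeType.FullName == "CompilerRuntime.ImportAttribute";
    }
}
```

The check accepts the user's class because only the name is compared, which is exactly the accidental match described above.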

Yes. In order to support a mutable loop variable in foreach, you wouldn't be able to use IEnumerator anymore.

The D# compiler is pretty conservative here: loop variables are only mutable when looping over an array. C# compilers already optimize array foreach by lowering them to for loops. The D# compiler simply takes that one step further and lets the programmer modify the loop variable derived from that array. This is a bit of a special case, I know, but initializing and modifying arrays is the main use case for mutable loop variables.

Likewise quoteFlame would not be like an __asm statement, more a way to quote an __asm statement. Both of these would require you to define a compact DSL to represent your IR.

Oh, I see. Yeah, that could definitely be useful, especially for the "lowering" passes.

But maybe in your mind, D# is the closest thing to an EC# front end.

What I meant is that I want to create a Flame front-end for EC# first. That'd enable the IDE plug-in to perform semantic analysis on EC# code.

Which reminds me, you didn't address my earlier question:

if you start changing D# into EC#, can you architect it to lower EC# to plain C# with perfect preservation of semantics?

I don't know if I can. But I'll try, that's for sure.

Update: I created an ecsc (enhanced C# compiler) repository for a Flame EC# front-end.

qwertie commented 8 years ago

Actually, if I'm not mistaken, stack objects are zeroed out in verifiable code

IIRC, init is a flag that all locals are to be zeroed, and I was surprised to learn (years ago) that the verifier requires this rather than doing a static assignment analysis. Then the JIT tries to detect and eliminate the double-initializations. So yeah, you're right - on the matter of struct initialization, C# seems to follow the way the CLR might have worked rather than the way it does work.

Perhaps another reason why DAA is done for structs and not classes is that structs (unlike classes) tend to be small, so it was felt more reasonable to require all members to be explicitly initialized.

Wow, your code causes the same behavior in csc (0 0) even in a Debug build. I checked the IL - right at the start the constructor has

ldarg.0 
initobj .../Vector2

Maybe this is required by the verifier too, but it makes C#'s DAA seem positively pointless.

Even so, I have to say, I don't think this change to the C# language is worth making. Have you heard of the "point" system the C# team uses (or used to use?) for adding features? I'd like to treat EC# as having a similar point system, except with a lower threshold for additions, and with the threshold modulated by implementation difficulty - e.g. supporting underscore literals like 1_000_000 is incredibly easy and so has a very low threshold. I suppose if the C# team had modulated their thresholds by difficulty/complexity of the feature, C# 2.0 would have already had underscored and binary literals...

Re: attributes for intrinsics, the C# team mentioned "Compile-time only attributes" in one of their Design Notes.

The attribute class has to be defined in an external library, for it to be usable by both the compiler and the program that is being compiled.

That's not actually true, e.g. you can define class ExtensionAttribute in your own .NET 2 assembly in order to use extension methods. It's potentially hazardous though. My memory sucks, but I think I might have once had a compatibility problem where my .NET 3.5 assembly had a problem consuming a .NET 2 assembly.

It's really hard to tell what is an intrinsic attribute and what isn't.

True - to address this, I've been using the convention of using a lowercase first letter both for macros themselves, and for attributes that macros recognize.

Encoding compiler intrinsics as attributes ties the compiler's version to the runtime library's version.

At this point I want to point out that using attribute syntax doesn't necessarily mean that the attribute has to exist at runtime, even though it has been done that way in the past. None of the LeMP-specific attributes exist at runtime.

loop variables are only mutable when looping over an array

Ouch - I don't use arrays very much, so I don't think the feature should be constrained in that way.

Um, about repos, I was thinking of (A) combining Flame and the Loyc libraries in one repo, with Flame being a subtree (in the same way that LoycCore is a subtree of this repo), and (B) defining an 'ecsharp' or 'Loyc' "organization" on github and putting the official version of our code there. Anyway, there's no hurry, I'll be busy for a while modifying the parser and finishing basic support for extracting 'sequence expressions' out of if statements, loops, etc.

I don't know if I asked this before but why is it called 'Flame'?

jonathanvdc commented 8 years ago

I'd like to treat EC# as having a similar point system, except with a lower threshold for additions, and with the threshold modulated by implementation difficulty

I'd argue that automatic initialization of structs is very easy to implement: simply insert a this = default(T);. In fact, that's a lot easier to implement than forcing manual total initialization on the programmer, because said rule in turn requires flow analysis to ensure that the struct initialization paradigm is respected.

I've been using the convention of using a lowercase first letter both for macros themselves, and for attributes that macros recognize.

Do you think that simply using [#import] is acceptable? That way, there's no ambiguity, and no separate attributes. I'm okay with this:

public static class spectest
{
    /// <summary>
    /// Prints an integer to standard output.
    /// </summary>
    [#import]
    public void print(int Value);
}

I was thinking of (A) combining Flame and the Loyc libraries in one repo, with Flame being a subtree

Do you mean, like, in a separate repository? If so, then the ecsc repository might be a good candidate, because it uses both Loyc and Flame libraries.

and (B) defining an 'ecsharp' or 'Loyc' "organization" on github and putting the official version of our code there.

I wouldn't mind transferring ecsc to a Loyc/ecsharp "organization." Not so sure about the Flame repository itself, though, because it's more of a compiler construction kit, not just the back-end for the EC# compiler (I'm not modifying Flame in any way to compile EC#). Moving it to an ecsharp organization might send the wrong message. It's also kind of my baby, and I'm not entirely sure if I'm ready to let go of it yet.

I don't know if I asked this before but why is it called 'Flame'?

Good question. I had to call it something, so I figured 'Flame' would be a cool name. It's no acronym; I consider those to be fairly fickle, especially when used in compilers. Just look at GCC (GNU Compiler Collection now, originally GNU C Compiler) and LLVM (formerly known as the Low Level Virtual Machine). Plus, 'Flame' also keeps the all caps shouting to a minimum.

qwertie commented 8 years ago

Yes, it's easy to implement. However, I think many programmers would actively oppose this change and among those that aren't opposed, it's a minor cognitive burden having to keep track of this difference between C# and EC#. That gives it, in a sense, negative points that offset the positive points from making this the default behavior. Now how many positive points does it earn? Well, most people don't write structs very often, and among those that do, this is a change that is only beneficial when the constructor intentionally didn't initialize all members in a way that the DAA understands. So the benefit I see here is tiny - so small that the overall value is zero or negative.

Yes, putting a # at the front of an attribute makes it very clear that it isn't a runtime attribute, so that's good. I have been a bit indecisive about where to use # - it's a small burden to type it and a noisy character visually, so I don't think all macros should use it. For example if someone uses contract attributes a lot, the many #s might look cluttered...

[#ensures(# >= 0)] double Sqrt([#requires(# >= 0)] double x) => ...;

The same could potentially be argued of #import. Conventionally all attributes are capitalized so I think lowercase is already a strong enough hint.

jonathanvdc commented 8 years ago

I think many programmers would actively oppose this change and among those that aren't opposed, it's a minor cognitive burden having to keep track of this difference between C# and EC#.

I've never seen any real justification for the current struct initialization paradigm. I think it's ugly because it's based on a few special cases (any initialization logic that depends on a separate method to initialize some part of a struct is simply out of luck), and I don't really see why anyone would oppose fixing that in a perfectly backward-compatible manner. Any additional cognitive burden would be caused by EC# making our lives easier in a way that C# does not. And isn't that kind of the point of EC#?

I have been a bit indecisive about where to use # - it's a small burden to type it and a noisy character visually, so I don't think all macros should use it.

Right, but I don't expect #import to suddenly pop up everywhere. Like P/Invoke, it's a feature for library writers, so it doesn't have to be pretty. Anyway, this is a moot point, since I realized that I can just re-use the extern keyword for this purpose. I might use # as a prefix for some esoteric compiler intrinsic attributes in the future, though.

On a different note, I've gotten to the point where I'd like to implement binary operator resolution in ecsc, and the C# (binary) operator resolution algorithm that Roslyn uses turns out to be ridiculously complicated, as evidenced by BinaryOperatorOverloadResolution.cs. Would you mind if I copied and adapted that file for use in ecsc? Roslyn is Apache licensed, which should be compatible with ecsc's MIT license.

qwertie commented 8 years ago

And isn't that kind of the point of EC#?

Yes. And if DAA errors were something I encountered on a daily or weekly basis, I would agree with you... Okay, maybe we could compromise? Like for every difference between C# and EC#, we print a warning that is on by default and you'd have to turn it off somehow. So you'd do the DAA and detect that a variable is unassigned, but print "C# requires field Foo to be fully assigned before control is returned to the caller" as a warning that can then be suppressed.

Sure, copy whatever code you want from Roslyn - and I applaud your initiative. By the way, my understanding is that I don't need to change the license of ecsharp to incorporate code licensed MIT, Apache or whatever... but traditionally Loyc has been LGPL licensed. This hasn't bought me anything so far and did attract the wrath of an ignoramus so I wonder if I should change to some other license. Is there a particular reason you picked MIT?

append: yeah, extern makes sense.

jonathanvdc commented 8 years ago

Like for every difference between C# and EC#, we print a warning that is on by default and you'd have to turn it off somehow. So you'd do the DAA and detect that a variable is unassigned, but print "C# requires field Foo to be fully assigned before control is returned to the caller" as a warning that can then be suppressed.

That works for me. Perhaps we can group warnings for EC# extensions as -pedantic, so they can be enabled or disabled all at once.

Is there a particular reason you picked MIT?

I didn't want to discourage people who were thinking about using the Flame libraries because of a licensing issue. The GPL, and, to a lesser degree, the LGPL, can complicate things for developers who decide to license their work under the MIT license. That's a significant group of users that I'd rather not alienate. I also don't buy the "evil corporations will steal your code" argument - for example, LLVM is licensed under a permissive open source license, and lots of corporations contribute code to LLVM even though they technically don't have to - so I don't see what advantages there are to the (L)GPL.

But really, picking a license is a personal choice - it's your copyright, after all. And licensing is also kind of a practical detail; if people are truly invested in Loyc, then they can just ask you to give them a different license.

qwertie commented 8 years ago

Just to let you know, I've been lazy for the past three weeks. Partly it's because the Bureau of Immigration required me to spend four (!) days in Manila, but also I've just been spending time with family, reading news and writing. I am still working on the algorithm for eliminating #runSequence expressions and the :: quick-binding operator.

jonathanvdc commented 8 years ago

Sorry it took me three days to respond. Life's kind of intervened for me as well here. My exams are coming up soon now, and I've also been busy changing Flame's underlying data structures for attributes (flat sequences were silly, so I switched to tiny hashmaps) and names (strings were a bad idea to begin with). ecsc is slowly making some progress, too.

I see that you kicked off a discussion on roslyn. Awesome. Anyway, I'm sure that the roslyn folks will give EC# the attention it deserves.