willsam100 / FShaper

FShaper - a tool to make the process of converting C# to F# much easier
MIT License
44 stars 11 forks source link

Support for new lines in string #11

Open agracio opened 4 months ago

agracio commented 4 months ago

Hi, I am having a problem when strings contain \n to create a new line in output. It is easy to reproduce using a test.

Input

void Foo() { Console.Write("First line. Second Line"); }

Result

------------------------CONVERTED------------------------
member this.Foo() = Console.Write("First line. Second Line")

Input

void Foo() { Console.Write("First line.\nSecond Line"); }

Result

------------------------CONVERTED------------------------
member this.Foo() =
    Console.Write(
        "First line.
Second Line"
    )

It splits "First line.\nSecond Line" into 2 rows and also adds another new line after member this.Foo() = and Second Line"

EDIT: Same is true for \r, however I can see that \t is resolved correctly

Any ideas how this can be resolved?

agracio commented 4 months ago

If \n is present in string then \t also breaks

void Foo() { Console.Write("\tFirst line.\nSecond Line"); }

Result

------------------------CONVERTED------------------------
member this.Foo() =
    Console.Write(
        "   First line.
Second Line"
    )

@willsam100 Any thoughts?

willsam100 commented 4 months ago

@agracio

I would take a step by step approach on this.

The first step, is defining the what the right output should. In most cases this is clear, however in some cases it's not.

this project always aimed to a direct approach to the translation process (as once we're in F# other tools can then move things forward). So from that perspective, I would define the following as the expected output:

member this.Foo() =
    Console.Write(
        "First line.\nSecond Line"
    )

Now that we know we want, it would then identifying where it's not working.

It's seems that instead of \n remaining as a char, it's being used to format. To find this, I would probably work backwards, starting the from output.

The formatting of the F# is done by fantomas (an old version of that package now). I think Fantomas is outputting it as the linebreak rather than \n. I would also guess this is where the additional newline is coming from.

Should it not be here, the next place to look would in ast that Fantomas is given, and confirming that the string is still one node in the ast. It's probable that it was split into various nodes further up, changing the representation.

Does this help?

agracio commented 4 months ago

Thanks! I will try to take a look at output before Fantomas formatting. I already checked all the instances of .Split('\n') and nothing is happening there so \n is not treated as new line when parsing initial input. I do expect output to be member this.Foo() = Console.Write("First line.\nSecond Line") without any line breaks the same as the line without \n. Although in my opinion in this context better interpretation would be

let foo() = 
    Console.Write("First line.\nSecond Line")

EDIT: after debugging confirmed that it is ast code that is incorrect before being processed by Fantomas.

willsam100 commented 4 months ago

Hi @agracio,

Nice work on finding the issue!

To clarify, the reason FShaper prefers translating C# methods to F# member functions rather than using let bindings is due to several important considerations:

Example in C

Consider the following C# class:

using System;

public class ExampleClass
{
    public void Foo()
    {
        Console.Write("First line.\nSecond Line");
    }

    private void Bar()
    {
        Console.Write("Private method.\nAnother Line");
    }

    public void Baz()
    {
        Console.Write("Public method Baz.\nYet Another Line");
    }
}

Broken Example in F

A direct translation of the above C# class to F# might look like this:

open System

type ExampleClassThatFailsToCompile() =
    member this.Foo() =
        Console.Write("First line.\nSecond Line")

    // Not allowed in F#
    let bar() =
        Console.Write("Private method.\nAnother Line")

    member this.Baz() =
        Console.Write("Public method Baz.\nYet Another Line")

However, this F# code will not compile. In F#, let bindings inside a type definition must be placed at the beginning of the type definition and cannot be interleaved with member definitions. This constraint makes it difficult to maintain the original structure of the C# class.

Why member Translation is Preferred

To avoid these issues, it is generally preferred to translate C# methods to F# member functions rather than using let bindings. Here are the reasons:

  1. Order and Structure: Reordering methods in a class to accommodate let bindings is complex and not typically what the author expects. It disrupts the original structure and makes the code harder to follow.

  2. Correctness: Using member functions makes it easier to check for correctness. The original order and structure of the methods are preserved, making it easier to verify that the translation is accurate.

  3. Single-Pass Compiler: F# is a single-pass compiler, meaning that all let functions must be defined in order if they call each other. This constraint makes it difficult to type-check the code. member functions do not have this constraint, allowing for more flexibility in the order of method definitions.

By translating C# methods to F# member functions, we can avoid these issues and produce code that is closer to the original C# structure, making it easier to understand and maintain.

I'm happy to accept a PR if you would like to contribute a fix for this issue. Your contributions are always welcome!

agracio commented 4 months ago

Hi, Unfortunately I was not able to figure out how to fix \n issue in AST output, so will have to leave it at that. Can post a working fork that uses .NET Standard instead of Mono code if that can help but assuming you are preoccupied with other work. My goal was to just create a basic UI using Electron for quick conversion but \n is a major issue.

member this.Foo() =
        Console.Write("First line.\nSecond Line")
// vs
let foo = 
    Console.Write("First line.\nSecond Line")

Is not really a big issue, just a matter of preference for single line conversion.

agracio commented 4 months ago

Managed to fix the issue in CsharpParser.fs by using regex to replace \n and \r at the point of SyntaxKind.StringLiteralToken construction. Unfortunately my branch is very different from yours as it is using .netstandard 2.0 without Mono and has changes to tests and some main files so not sure I would be able to cerate a PR very easily.

CsharpParser.fs Line changed: https://github.com/willsam100/FShaper/blob/master/FShaper.Core/CsharpParser.fs#L417

// added
open System.Text.RegularExpressions

       // change

       | SyntaxKind.StringLiteralToken ->

            let parseStringLiteral(literal: string, range)=
                let n = Regex.Replace(literal, "(?<!\\))(?:((\\n)*)\\n)(?![\\n/{])", @"\\n")
                let r = Regex.Replace(n, "(?<!\\))(?:((\\r)*)\\r)(?![\\r/{])", @"\\r")
                SynConst.String(r, range)

            (node.ValueText, range0) |> parseStringLiteral |> Expr.Const 

Regex only replaces \n and \r and ignores any instance when there is more than one \ present before n|r

willsam100 commented 4 months ago

Hi @agracio,

Nice work on your fix. I have checked this and it does work.

One thing I found while testing this out, is that the fix proposed is not idempotent. There is still an issue in how the F# code is formatted., The fix makes it pass the first time. If it is ran through twice then it will fail. While this is fine for external use, it can be a problem with testing, as the code is sometimes formatted again, which breaks it again.

I believe an alternative fix could be implemented here: https://github.com/willsam100/FShaper/blob/9b03ab5afe5779693677b5445100b2124b3a9ee1/FShaper.Core/Converter.fs#L793

To use Fantomas for the formatting, code snippets need to be wrapped in a structure. Once the code structure is removed, this code then removes newlines and indentation. This solution is clearly too naive! I believe improvements here would be an alternative solution, and probably more scalable.

I have not had time to take a deeper look into a fix here.

You mentioned that you had made some other changes.

Can you share what changes you have made to make this work on .NET Core, i.e., removing the need for Mono?

I see that FSharp.Compiler.Tools is what requires Mono. This package is very old and could be solved by depending only on FSharp.Compiler.Service. My first look at this shows that there are quite a few breaking changes here.

I'm curious if you have found a way around this?

Cheers, Sam

agracio commented 4 months ago

My code: https://github.com/agracio/FShaper, it is not marked as a fork since I had repo set to private while playing with it.

https://github.com/willsam100/FShaper/blob/9b03ab5afe5779693677b5445100b2124b3a9ee1/FShaper.Core/Converter.fs#L793 runs after the code has already been split into multiple lines if it is not handled in SyntaxKind.StringLiteralToken as it will treat all \n and \r as new lines in CsharpParser.fs methods when it parses C# syntax.

This method will return string with new lines if not handled in SyntaxKind.StringLiteralToken https://github.com/willsam100/FShaper/blob/9b03ab5afe5779693677b5445100b2124b3a9ee1/FShaper.Core/CsharpParser.fs#L853

EDIT: as for FSharp.Compiler.Service it uses v38.0.2 in the project and I have not changed it. The highest it can go would be v41.0.3 in line with Fantomas v4.7.9 as from version 5.0 Fantomas is a DotnetTool and cannot be used as reference in projects. I have attempted to upgrade but there are major differences in FSharp.Compiler.Service with namespaces being moved around, many constructors changed etc. Converting everything was incredibly painful and I gave up on it since it wasn't clear if it would even work and the remaining 10% of the changes were the most stubborn and painful. While making those changes I came across SyntaxKind.StringLiteralToken and decided to investigate it instead. Even SynConst.String constructor was changed in v41.0.3