julia-vscode / LanguageServer.jl

An implementation of the Microsoft Language Server Protocol for the Julia language.
Other
355 stars 75 forks source link

Roslyn links #2

Open davidanthoff opened 7 years ago

davidanthoff commented 7 years ago

Here are some interesting reads about the C# Roslyn design:

ZacLN commented 7 years ago

Interesting stuff.

I'm slightly concerned with the possibility - as pointed out on Lint.jl issue - that due to the dynamic typing of Julia we won't be able to extract the same level of useful information though I am more optimistic than MikeInnes seemed to be (a year ago at least). As a start getting a proper connection between AST and document location is key, I've got a 4 line change to JuliaParser.jl that gives us hard location information on all blocks which is then trivial to turn into a tree with child nodes determined by location.

An example:

module testmodule
using somemodule
function func1(a::b)
    dothings(x) = somemodule.athing
    if true
        for i = 1:100
            dothings(i)
        end
    end
end
end

gives a tree with each node knowing its parent/children and a reference to the expression.

0:1:177
1:--1:177         module
2:----19:35         using
2:----36:174         function
3:------61:92         =
3:------97:170         if
4:--------113:162         for
5:----------139:150         call

This adds no time to the parsing and constructing the nodes (including pointing to each corresponding expression) doesn't take much time.

Then for a given position (i.e. 143) we can quickly pull the block, and construct the namespace it operates in from it's parents on the fly. This should allow us to get the full and appropriate information for hovers, completions and signatures.

Mutability of the tree can I think be handled if I convert the location ranges to be relative to the parent, then changes to the source will only require updating of the top level of location information. And as

In terms of refactoring as a start we should be able to do a simple rename that can easily distinguish between global/local variables of the same name but in different scopes.

I'm sure there are many intricacies I'm missing but I think that quite a bit of progress can be made following this route and that if later on we decide to change approach most of the supporting infrastructure can be reused.

ghost commented 7 years ago

There is a simple hack to JuliParser.parser.jl to embed the location information of the non-keyword identifiers. With this we obtain a tree similar to following (for the example above)

module testmodule#1#8 # none, line 1: # none, line 2:
using somemodule#2#25 # none, line 3:
function func1#3#45(a#3#51::b#3#54) # none, line 4:
    dothings#4#61(x#4#70) = begin # none, line 4:
           somemodule#4#75.athing#4#86
       end # none, line 5:
    if true # none, line 6:
        for i#6#17 = 1:100 # none, line 7:
            dothings#7#139(i#7#148)
        end
    end
end
end

Basiclly, any symbol is now in the form "name#line#pos". if we can change lexer.jl then we can also calculate the correct column (3 lines code codes). This tree cannot be evaluated without removing the location information first. But with a smaller change in JuliaParser.parser.jl, we can get new functionaliry without breaking the current use of JuliaParser. At the end, we will have a function parse(ts::TokenStream, meta=false) with meta=true if we want the position infor, and the default is false, it doesn't change the current use of JuliaParser

ZacLN commented 7 years ago

Thanks for that. At the moment I've simply changed parse_eq slightly, which gives add character position info to all expressions. In addition it leaves the position information mutable so you can deal with inserts/deletions really simply.


function parse_eq(ps::ParseState, ts::TokenStream)
    start = position(ts)-length(string(¬ts.lasttoken))+1
    lno = curline(ts)
    ex = parse_assignment(ps, ts, parse_comma)
    isa(ex, Expr) && ex.typ==Any && (ex.typ=start:position(ts)) 
    return short_form_function_loc(ts, ex, lno, filename(ts))
end ```
davidanthoff commented 7 years ago

@ZacLN I think that sounds fantastic. And I agree, lets get started with that, and then we can still improve later on.

@bthlgd Encoding the location information in the names seems a bit shaky to me, to be honest...

Two broader questions:

  1. is JuliaParser error resistant? i.e. if it detects invalid code, will it continue to parse the remaining code? That seems really important for us, i.e. if a user has some invalid code in a given line in the editor, the LS should still give meaningful results for other parts of the code in the same file.
  2. for things like source code formatting, and also renaming, we would need a round trippable representation of the source code. That is still not feasible, even with the location info embedded, right? Because the tree still doesn't hold info about e.g. comments, white spaces etc. Or am I missing something there?

Here is another high level read that is interesting: https://github.com/dotnet/roslyn/wiki/Roslyn%20Overview

ZacLN commented 7 years ago

Re-questions: Yep, I've changed the parser to return Expr(:error, info) with location info and to skip a line, for example a non closed function will return the inner block expressions at the global (or whatever the appropriate enclosing) scope. With a fair bit of work we could interpret these errors and return suggested fixes but I think it would boil down to rewriting half the Linter.jl code...

Nope, general formatting is out for the moment though I think some simple renaming may be possible.

ghost commented 7 years ago

@ZacLN I didn't notice that Expr has is a field named "typ". That's a very clever use of it.

@davidanthoff About the round trippable representation of the source code, I think another kind of expression tree might be needed, e.g. the following two code snippets will have exact the same expression tree

map((x,)->x*x, m) and

map(m) do x
    x*x
end
davidanthoff commented 7 years ago

Yes, the current types for expressions certainly won't work for that.