Open davidanthoff opened 7 years ago
Interesting stuff.
I'm slightly concerned with the possibility - as pointed out on Lint.jl issue - that due to the dynamic typing of Julia we won't be able to extract the same level of useful information though I am more optimistic than MikeInnes seemed to be (a year ago at least). As a start getting a proper connection between AST and document location is key, I've got a 4 line change to JuliaParser.jl that gives us hard location information on all blocks which is then trivial to turn into a tree with child nodes determined by location.
An example:
module testmodule
using somemodule
function func1(a::b)
dothings(x) = somemodule.athing
if true
for i = 1:100
dothings(i)
end
end
end
end
gives a tree with each node knowing its parent/children and a reference to the expression.
0:1:177
1:--1:177 module
2:----19:35 using
2:----36:174 function
3:------61:92 =
3:------97:170 if
4:--------113:162 for
5:----------139:150 call
This adds no time to the parsing and constructing the nodes (including pointing to each corresponding expression) doesn't take much time.
Then for a given position (i.e. 143) we can quickly pull the block, and construct the namespace it operates in from it's parents on the fly. This should allow us to get the full and appropriate information for hovers, completions and signatures.
Mutability of the tree can I think be handled if I convert the location ranges to be relative to the parent, then changes to the source will only require updating of the top level of location information. And as
In terms of refactoring as a start we should be able to do a simple rename that can easily distinguish between global/local variables of the same name but in different scopes.
I'm sure there are many intricacies I'm missing but I think that quite a bit of progress can be made following this route and that if later on we decide to change approach most of the supporting infrastructure can be reused.
There is a simple hack to JuliParser.parser.jl to embed the location information of the non-keyword identifiers. With this we obtain a tree similar to following (for the example above)
module testmodule#1#8 # none, line 1: # none, line 2:
using somemodule#2#25 # none, line 3:
function func1#3#45(a#3#51::b#3#54) # none, line 4:
dothings#4#61(x#4#70) = begin # none, line 4:
somemodule#4#75.athing#4#86
end # none, line 5:
if true # none, line 6:
for i#6#17 = 1:100 # none, line 7:
dothings#7#139(i#7#148)
end
end
end
end
Basiclly, any symbol is now in the form "name#line#pos". if we can change lexer.jl then we can also calculate the correct column (3 lines code codes). This tree cannot be evaluated without removing the location information first. But with a smaller change in JuliaParser.parser.jl, we can get new functionaliry without breaking the current use of JuliaParser. At the end, we will have a function parse(ts::TokenStream, meta=false) with meta=true if we want the position infor, and the default is false, it doesn't change the current use of JuliaParser
Thanks for that. At the moment I've simply changed parse_eq slightly, which gives add character position info to all expressions. In addition it leaves the position information mutable so you can deal with inserts/deletions really simply.
function parse_eq(ps::ParseState, ts::TokenStream)
start = position(ts)-length(string(¬ts.lasttoken))+1
lno = curline(ts)
ex = parse_assignment(ps, ts, parse_comma)
isa(ex, Expr) && ex.typ==Any && (ex.typ=start:position(ts))
return short_form_function_loc(ts, ex, lno, filename(ts))
end ```
@ZacLN I think that sounds fantastic. And I agree, lets get started with that, and then we can still improve later on.
@bthlgd Encoding the location information in the names seems a bit shaky to me, to be honest...
Two broader questions:
Here is another high level read that is interesting: https://github.com/dotnet/roslyn/wiki/Roslyn%20Overview
Re-questions:
Yep, I've changed the parser to return Expr(:error, info)
with location info and to skip a line, for example a non closed function will return the inner block expressions at the global (or whatever the appropriate enclosing) scope. With a fair bit of work we could interpret these errors and return suggested fixes but I think it would boil down to rewriting half the Linter.jl code...
Nope, general formatting is out for the moment though I think some simple renaming may be possible.
@ZacLN I didn't notice that Expr has is a field named "typ". That's a very clever use of it.
@davidanthoff About the round trippable representation of the source code, I think another kind of expression tree might be needed, e.g. the following two code snippets will have exact the same expression tree
map((x,)->x*x, m)
and
map(m) do x
x*x
end
Yes, the current types for expressions certainly won't work for that.
Here are some interesting reads about the C# Roslyn design: