microsoft / python-language-server

Microsoft Language Server for Python
Apache License 2.0

Create Stub (pyi) generator #1724

Closed: heejaechang closed this issue 4 years ago

heejaechang commented 4 years ago

Let users create stub (.pyi) files from our analysis, so that the analysis results can be reused next time and the .pyi files can be shared with other tools that understand them.
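To make the request concrete, here is a minimal, hypothetical sketch of only the output side of such a generator. It uses Python's standard `ast` module (3.9+ for `ast.unparse`) instead of the language server's analysis engine: it copies whatever signatures and annotations already exist and replaces every body with `...`. It infers no types at all, and `make_stub` is an illustrative name, not an existing API.

```python
from __future__ import annotations

import ast


def make_stub(source: str) -> str:
    """Return .pyi text for the top-level functions and classes in `source`.

    Hypothetical helper: signatures and annotations are copied as written and
    every function body is replaced with `...`; nothing is inferred.
    """
    tree = ast.parse(source)
    lines: list[str] = []

    def emit_function(node: ast.FunctionDef | ast.AsyncFunctionDef, indent: str) -> None:
        keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
        params = ast.unparse(node.args)  # e.g. "a, b" or "x: int, y: str='s'"
        returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
        for decorator in node.decorator_list:
            lines.append(f"{indent}@{ast.unparse(decorator)}")
        lines.append(f"{indent}{keyword} {node.name}({params}){returns}: ...")

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            emit_function(node, "")
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            lines.append(f"class {node.name}({bases}):" if bases else f"class {node.name}:")
            methods = [
                child
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            for method in methods:
                emit_function(method, "    ")
            if not methods:
                lines.append("    ...")  # keep the class body syntactically valid
    return "\n".join(lines) + "\n"
```

Filling in the missing annotations, variables, and re-exports is exactly where an analysis engine would have to contribute.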

heejaechang commented 4 years ago

Tagging @judej.

MikhailArkhipov commented 4 years ago

Generating stubs and analyzing them is only faster if the regular analysis has to evaluate function bodies (i.e., there are no annotations or stubs available). However, in that case the hit rate is pretty low, and most commonly the function has to be evaluated with the actual arguments anyway. We actually tried this a while ago: scraped (compiled) modules are saved as .pyi files, and their analysis time is far from zero.
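For illustration only, a very rough form of "scraping" an imported module into a stub can be done with the standard `inspect` module. This is not necessarily how the language server's scraper works (the thread does not describe it), and `scrape_stub` is a hypothetical name. It also shows why scraped stubs are thin: each line carries only the annotations the object already exposes.

```python
import inspect
import types


def scrape_stub(module: types.ModuleType) -> str:
    """Build .pyi text for the public functions defined in an imported module.

    Hypothetical helper: relies on inspect.signature and skips any object
    whose signature cannot be recovered (common for compiled code).
    """
    lines = []
    for name, obj in sorted(vars(module).items()):
        if name.startswith("_") or not callable(obj) or inspect.isclass(obj):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip names merely imported into the module
        try:
            sig = inspect.signature(obj)
        except (ValueError, TypeError):
            continue  # no introspectable signature
        lines.append(f"def {name}{sig}: ...")
    return "\n".join(lines) + "\n"
```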

For example, if parsing and analyzing a stub (i.e., a .pyi file) takes 20 ms per file, then for 3000 files in tensorflow (or 5000+ in plotly) it is still 60-100+ seconds and quite a bit of CPU burn. And we still have to do dependency analysis, build the graph, resolve loops, and walk it.
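A back-of-the-envelope version of this estimate, assuming a constant 20 ms per stub, strictly serial processing, and ignoring the dependency-graph work entirely:

```python
def serial_stub_time_s(files: int, ms_per_file: float) -> float:
    """Total single-threaded time, in seconds, to parse and analyze `files` stubs."""
    return files * ms_per_file / 1000.0


for name, files in [("tensorflow", 3000), ("plotly", 5000)]:
    print(f"{name}: {serial_stub_time_s(files, 20):.0f} s")
```

The cost grows linearly with the number of files, before any dependency analysis is added on top.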

What you want is no analysis at all.

heejaechang commented 4 years ago

@MikhailArkhipov That's fine. As long as we can produce .pyi files that other tools can also consume, it's still useful. Also, sharing .pyi files between multiple users can still help us until users can produce our db file in its proprietary format.

My understanding is that .pyi is for correctness and the db is for performance. Our auto-generated .pyi will be as good as our analysis engine, but no worse, so that should be fine for the .pyi purpose. We will get the perf win from the db.

heejaechang commented 4 years ago

@MikhailArkhipov I think your assumption about this work item differs from the reason it exists. It is not meant to improve perf, but to provide the community with a tool to produce .pyi files.

If you know of an existing library, tool, or script we can use for this rather than reusing our analysis engine, let me know and I can try that tool as well.

heejaechang commented 4 years ago

FYI @judej, it looks like there are existing tools for this: https://stackoverflow.com/questions/35602541/create-pyi-files-automatically

MikhailArkhipov commented 4 years ago

Sure, I just meant to clarify what can be produced from analysis.

Yes, stubs are for correctness. However, they won't be real, complete stubs (as in typeshed): if there is no information on, say, function argument types or default values, analysis can't figure them out. Basically, what you get out of analysis is return types. However, for a function like

```python
def func(a, b):
    return a + b
```

you won't get anything useful, since the result type depends entirely on the argument types. Similarly, if a function's return type depends on, say, an isinstance check, analysis won't help you generate multiple overloads with different parameter and return types. That is, you won't easily get a typical stub like:

```python
@overload
def func(a: int, b: int) -> int: ...
@overload
def func(a: float, b: float) -> float: ...
```

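A hedged, runnable sketch of that situation, written as a regular .py file: the `@overload` declarations (from the standard `typing` module) are what a hand-written stub would contain, while the single implementation, which branches on `isinstance`, is all an analyzer actually sees. The function body is invented for illustration.

```python
from typing import Union, overload


@overload
def func(a: int, b: int) -> int: ...
@overload
def func(a: float, b: float) -> float: ...
def func(a: Union[int, float], b: Union[int, float]) -> Union[int, float]:
    # The return type is decided by an isinstance check at runtime; recovering
    # the two overloads above from this body is what analysis cannot easily do.
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return float(a) + float(b)
```

At runtime, calls always go to the final, undecorated definition; the `@overload` variants exist only for type checkers.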
heejaechang commented 4 years ago

So, since we don't care much about perf for this feature (it would be a command-line tool or a command the user has to issue explicitly), what we can do for method parameter types is find all calls to the method in the library and bind all the arguments to infer the possible input types.

It might not be the full list of allowed types, but they will all be correct input types.

Still not 100%, but better than creating everything from scratch by hand?
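The call-site idea can be sketched with the standard `ast` module. This is a hypothetical, much-simplified illustration, not the analysis engine: it only looks at direct calls by name and only at literal arguments, and it reports the types that were actually passed, i.e. a correct but possibly incomplete list. `literal_arg_types` is an invented name; Python 3.9+ is assumed for the built-in generic annotations.

```python
import ast
from collections import defaultdict


def literal_arg_types(source: str, func_name: str) -> dict[int, set[str]]:
    """Map each positional argument index of `func_name` to the type names of
    the literal arguments passed to it at call sites in `source`.

    Hypothetical helper: calls through attributes, and non-literal arguments
    (variables, expressions), are ignored.
    """
    seen: dict[int, set[str]] = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            for index, arg in enumerate(node.args):
                if isinstance(arg, ast.Constant):
                    seen[index].add(type(arg.value).__name__)
    return dict(seen)
```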

MikhailArkhipov commented 4 years ago

Typically a stub describes the public API surface, but public methods may or may not be called internally. Therefore it may be hard to tell exactly what the parameter and return types are.
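A small, hypothetical illustration of that gap: public functions that the library never calls itself have no call sites, so a call-site approach has nothing to bind their parameters to. The helper below (standard `ast` only, direct calls by name only, invented name `uncalled_public_functions`) lists such functions.

```python
import ast


def uncalled_public_functions(source: str) -> list[str]:
    """Return public top-level functions in `source` that are never called by
    name anywhere inside `source` itself. Hypothetical helper."""
    tree = ast.parse(source)
    public = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return [name for name in public if name not in called]
```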

Back in the day, when building JS stubs for the node.js API implemented in C++, I wrote a utility that scraped the node.js documentation markdown and generated JavaScript stubs from it. Then I finished them by hand.
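The documentation-scraping approach above targeted JavaScript; the following is only a hypothetical Python analogue. It assumes docs in which each API entry is a markdown heading that already contains the signature; that heading format and the name `stubs_from_markdown` are invented for illustration. Anything the docs leave out becomes `Any`, to be finished by hand.

```python
import re

# Hypothetical documentation convention: each API entry is a markdown heading
# such as "### read_file(path: str, encoding: str = 'utf-8') -> str".
HEADING = re.compile(r"^#{2,4}\s+(\w+)\((.*)\)(?:\s*->\s*(.+))?\s*$")


def stubs_from_markdown(markdown: str) -> str:
    """Emit a first-draft .pyi line for every API heading found in `markdown`.

    Parameters and return types are copied verbatim from the heading; a
    missing return type becomes Any.
    """
    lines = []
    for raw in markdown.splitlines():
        match = HEADING.match(raw.strip())
        if match:
            name, params, returns = match.groups()
            lines.append(f"def {name}({params}) -> {returns or 'Any'}: ...")
    if lines:
        lines.insert(0, "from typing import Any\n")
    return "\n".join(lines) + "\n"
```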

MikhailArkhipov commented 4 years ago

Feel free to reopen if this is still needed.