Discussion: frontend, backend and "runtime"

elazarg commented 8 years ago

This is an idea for the long-run, not a "please do now" request. I don't know if the right place for such discussion is here; let me know.

I think the architecture of the system should be more modular, comprised of three independent parts, with well defined interface:

Frontend - parsing and semantic analysis
Backend - translating the output of the front end to "assembly language" of type checkers, which is some constraint language (textual language)
Runtime - evaluating the contsraints and emitting error messages

Coordination is performed in the driver and build manager as today, except that phase 3 is optional (through a flag similar to -s in gcc). There can be more than one target language.

The implementation overview describes four passes, but the data structures used are intermangled and not modular. Additionally, currently the "constraint language" is a hierarchy of Python classes and Python code.

In principle the system should be able to work without any Python object passed between phase 2 and 3 (I think that separating 1 and 2 is harder). This modularity will allow using other tools for each phase, and will be useful in case someone wants to plug in efficient constraint-solvers for scalability, or use mypy's constraint solver for portability. It is complementary to a plugin API, and will put less requirements on it.

(Of course, objects may be passed between 2 and 3 as an optimization, but this is just another "target language" which must happen through clear API,and my guesses is that such an optimization is unnecessary).

What do you think?

elazarg commented 8 years ago

It also gives a natural place for incremental checks - if the constraints are unchanged, there's no need to reevaluate them.

gvanrossum commented 8 years ago

I'm feeling pretty skeptical about this. It's like saying "We should reimplement Python from scratch, using this architecture." That's been done a few times (PyPy, Jython, IronPython, Pyston) and it has always taken way more effort than anticipated.

Your design may well be better, but whether it will succeed or fail won't depend on the design alone, it will depend on many details, and on the execution. I would much rather focus on improving the existing mypy implementation gradually than on some grand redesign, no matter how much more general that design might be. (Read Joel on Software: http://www.joelonsoftware.com/articles/fog0000000069.html)

I am also guessing that if it really was that easy, it would have naturally happened that way. Jukka is a really good and experienced engineer, and while you may not always immediately see the logic behind a particular implementation strategy, usually there is a really good reason.

elazarg commented 8 years ago

The first question is whether this architecture is indeed desirable in the first place, ignoring the costs. As you said, @JukkaL undoubtedly had his reasons, but the situation might have changed since 2014. If this architecture is not desirable, then there's nothing more to discuss; the decisions will be documented (if they aren't already), and I will learn something new.

Assuming the change is desirable, the question becomes that of cost/benefit. I understand the voice of experience says the cost is significantly more than it seems. But we can still discuss strategies for going in that direction in an incremental way, which might be feasible since the current design already use explicit separate passes. I have read Joel's post some years ago; I believe he has a point, and of course I don't suggest reimplementing anything from scratch; if that's necessary for the change, I withdraw my suggestion.

At the very least, a decision to move in this direction can justify certain refactoring (yes, I know, don't say anything), be a guide in implementing and reviewing new features (e.g. "don't entangle these two pieces of data") and help deciding which parts of the code deserve more documentation.

As a related but independent suggestion, it will be nice to have "assertion passes" between the phases, to verify (and document) the post-conditions of each pass and pre-conditions of the next, without adding more clutter to the code. Currently breaking something in e.g. the semantic analysis pass, is likely to result in (seemingly unrelated) error message in latter phases. Of course it cannot be avoided, but it can be reduced. After (gradually and slowly) implementing detailed assertion passes, it will be somewhat more straight forward to define a more explicit API between the passes, even if some intermediate state will still be implicit.

Another step might be serializing constraints as an optional step. Note that this is already done, in the form of stubgen - after all, a .pyi file is a constraint database, although with only partial information. There are two other parts that I can see: the usage constraints, and the rules of the type system. Again, as far as I can see (which is not so far, I admit) everything can be done incrementally, and it will already give the benefit of being able to plug in different solvers. Another step might be implementing deserialization of constraints, which again is already partially implemented in the form of consuming .pyi files.

gvanrossum commented 8 years ago

Maybe Jukka will want to continue this conversation.

JukkaL commented 8 years ago

Hmm, the proposal is not specific enough to write detailed feedback. I won’t suggest spending a lot of time on this, at least right now, but if you still think that this may be worthwhile after my and Guido’s responses, writing a sketch of how things could work and highlighting the concrete benefits could be helpful. Even though I’m skeptical, there are a bunch of things I can say without knowing more details. In particular, if you can split your proposal into smaller tasks that each bring incremental value, it's much easier to make decisions. Note that evaluating a big or complex proposal is hard and we may have trouble justifying the effort of even giving detailed feedback, as just that could take days.

I’m trying to explain here what sort of thinking goes behind the scenes then deciding whether to undertake a particular project that perhaps has some big potential benefits but that would also potentially take a big chunk of our development resources, such as this one. It’s more complicated than just a cost/benefit calculation. Other important factors include risk, uncertainty and “opportunity cost”.

For example, maybe the expected benefit is 4 and the cost is 3, for a net benefit of 33%. Yay! But wait, maybe there is a 33% chance of a total failure, and the work has to be scrapped. For example, maybe the approach doesn’t actually work for an important type system feature; maybe it’s too slow; maybe it can’t generate good enough error messages; maybe it doesn’t work well with the plugin system we’ve been thinking about. Now the expected payoff is negative, and the project doesn’t look very promising any more.

There are other sources of uncertainty:

The contributor working on this may give up halfway through the project. Maybe they lose interest or switch jobs and don’t have the spare time any more — and there is nobody to finish it. For a large project this is actually somewhat likely, as often the original estimate of effort is way too small.
The expected benefits and costs are notoriously difficult to predict. Maybe it isn’t quite as good as envisioned, or it takes twice as long to implement. These could have a big effect on the cost/benefit calculation.
A big change is likely to cause some disruption to other features that are being worked on at the same time. At the very least some open pull requests will be very hard to merge, and all mypy developers will have to spend some time to understand the new design. These costs are likely substantial but it seems difficult to predict how big the total effect will be, as it depends on things like the number of active contributors, which we can’t predict.

As the mypy core developers clearly have doubts about the proposed approach, it would be useful to see how other (successful) languages that have some similarity to Python/mypy have done this. TypeScript, Hack, C# and Go should be sufficiently similar to provide useful insights. If there is a widely used approach that is similar to what you are suggesting, that would give credibility to your proposal and reduce the perceived risk level.

Languages that don’t resemble Python very much are somewhat less interesting, as it may be undesirable to apply those techniques to mypy, even if it would be technically possible. For example, Haskell is known to have a steep learning curve, and it’s something we want to actively avoid. Much of mypy design is motivated by learnability and user experience.

Again, assume that we are thinking about spending 3 units of work to gain a benefit of 4. There are always other things we could’ve been doing with those 3 units of work. Maybe we could have implemented some neat type system features or made the type checker twice as fast, and written better documentation. The benefit of these alternative tasks could be higher, let’s say 6, so they'd be more desirable. Here it’s important to consider also the benefits to mypy users, not just the developer team. The users may not see any immediate benefit from an internal change, but they’d most certainly benefit from a much faster mypy. Basically you’d need to argue that this is best of all possible investments at a given time.

A 6 month investment generating a future saving of 12 months of work due to a better design (a 100% return on investment!) over the next 5 years may actually not be very enticing, say if we’d only have funding for the next 6 months — it might be way more important to work on features that help secure more funding in the near term to ensure continued development.

Having a feature, refactoring or any other nice thing today is more valuable than having it only 12 months from now. A big redesign or refactoring postpones other nice things, and the benefits from the redesign are gradual and may take years to be fully realized, no matter how significant. However, if we’d not do the refactoring we could be doing other improvements right now, and we’d get the benefits sooner (and learning from user feedback on these features), and we could also drive improvements to the typing module faster.

[Well, that turned out to be a pretty long write-up. I'm hoping that I can reuse that in the future in similar contexts.]

gvanrossum commented 8 years ago

Awesome insights in the art of software engineering, Jukka!

elazarg commented 8 years ago

Thank you very much for these insights. I will give it thought.

(I crammed together the risk and relative costs into "cost", but the distinction is important to keep in mind so thanks for making it clear, too).

I want to note here that I believe pursuing this direction might get you more developers, if I will be able to convince them it will help their research (specifically, finding uses for Ivy). But that's a mere hunch.

I would still like your opinion about the suggested architecture, ignoring costs and risks :) but now as a personal favor only.

JukkaL commented 8 years ago

I'm skeptical about close collaboration with researchers -- their incentives generally wouldn't align with ours. We don't care about being novel and they typically don't care about long-term maintainability and polishing their implementations to production quality (though there clearly are exceptions). Using boring, well-understood, battle-tested solutions is something we generally want to do.

Here are some thoughts about other aspects of your proposal (these are pretty random and in no particular order):

Having "assertion passes" sounds like a great idea. We could, for example, verify that unbound types don't leak from semantic analysis, and that every name expression refers to something after semantic analysis. This could be enabled in test cases and maybe skipped otherwise to avoid performance overhead.
The apparent complexity of the current implementation is largely a result of the complexity of modelling Python semantics. I initially targeted a much simpler language than full Python, and the implementation was much cleaner. There is definitely room for refactoring, but I think a lot of the complexity is difficult to get rid of entirely. All the nice Python features have an implementation cost, and some of the features are difficult for static checkers in particular.
The main motivation I can see for having constraints as a more of a first-class citizen in the type checker is to make it possible to type check entire statements (or entire functions) in one go, instead of doing it mostly one sub-expression at a time as mypy currently does. However, my intuition says that this will be significantly harder to get right than the current implementation. Even if the current implementation is a little messy with all the special cases and such, it's conceptually pretty simple and relatively easy to modify. I've worked on constraint-solving systems previously and my impression was that I'm not smart enough to design them (let alone debug them). Basically, doing one step at a time in a well-defined sequence simplifies things a lot in my experience, and having an implementation that is approachable to non-PhDs is important. Maybe it's possible to implement the constraint-solving bit in a simple and elegant way while addressing all the semantic details of Python, but I can't see how.
In particular, constraint systems seem to make it harder to generate useful error messages, and this is very important for users. There are ways around this (I think Hack has something called "witnesses" that may be relevant here) but it seems inherently more difficult, especially as the number and complexity of constraints grows bigger. This at least used to be one of the top complaints about Haskell, and I'm not sure if they have quite resolved this even after all these years.
We are contemplating some major changes to how mypy works internally, in particular a plugin system for extensions. The current implementation model seems pretty amenable to this (though I'm sure it's not going to be just smooth sailing) but I'm worried that a fancier constraint-based system would be harder to extend, at least with a simple API.
Updating constraints incrementally sounds like a complex problem. Yes, it may make it possible to do finer-grained incremental checking, but how complex would it be? Again, my experience tells that if something sounds complex before work on a detailed design starts, it often turns out to be even more complex once you start figuring out all the edge cases and such, and when it also needs to run fast.

elazarg commented 8 years ago

Wow. That's thorough... Thank you!

I'm skeptical about close collaboration with researchers

I understand your concerns regarding researchers. My thinking was that researchers can help build the infrastructure that will enable their research and will be useful for the industry in general, without putting any "novel" stuff in the main project; admittedly the code quality issue is still there. (As perhaps turns out to be the case in my grand-refactoring suggestions and PRs. I think it's more due to inexperience than to wrong incentives, though; I try to learn).

my impression was that I'm not smart enough to design them

You just did, didn't you? :) I mean, I don't entirely understand in what sense isn't mypy a constraint solver. A very specific one, of course, but as long as these constraints are serializable in any way, it still is.

... constraint systems seem to make it harder to generate useful error messages

Unlike you, I don't have any experience with constraint solvers, which might be the cause of some incorrect assumptions. It is not the first time I hear that constraint solving and static analysis yield poor error messages, though; but isn't it inherent in the problem? I'd thought that giving useful error messages is inherently hard and even poorly defined, since an error is useful only if it points to "the source of the problem" which depends on unexpressed intents of the user. So unless you can find a small number of constraints (in the problem description itself, i.e. in the source code) that comprise an "unsatisfied core" and print them all (as mypy does in a simple type mismatch), you have to guess the user's intent, and that's always hard.

Somewhat related and might interest you: Why JavaScript Programmers Hate You: an ode to dynamic languages by Jan Vitek (link to the specific point on gradual typing).

elazarg commented 8 years ago

This issue may be closed, as far as I'm concerned. Do you want me to open a new issue about assert passes?

JukkaL commented 8 years ago

(As perhaps turns out to be the case in my grand-refactoring suggestions and PRs. I think it's more due to inexperience than to wrong incentives, though; I try to learn).

Fair enough. I wasn't talking about people in the academia (sorry if I gave that impression), but the nature of many academic CS research projects -- in my experience the main research end product typically is a paper (or several), and the implementation is often a proof of concept which will be thrown away when a project ends. If you have to hop between research projects, it can be difficult to maintain continuity and find the time to polish your code.

I mean, I don't entirely understand in what sense isn't mypy a constraint solver. A very specific one, of course, but as long as these constraints are serializable in any way, it still is.

Well, I really meant a "general" constraint solver. Mypy has a pretty specialized constraint solver in it and it doesn't present the problems what I was talking about. Also, I don't think that it would be useful to serialize the constraints, as the constraints are used temporarily in very local contexts and then thrown away. (Though perhaps I misunderstood your meaning.)

It is not the first time I hear that constraint solving and static analysis yield poor error messages, though; but isn't it inherent in the problem?

There clearly are design tradeoffs. Generally, the smaller set of constraints you work with at a time the more predictable the system is. That's why mypy does local type inference, and generally gives up pretty soon it encounters something out of the ordinary instead of trying to do something very clever. Contrast that with whole-program type inference: if the tool gets one thing wrong, the invalid result may propagate through multiple functions or modules and manifest itself in totally unpredictable ways.

JukkaL commented 8 years ago

And yes, please open an issue about assert passes!

python / mypy

Discussion: frontend, backend and "runtime" #2150