Refactoring Idris internals

david-christiansen commented 10 years ago

Hi folks,

This ticket is really to serve as a place to plan a larger refactoring that I'd like to carry out. My goal is to enable the use of the core and the elaborator infrastructure to implement other languages (in particular, one language for my PhD project). Additionally, I'd like to enable editor plugins to re-use Idris's parser without having to run all of Idris but still get features like those of structured-haskell-mode.

Thus, I propose splitting the current Idris library into idris-parser, idris-language, and idris-core, where idris-language depends on both idris-parser and idris-core. The idris-parser and idris-language libraries would both contain ASTs (PTerm and PDecl), with the intention of eventually moving all the desugaring steps performed by the parser into idris-language, which would then have a simplified AST without do blocks and the like.

The plan is:

Separate idris-core, and implement a simple language that uses it as a test. That language could either be a front-end for TT, or a simple functional language (say, something ML-like). This will be a medium-sized effort, requiring detangling the core from IState, though this is largely the case already.
Separate idris-parser, and have it contain the current PTerm/PDecl implementations. idris-language can import these for now. This will be a bit effort, requiring detangling the parser state from IState.
Implement explicit desugarings, with two separate high-level term types. This will be a medium-sized effort.

What do you folks think? Is this the right place to cleave it? Should there be more or fewer splits?

Thanks for the feedback!

david-christiansen commented 10 years ago

One question in particular: what about a separate idris-compiler library that depends only on idris-core?

puffnfresh commented 10 years ago

I would love to see this happen. One thing I would like to be considered is the backends - would they require idris-core and/or idris-language?

I really want the use-case of using Idris' backend with a different frontend happen.

david-christiansen commented 10 years ago

I think that the backends should use idris-core only, and idris-compiler if it becomes separate. This will hopefully also allow them to specify wider version bounds.

eamsden commented 10 years ago

Will this allow for external backends packaged separately? The IU PL wonks have a habit of hacking together new backends for various languages.

david-christiansen commented 10 years ago

This is already allowed, and the Java and JS backends are already packaged separately. The LLVM one is on the way to an independent repo as well.

jfdm commented 10 years ago

@david-christiansen I believe @eamsden was refering not to the codegen targets but the actual idris internals. That is, can say the IU PL (the who?) construct a different compiler for idris.

Having thought about it over night.

My understanding is that the long term goal of this refactoring is to split idris-dev into separate functional units i.e. haskell libraries. These functional units will allow reuse of the idris ex-and-in-ternals in other endeavours. For example: creation of new Dependently Typed Languages; new languages that leverage the idris elaborator; new language that leverage the codegen facilities; slim line editor plugins and tools; new backends for the idris language; et cetera.

On the outset I am not adverse to this plan. The suggested components seem to be sufficient, and splitting the project into functional libraries sounds reasonable. Another benefit could be easier maintenance in the long run, and god forbid separate git repos.... However, I need a little more convincing.

So here are some numpty questions:

I am unsure as to how much disruption will come into play until the new code base has been stabilised.
- Can you elaborate on this more?
- How long will there be duplicate ASTs? or did I read that wrong?
- Would this require separate repos for the project or can everything still be in idris-dev?
- How does this affect the build process?
Can you elaborate further on the suggested components? What is the role that the idris-language will have?
Can you elaborate further on the new workflows for compilation using the new refactorings, including the IRs used and transformations?
How does this impact the codegen backends that are currently being established?
Will this allow for a different backends for the idris language to be generated?
How does this affect any plans for having separate idris-doc, idris-repl, idris-c, idris-pkg et cetera?

vendethiel commented 10 years ago

I really want the use-case of using Idris' backend with a different frontend happen.

If you can get combinatorial backend/frontend pairs, I guess it's pretty nice :)

david-christiansen commented 10 years ago

Thanks for good questions! I can see that I was unclear in a few places, and that there should have been more detail.

I am unsure as to how much disruption will come into play until the new code base has been stabilised. Can you elaborate on this more?

I would strive to do it in the most backwards-compatible, disruption-free manner possible, for my own sanity. Ideally, it would be a series of small changes that users of Idris don't notice, and where compiler developers have nothing worse than a merge conflict or missing import. The plan, should it come to fruition, would be to first split core into a separate Cabal library, included in the same Git repo, and move the various parts over there.

How long will there be duplicate ASTs? or did I read that wrong?

They shouldn't be duplicate - rather, there should be one AST representing what the parser can produce, and one representing what the elaborator can process, with an explicit desugaring step from the former to the latter. Today, the parser does some desugaring, and other parts are desugared explicitly here and there. One AST can't really provide both clean elaborator input and accurate source code information. For example, do-notation should exist for editor tools, but it should not exist for the elaborator.

Would this require separate repos for the project or can everything still be in idris-dev?

Everything would still be in idris-dev, but it would need separate subdirectories for each library. Cabal doesn't, IIRC, support multiple libraries in one .cabal file.

How does this affect the build process?

It will need to build each library separately. It should still just require make, but a bit of sandbox setup might be necessary on a fresh checkout. Part of the process would need to include putting this into the README.

Can you elaborate further on the suggested components? What is the role that the idris-language will have?

Certainly. I was suggesting:

idris-core would contain (roughly) the things in src/Idris/Core that deal with TT, like the evaluator, executor, type checker, etc. Also, the basic elaborator infrastructure would be here: see Core/Elaborate.hs.
idris-syntax or idris-parser would contain the Idris parser and the AST of the user-facing language. Ideally, editor tools that need a parse tree could depend only on this library, making their dependencies somewhat more manageable. It would not depend on idris-core or any other part of Idris.
idris-language would contain code dealing with the high-level Idris language and its elaboration into the core. This library would contain things like the internal desugared AST, the desugarer from the high-level terms that the parser produces, the actual elaborator, module loading, displaying error messages, the REPL code, etc. This would be largest.

In the above split, the compiler is either in idris-core or idris-language. It could also be a separate package, I suppose. I'm not sure yet.

Can you elaborate further on the new workflows for compilation using the new refactorings, including the IRs used and transformations?

Ideally the only difference would be the explicit desugaring step, and this is something we should have anyway to make the parser readable and to make it easier to write things like purely syntactic refactoring tools.

How does this impact the codegen backends that are currently being established?

Hopefully nothing more than a couple of lines updated in the dependencies section of the .cabal file

Will this allow for a different backends for the idris language to be generated?

If you mean different core languages, the answer is unfortunately "no". If you mean different codegens, the answer is already "yes", and this doesn't change it.

How does this affect any plans for having separate idris-doc, idris-repl, idris-c, idris-pkg et cetera?

I see it as unrelated. Those are also worthy projects, but I won't get my PhD project closer to completion by working on them :) Neither should step on the toes of the other in any significant way.

lunaris commented 10 years ago

@david-christiansen:

Disclaimers: Firstly, apologies for diving in here out of seemingly nowhere -- I'm a medium-time user but as of yet haven't contributed to Idris. Secondly, further apologies for contributions which may be hideously ill-informed/not relevant/too idealistic/not pragmatic given where Idris is right now.

In case you're interested, here are my thoughts on your suggestions:

I agree that something akin to idris-core needs to exist and that it should contain everything to do with TT and nothing else. To this end, I would argue that it should not make any mention of elaboration. Indeed, one might even consider naming it idris-tt (or even language-tt).
I think your idea of idris-syntax/idris-parser is great (again, given other packages on Hackage [if it even matters -- talk about bikeshedding], you might even call it language-idris). In contrast to the above, this package should make no mention of TT.
I think your suggestion for idris-language potentially conflates too many individually-useful features. I foresee something even more granular, such as:
- idris-elaborator, which handles desugaring and elaborating of idris-parser-level ASTs to TT (as described by idris-core (or whatever it ends up being named).
- Code generator libraries (which in theory might get away with depending only on TT, but I guess that depends on whether or not too much information loss has occurred by that point) -- back-ends which generate LLVM, JS, etc.
- idris-compiler (again, name up for debate), which glues the parser, elaborator, TT and (selected) back-end libraries to do the full job.
- idris-repl (yep, name yet again up for debate), which handles the interactive aspects of Idris programming.

I'm sure there are loads of features which I've overlooked which need rehoming (such as error displaying and editor integration) which are covered by your vision and idris-language but not mine. Whatever happens, I think it's great that Idris is evolving in a design/architecture-conscious manner and am really excited by its ongoing development (which I hope to start contributing to soon enough).

reuleaux commented 10 years ago

I am of course highly in favour of the planned changes and will at some point give more detailed comments (I am kind of busy, having just moved to my new home).

david-christiansen commented 10 years ago

@lunaris Thanks for the comments! Good to have fresh eyes look at things.

There's a bit of a tension between maximal re-use and feasibility of implementation - my time for doing this kind of thing is limited, and having too many small pieces might make it difficult to navigate the source. Some of what you suggest has already happened (like moving the JS, Java, and LLVM codegens into their own separate projects), and it seems that the idris-language could be further split if we find it necessary or useful. My metric here was "stuff that needs IState" goes in idris-language, at least for now.

The reason for including the core of the elaboration infrastructure in the core lib is that that core TT datatype already includes support for elaboration, like hole and guess bindings. It really is designed for use as a Haskell-scripted tactic-based prover.

At the moment, though, I'm more worried about splitting things that should be together than I am about letting things stay together that should be split. After all, if this happens, it will be an incremental process, because I don't have a week to be a hermit and crunch it through.

david-christiansen commented 10 years ago

@reuleaux Enjoy the unpacking!

david-christiansen commented 10 years ago

@lunaris It would be nice to get the REPL more split out, as you mentioned. It would be really neat to have it speak the same language as the IRC bot, the Emacs mode, and tryidris.org. As some kind of ultimate dogfood, we could also try to reimplement the REPL itself in Idris, speaking the IDE protocol.

altaic commented 8 years ago

Hello, I'm working on a semantic grammar for Atom, and I think I've figured out a fairly good way of doing it. I decided against doing a hybrid regex/semantic approach because (1) Idris has syntax extensions, and (2) doing a second background highlighting pass in Atom is not really feasible. Therefore, I'm relying solely on Idris for highlighting and I've got some questions/comments (I'll open separate tickets if this isn't the right place to discuss).

Foremost and hopefully relevant to this thread, using ide-mode's :load-file is slow, and doesn't make for responsive highlighting. Would separating out idris-parser as described above allow one to parse more quickly due to not using the elaborator and such?

It would be ideal if :load-file would do minimal work, and parsing, type checking, etc. would be completed by subsequent ide-mode commands, where certain commands imply running other necessary parts of the pipeline if they haven't been yet.

I hope I haven't missed something WRT ide responsiveness; I haven't yet had the time to dive into the emacs plugin for guidance. Anyhow, I can volunteer some time to speeding up ide-related stuff if desired.

david-christiansen commented 8 years ago

I'm glad to hear that you're getting the semantic highlighting into Atom! This issue is a bit out of date, because I abandoned my plans to incorporate Idris's guts into another language and instead focused on exposing Idris's guts to Idris itself (the elaborator reflection framework). So while I think this kind of split would be nice, it's lost the feeling of urgency for me.

There are a lot of things to be done to make everything nice and incremental and responsive. Part of that is enabling the parser to run separately of the type checker, part of it is getting the type checker going in parallel to the parser, and part of it is having a notion of damage and repair for source. It's a big job, and realistically speaking, it will need to be tackled in small parts.

The current way it works in the Emacs mode is that the mode does a little bit of minimal keyword highlighting until the file is loaded at the user's request. Then, as Idris parses and type checks the file, it emits highlighting information. Note that most of this information comes from the type checker, not the parser, as it needs to do things like resolving overloaded names in order to highlight accurately. So the highlighting information tells you not so much what is a keyword and what is a name, but rather what specific thing each name is pointing at.

A good start for making it nicer to work with would be to rig Idris to remember hashes of regions of source that it's examined (top-level mutual blocks, for instance), and then rewind its parsing and elaboration state to the latest unmodified one, rather than processing everything from the beginning. But this kind of discussion is probably best on the mailing list, or on a "responsive IDE" ticket.

Welcome!

reuleaux commented 8 years ago

David,

got the message, thanks, and while I have been quite on the list lately, and busy with my Pire (Pi-forall refactoring) stuff, I am certainly still very much interested in anything that is moving Idris + refactoring forward. - Can't make any promises though, when I will be able to come back, and spend more time on these things.

Best, Andreas

David Christiansen notifications@github.com writes:

I'm glad to hear that you're getting the semantic highlighting into Atom! This issue is a bit out of date, because I abandoned my plans to incorporate Idris's guts into another language and instead focused on exposing Idris's guts to Idris itself (the elaborator reflection framework). So while I think this kind of split would be nice, it's lost the feeling of urgency for me.

There are a lot of things to be done to make everything nice and incremental and responsive. Part of that is enabling the parser to run separately of the type checker, part of it is getting the type checker going in parallel to the parser, and part of it is having a notion of damage and repair for source. It's a big job, and realistically speaking, it will need to be tackled in small parts.

The current way it works in the Emacs mode is that the mode does a little bit of minimal keyword highlighting until the file is loaded at the user's request. Then, as Idris parses and type checks the file, it emits highlighting information. Note that most of this information comes from the type checker, not the parser, as it needs to do things like resolving overloaded names in order to highlight accurately. So the highlighting information tells you not so much what is a keyword and what is a name, but rather what specific thing each name is pointing at.

A good start for making it nicer to work with would be to rig Idris to remember hashes of regions of source that it's examined (top-level mutual blocks, for instance), and then rewind its parsing and elaboration state to the latest unmodified one, rather than processing everything from the beginning. But this kind of discussion is probably best on the mailing list, or on a "responsive IDE" ticket.

Welcome!

Reply to this email directly or view it on GitHub: https://github.com/idris-lang/Idris-dev/issues/1539#issuecomment-162715267

ahmadsalim commented 8 years ago

@david-christiansen What is the status of this?

jfdm commented 8 years ago

@ahmadsalim This may have stalled, but I think it should be kept open, i think the splits would be useful to have for those wanting to work on dependently typed languages.

However, I am not sure on what are views on feature requests that are plausible should be. That is: should they be kept open and left open; or closed even if plausible to declutter the issue tracker. The new labelling system should be sufficient such that the issues for plausible feature requests can be kept open.

ahmadsalim commented 8 years ago

@jfdm OK, thanks for the update!

ahmadsalim commented 7 years ago

I will close the issue, since no one has volunteered to do it and it has been inactive for a while. I am unsure whether it helps to keep track of feature requests with such ambitious projects if no one volunteers to do them. Of course, if anyone volunteers, please reopen as needed!

david-christiansen commented 7 years ago

I agree that this should be closed. I ended up doing elaborator reflection instead :-)

idris-lang / Idris-dev

Refactoring Idris internals #1539