Closed mmn80 closed 7 years ago
I forgot to say that I am ready to do this work myself, but I need to know if the idea is accepted. Maybe there is an obvious issue with it. Otherwise I'll have to do my thing on a fork instead.
I found the older refactoring thread: #1539.
New day, new plan: scrap the git submodules and do it with packages, but in a maximally general and minimally disruptive way:

- `idris-core`: a package that will contain everything except codegens and the main executable. All build complexity, including docs and tests, will move to `idris`, leaving `idris-core` a simple, pure cabal package, easy to refactor or depend on for other projects. The environment variables defined in `IRTS.System` will be abstracted (eg: uninitialized `IORef`s) and instantiated by `idris`. All the rest of the Haskell files remain unchanged.
- codegen packages: each depends on `idris-core`. Note that codegens should sit close to the front rather than the back, since someone might want to generate Agda from surface syntax, for example, or use other high-level hints to drive generation.
- `idris`: depends on all of the above, and its main purpose is to integrate and build the executable, libs, tests and docs. Dependencies on codegen packages sit under cabal flags, and a module is added to `idris` to conditionally import the codegen functions. Other languages built on top of `idris-core` will have to do the same as `idris`, but they will get access to all the generators for free (as long as the AST stage they build upon is compatible with the generator's, which is just a matter of type checking).

I believe that this would be the right way to handle codegens, and it's an incremental step that enables people to further split `idris-core` later, in line with the concerns expressed in #1539. All current codegens will remain in this repo, while new ones can be registered with a few lines of code. Moreover, this architecture supports the interactive compilation I was blabbering about in the OP.
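As a rough sketch of the "conditionally import the codegen functions" idea: each cabal flag would define a CPP macro (via `cpp-options`) and pull in the matching `idris-codegen-*` package, so a disabled codegen is never even compiled. Nothing here is real Idris code — the `CODEGEN_C` macro and the `IRTS.CodegenC` module name are placeholders for whatever the flags would actually define.

```haskell
{-# LANGUAGE CPP #-}
-- Hypothetical conditional-import module in the idris frontend.
-- Each cabal flag turns into a CPP macro, so the import (and the
-- registry entry) only exist when the corresponding codegen package
-- is in build-depends.  Without -DCODEGEN_C this prints [].
module Main where

#ifdef CODEGEN_C
import IRTS.CodegenC (codegenC)  -- hypothetical module; only compiled
                                 -- when the flag defines CODEGEN_C
#endif

enabledCodegens :: [String]
enabledCodegens =
#ifdef CODEGEN_C
  "C" :
#endif
  []

main :: IO ()
main = print enabledCodegens
```

The `idris.cabal` side would pair each flag with a `cpp-options: -DCODEGEN_C` line and the corresponding `build-depends` entry.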
I think I'll leave this lively discussion for now and go ahead and implement a PR. This way, I'll discover precisely what other hidden cyclical dependencies lurk inside, and we'll be able to discuss more concretely.
@mmn80 Sorry for not getting back to you sooner. Am I understanding correctly that the overall goal is to separate the code generators from the rest of idris to have a more modular design?
PS: It would be nice if you summarized your design in slightly shorter text. I am having a bit of a hard time following the proposal.
Sorry for the long text. Only the last comment contains the proposal in its entirety. I changed my mind after reading #1539.
Design:

- `idris-core`: the package contains all Haskell code, has a `Simple` build, and the paths and all other environment vars are abstracted (to be instantiated by frontends)
- codegen packages (each depending on `idris-core`)
- the front end (`idris`): depends on `idris-core` and on all codegens enabled by users with cabal flags; it contains no code other than `Main.hs`, and just builds the executable, plus tests, docs, and libs

Goals:

- `idris-core` and the codegens are modularised and can be reused for new frontends and new languages
- `idris-core` can be split further if desired (as mentioned in #1539)

This isn't much shorter, but I hope it is clearer.
Hi @mmn80 thanks for posting the proposals.
Certainly interesting. IIRC #1539 was abandoned as the proposer found an alternate route for their research, and refactoring the internals for better organisation was not an immediate priority.
Looking forward to seeing how this pans out, and how easy it is to maintain and affect for future iterations of Idris.
Ultimately, I think this will be up to @edwinb to decide, as primary maintainer and releaser.
The way I did it was the simplest way that allows people to write new codegens without the Idris core needing to know about them - we used to have four different ones and it became something of a maintenance burden. Anything that keeps this property is fine especially if it results in better organisation.
By the way, I know it's not relevant to the proposal, but I don't know where you get the idea that I don't care about modules. I have said in the past that it's not a research problem I'm interested in, but my interests change, and not being interested is not the same as not caring... Idris interfaces are remarkably close to ML modules, in any case.
Hi @edwinb. I did not want to go into details about my live coding RTS project, in order to keep the post short, and also because it might fail. With that poor attempt at a joke I tried to communicate a bit (it's about Erlang, rather than ML style modules), but I should have known not to use loaded language when talking to strangers over the Internet. In fact, this is even mentioned in the CONTRIBUTING page. Sorry about that, I will refrain from such twisted speech in the future.
The work is progressing well, and I'll write a detailed PR when I'm done. I just hope that merging new commits from upstream won't be too painful, since the location of the files has changed. I made the commits such that git describes the file migrations as `renamed`, rather than `deleted` and `new file`.
@jfdm suggested on IRC that I should describe here in more detail the problems I'm trying to deal with, in order to clear up some confusion and focus the PR more on the solution.
The topic is extension, at compile time and at run time. `idris` has excellent support for compile time extensions, with powerful types, syntax rules, elaborator reflection, etc. But `idris` also has some runtimes: those in the C and JavaScript codegens, which are not the subject of this post, and the REPL/IDE server processes, which is what I mean by "runtime" below.
IPC is not very good:

- external tools cannot share state with the `idris` server process; this is what motivated me in the first place
- process centralization would need something like `DBus`+`systemd`, or turning `idris` into an OS-like master process, both hard to get right; idris alone is nice, as it just has a standard port the editors know about

The current codegen system is good for what it is, a simple solution to an immediate problem, but, for the reasons above, it is not suited as a general extension mechanism to be used effectively by 3rd party IDE/REPL tool writers, and the lack of process centralization is especially terrible for end users.
The simplest sustainable solution would be to have a way of registering tools with the runtime from within, since Haskell has built-in concurrency support and no marshaling would be needed. This requires global state (an `MVar` at the top level of a core library module).
Once put in place, the `MVar` can also be used to link extension threads up, so that they can freely interact with each other and the REPL. So it becomes a bit more like a system with dynamic bindings, like the Elisp inside Emacs or the similar system they use at `yi`, except that the extension is written in Haskell and, since we are actually inside an Idris, can even be Idris (passing `Term`s as messages, holding `IState` in the global dynamic state, using the evaluator). There could be a special `Eff` language for this editor stuff.
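To make the global-`MVar` idea concrete, here is a toy sketch of such a top-level registry. This is not actual Idris code: `SExpr` is stubbed as a `String`, and the `echo` command is made up; it only shows the standard top-level-`MVar` trick for registering commands at runtime.

```haskell
-- Toy sketch of a global command registry in a core library module.
-- SExpr is a stand-in for the real IDE-protocol type.
import Control.Concurrent.MVar
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)

type SExpr   = String            -- stand-in for the real SExpr
type Command = SExpr -> IO SExpr -- extension entry point

-- The top-level mutable registry (the usual unsafePerformIO/NOINLINE
-- idiom for a process-wide MVar).
{-# NOINLINE commands #-}
commands :: MVar (Map.Map String Command)
commands = unsafePerformIO (newMVar Map.empty)

registerCommand :: String -> Command -> IO ()
registerCommand name cmd = modifyMVar_ commands (pure . Map.insert name cmd)

runCommand :: String -> SExpr -> IO SExpr
runCommand name arg = do
  table <- readMVar commands
  case Map.lookup name table of
    Just cmd -> cmd arg
    Nothing  -> pure ("(:error \"unknown command " ++ name ++ "\")")

main :: IO ()
main = do
  -- A hypothetical extension registers itself, then the REPL calls it.
  registerCommand "echo" (\s -> pure ("(:ok " ++ s ++ ")"))
  runCommand "echo" "hello" >>= putStrLn
```

Extension threads would call `registerCommand` from their own packages; the REPL/IDE loop only ever goes through `runCommand`.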
But this design creates a dependency problem that needs to be addressed (next issue).
This is a problem because runtime dependencies go backwards compared to the compile time ones. My codegen needs to depend on `idris` the library, but `idris` the front end needs to depend on my codegen, since otherwise it would no longer be the front end. This pushes the frontend up front, and forces users to create a host of batteries-included bundles, which is inconvenient.
Separation of the two allows creation of packages in between, and the default frontend can easily update a selection of bundled, but disabled by default, extensions.
Additionally, it simplifies maintenance of forks of either of the two, creation and maintenance of new front ends (eg: with an alternate `Prelude`), and refactoring of the library, which will have a simplified build system.
Since the parts (codegens, etc.) of the current package are conceptually distinct, situations arise during building or testing when they do need to be treated differently. So the build system already has this complexity within it, except it's somewhat implicit, harder to debug, and can more easily cause maintainers to make mistakes and break the build. For new, potential contributors, looking at the unified cabal or `Setup.hs` file and understanding what belongs to what is an extra cognitive burden.
But packaging the codegens while also keeping them bundled forces us to deal with the dependency issue above and to do the split regardless of other concerns.
So I think the confusion around this project, both in terms of communication with you guys and some mistakes I've made, comes from the fact that I'm trying to do a number of different things at the same time. One is introducing dynamic extension hooks in the REPL, which leads to design decisions that have an impact on the package structure; the others are general refactoring issues that I picked up from the old refactoring thread and also from what I've seen throughout the codebase. And all this while my main motivation is just learning some new tricks.
I hope that the repo will end up improved for maintenance, extension, and just picking up by new users.
@mmn80 Could you please provide a couple of examples of what kind of runtime extensions you are interested in?
Well, upon further reflection I no longer believe this doubling down on the Haskell side is all that worthwhile.
So I was thinking of integrating Haskell packages such as `yi` or `diagrams` as a set of dynamically registered commands in the REPL/IDE, with a type like `SExpr -> Idris SExpr`, rather than just `SExpr -> IO SExpr` or the current external codegen's `Args -> IO ()`. An Idris mode for `yi`, for example, would have to depend on both `idris-core` and `yi`, so the refactored architecture would allow 3rd parties to implement such a package, to be bundled as a disabled-by-default dependency in the `idris` package, leading end users to install `idris` like this:

```
stack install idris --flag idris:codegenC --flag idris:yi --flag idris:diagrams
```
and resulting in a single executable where the editor, the Idris repl/ide and other things would benefit from GHC's shared memory concurrent runtime, and from combined type checking. The `:e` command would switch modes instantaneously, with no state loss on either side, etc. Other examples: IHaskell or Haskell for Mac style environments, interactive charting, or code visualization.
But considering that the Idris REPL is already runtime extensible on the Idris side (the table of symbols that `:x` can take as arguments can be dynamically updated with `:l`), this direction would eventually lead to an Idris-to-Haskell code generator and FFI. We would write these libraries in Idris, then benefit from the shared memory runtime in one of a few ways:

- `idris` 3rd party packages as before, except that they contain mostly Idris code, and on a build hook we would use the Haskell codegen to generate the Haskell containing the Idris REPL symbol registration calls
- extend the `:l` command to generate Haskell code beside the `Main.hs` of `idris`, modify `Main.hs` to import it, and use a bundled `dyre` to recompile and reload

The alternative on the Idris side looks more promising. We would essentially implement the `-fobject-code` flag of `ghci`: the `:l` command would change to compile Idris to C, create a shared library, and link it into the executable. We would need to include 2 `SExpr` marshaling functions in the C and another 2 in Haskell, and ensure that all functions are declared for `export`. Even codegens could eventually be implemented in Idris, at least as Haskell wrappers (the specialized FFI needed here would be much easier to do), and just called like `:x codegenC context`, where `context` would be a built-in binding.
So let's compare:

- `gcc+ld` and `ghc+dyre` will win the slowness award, but I suspect the latter is worse
- shared memory would help with passing big structures like `[SDecl]`, but for most other things compiled vs. interpreted would matter more, for speed, than shared memory vs. message passing

In conclusion, I'll restate the problems this refactoring addresses, after acknowledging that the entire "insufficient mechanisms" section in the post above is pretty dumb:

- 3rd party tools can run inside `idris`, reducing the maintenance burden of a special IPC protocol and `Main.hs`
- people can further split `idris-core` to achieve other goals (like those mentioned in the old refactoring thread), and front end builds become more customisable
- batteries-included forks become easy to maintain (eg: an alternate `Prelude`, the `dyre` system mentioned above), in general bundling things the main Idris does not accept for various reasons

The refactoring work is now a minimalistic, conservative extension. Everything that used to work, including the external process codegens written in Haskell, will continue to work the same, but now you will also be able to publish interactive codegens and Idris libs that will be integrated by the frontend.
In order to minimize the number of files that need to change (for easier review of the PR and a decreased chance of mistakes), I'm still using a very basic `MVar`-based inversion of control scheme, implemented entirely in a centralized location, which is `IRTS.System`. I've been very strict about this and deferred any further refactoring work to future PRs. But I also thought more about the general problem.
After re-reading Dan Piponi's Quick and dirty reinversion of control, I realized that a general solution for such extensions (which I won't be implementing as part of this PR) would be adding `ContT` as a new layer to the `Idris` monad. Even better, `pipes` (or `machines`, which are CPSed by default for performance reasons) has a more sophisticated vocabulary that lets you not just `yield` and `await`, but also `request` and `respond`, and as it turns out, all of these are useful. But let me explain in more detail.
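As a toy illustration of the `ContT` reinversion (using only `transformers`, not the actual `Idris` monad stack): the computation captures its own continuation, parks it with the driver, and gets resumed later, which is exactly the shape an extension hook needs.

```haskell
-- Toy sketch of reinversion of control with ContT.
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)
import Data.IORef

-- Capture the current continuation and hand it to the driver,
-- suspending the computation until the driver invokes it.
suspend :: ((a -> IO ()) -> IO ()) -> ContT () IO a
suspend stash = ContT stash

main :: IO ()
main = do
  -- The "driver" side: a slot to park the suspended continuation.
  slot <- newIORef (const (pure ()) :: Int -> IO ())
  evalContT $ do
    -- The "client" side suspends itself, stashing its continuation.
    x <- suspend (writeIORef slot)
    liftIO (putStrLn ("resumed with " ++ show x))
  -- evalContT has returned: control is back with the driver,
  -- which can now resume the suspended client whenever it likes.
  resume <- readIORef slot
  resume 7
```

`pipes`' `request`/`respond` pair is this same mechanism with a typed vocabulary layered on top.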
The `Idris` monad based code in `idris-core` will be, in the language of `pipes`, a `Client`, while the frontend's `Main.hs` will connect it to a `Server`. The client will make requests to the front end using a simple ADT:
```haskell
data IdrisRequest = ExecCodegen CodegenInfo
                  | GetCRTSInstallDir
                  | GetIdrisLibInstallDir LibName
                  | ExecREPLCommand CommandName CommandArgs
                  | ExecIDECommand CommandName CommandArgs
                  ...
```
For instance, the `generate` function will say `request (ExecCodegen cg)`, and other parts of core will say `dir <- request GetCRTSInstallDir` instead of just `dir <- getDataDir` with its baked-in `import Idris_paths`.
The front end's `Server` will then dispatch these requests to the appropriate implementation package in its `build-depends` and return the results in a corresponding `IdrisResponse` ADT. The front end package contains nothing but this `Main.hs` and the bundle of features in its `build-depends`.
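A minimal sketch of such a dispatcher: the constructors are trimmed from the ADT above, the types and install paths are stubs, and the `pipes` plumbing is replaced by a plain function for brevity.

```haskell
-- Stubs standing in for the real idris-core types.
type CodegenInfo = FilePath
type LibName     = String

data IdrisRequest
  = ExecCodegen CodegenInfo
  | GetCRTSInstallDir
  | GetIdrisLibInstallDir LibName

data IdrisResponse
  = Done
  | Dir FilePath
  deriving Show

-- The frontend owns the concrete answers; idris-core only ever sees
-- the request/response ADTs.  The paths here are invented.
serve :: IdrisRequest -> IO IdrisResponse
serve (ExecCodegen out)         = putStrLn ("codegen -> " ++ out) >> pure Done
serve GetCRTSInstallDir         = pure (Dir "/usr/local/lib/idris/rts")
serve (GetIdrisLibInstallDir l) = pure (Dir ("/usr/local/lib/idris/libs/" ++ l))

main :: IO ()
main = do
  r <- serve (GetIdrisLibInstallDir "prelude")
  print r
```

In the real design `serve` would route each request to a package listed in the frontend's `build-depends`, rather than answering inline.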
It will be very easy for other people to maintain batteries-included forks of the standard `idris` front end, since `idris` will almost never change, so no merging work will be needed. In fact, IDE tools could even automatically generate such a package from a template, based on user selections in a UI and by scanning the dependencies in the `ipkg`, and create a custom `idris` executable for each project. Hackage could be used as an Idris package manager with very little effort. I do not want to argue with anyone about whether we want this or not; I'm just mentioning it as a possibility.
From the constructors in the type above you can tell what extensions I thought of. 3rd party packages will be able to export codegens, Idris libs and REPL/IDE commands (the current built in ones can be lifted into a separate package as well), and all of them will be integrated by the front end.
But even with just the very quick and very dirty solution I'm working on, the `idris-core` package will become lifted / platform-independent to a much larger degree than before. It will also maintain its integrity, since all current Haskell code will be in it, and it won't hamper language research by creating inconvenient package boundaries, as other, more intrusive, refactoring projects might.
This refactoring focuses exclusively on disentangling the build system along the conceptual boundaries that already exist, but it lays the groundwork for more interesting language development projects.
If anyone somehow managed to get through this whole wall of text, I would be really interested in your feedback.
@mmn80 Thanks for the update, we appreciate the effort! If you have a version that works conservatively with Idris, please open a PR as soon as possible, so that we can review the code and perhaps merge it before too much bitrot sets in.
I still have things to do. I will send a PR only when I get it to actually work, which will be sometime next week, hopefully. I didn't have the curiosity to try and merge new commits, but I'm sure git will fail me once again and I'll have to do everything manually.
PR opened.
I am working on a new C RTS that is capable of code hot loading, and I need to be able to compile individual Idris modules quickly. Unfortunately, keeping an `idris` process in Ide mode only affords interpretation; if I try to compile code, it will fork a separate codegen process which has to load and grind all `IBC` files, including the `Prelude` (min. 2 sec on my machine), defeating the purpose of Ide mode for this scenario.

The current C backend does not even spawn the external process, but rather does it inline, as it should, I'd argue. In fact, there is an obvious bug in the `Main.hs` of `idris-codegen-c`, from which I conclude that no one has ever used it before.

[EDIT: see the new proposal based on cabal packages in the comments]
I understand the reason for keeping the generators separate (easy maintenance), but I propose a different solution, one that upholds the interactive property of Ide mode: git submodules (also used by GHC). I have never tried them before, but from what I read they were created specifically to address this problem, and look like a sort of sandboxed, symlinked, foreign repo-folders. They address the security issue by demanding that end users manually pull their contents independently of the host repo's.

We could add cabal flags to completely disable compiling these subfolders, and could even automate all the `git` management commands, to make it seamless for users. After setting up the build system (basically adding hooks in our `Setup.hs` toward some `Makefile`s in each codegen's root and deferring all further responsibility to the codegen, plus the `git` and flags stuff), there will be no more work required from Idris maintainers, other than adding a few lines of code every time a new codegen wants in.

An advantage of this proposal is that only minor modifications to the Haskell source are warranted (other than build scripts), since the current C backend already works this way, as mentioned above. We just need to make a list where generators register functions like `codegenC`, which are called by `generate`. Another advantage is that we could immediately move the existing codegens to external repos and speed up compilation this way for people who don't need them.

I know that Edwin does not care about modules, neither ML nor Erlang style, and I'm only writing this sentence in order to raise the number of different meanings of the word "module" in this post to 5.
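The registration list mentioned above could look roughly like this (a sketch with stubbed types: `CodegenInfo` stands in for the real configuration record, and the entries and output strings are made up):

```haskell
-- Sketch of a codegen registry: each backend exports a function like
-- codegenC, and generate just looks the backend up by name.
type CodegenInfo = FilePath            -- stand-in for the real record
type Codegen     = CodegenInfo -> IO ()

-- The list new codegens would add themselves to, a few lines each.
codegens :: [(String, Codegen)]
codegens =
  [ ("c",    \out -> putStrLn ("C backend -> " ++ out))
  , ("node", \out -> putStrLn ("JS backend -> " ++ out))
  ]

generate :: String -> CodegenInfo -> IO ()
generate name ci = case lookup name codegens of
  Just cg -> cg ci
  Nothing -> putStrLn ("unknown codegen: " ++ name)

main :: IO ()
main = generate "c" "a.out"
```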