Document architecture and relationship between component projects

ProofOfKeags commented 4 years ago

I think one of the barriers to new contributors such as myself on this project is understanding the suite of disparate tools that come together to make HLS a reality. I think guiding early users of HLS towards being able to hack out solutions to bugs they discover, or to know where a feature they want might go, can go a long way towards creating a self-sustaining (and growing) project.

At present, as a pre-first time contributor (but attentive user), the questions that come to mind are these:

What is the relationship between hls, ghcide, haskell-lsp, hie-bios etc? How do they fit together?
Are there any component projects not mentioned in the list above that I am missing?
When I want to open an issue, fix a bug, or add a feature, how should I go about reasoning which of these 4(+?) codebases that the change belongs in?
Is the partitioning on these lines essential or a historical accident?
Is their partitioning presently beneficial to the project? If not, what barriers exist to merging them, if any besides the fact that it hasn't been done yet?

I open this issue as a way for me to start trying to contribute to tools in the best way I know how, which begins with some documentation on the architecture of hls.

I want to end with an anecdote about why I think this is important.

A few weeks ago I noticed #342 and opened an issue for it. It seems like something that was going to be within my grasp to fix, but the rabbit hole got deep fast. I was pointed to ghcide as the source of the problem, which in turn seemed to point to the idea that the problem was in haskell-lsp (yet another project I had not heard of). At this point I lost confidence in my ability to help the situation. It turned out alright as it looks like the issue was fixed in the last few days, but I can't help but think that the project would be better off if people such as myself were able to guide themselves quickly to the place where they could hack on things that weren't 100% local to the hls repo itself.

TL;DR If you have any information on the architecture of hls either by links to blog posts or know-how inside your head, drop them in this thread and I will try to compile them into a document inside this repository.

ndmitchell commented 4 years ago

As luck would have it, I gave a talk which touched on the architecture a few days ago: https://ndmitchell.com/#ghcide_04_sep_2020. Video is not yet up, but slide 22 of https://ndmitchell.com/downloads/slides-building_an_ide_on_top_of_a_build_system-04_sep_2020.pdf has a diagram of how these things relate. The paper associated with it is a detailed overview of how the pieces fit together in technical depth. To answer your questions:

The 4 projects are haskell-lsp (LSP bindings), hie-bios (how to set up a Haskell compiler), ghcide (the dependency mechanism behind an IDE) and HLS (plugins and user facing bits). Each of these pieces is usable independently. Some like LSP/Ghcide have other users. So they are a federation of projects that HLS ties together.
Leaving them separate and with other consumers, stops us overfitting and turning everything into a ball of mud. It's been valuable. I don't think changing that is a good idea.
Clearly documenting what the projects are, where to go for guidance etc. is something we should do. In particular all of the 3 projects HLS depends on should probably point at HLS in the first line of their README.
HLS is very much the umbrella. If you ever have a question and aren't sure, HLS is the place to go.

So lots of docs to add, lots of pointers required!
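As a rough sketch of the relationships described above, the four projects can be modelled as a small dependency graph. The roles paraphrase the descriptions in this comment, and "HLS depends on the other three" comes from the thread; the edges out of `Ghcide` are an assumption made for illustration, and all names here (`Component`, `directDeps`, `transitiveDeps`) are hypothetical.

```haskell
import Data.List (nub)

-- The four projects named in the thread.
data Component = HaskellLsp | HieBios | Ghcide | HLS
  deriving (Eq, Show)

-- What each project provides, paraphrasing the descriptions above.
role :: Component -> String
role HaskellLsp = "LSP bindings"
role HieBios    = "how to set up a Haskell compiler"
role Ghcide     = "the dependency mechanism behind an IDE"
role HLS        = "plugins and user-facing bits"

-- Direct dependencies between the projects.
-- HLS depending on the other three is stated in the thread;
-- the edges out of Ghcide are an ASSUMPTION for illustration only.
directDeps :: Component -> [Component]
directDeps HLS        = [Ghcide, HieBios, HaskellLsp]
directDeps Ghcide     = [HieBios, HaskellLsp] -- assumption
directDeps HieBios    = []
directDeps HaskellLsp = []

-- Everything a component relies on, directly or indirectly, without duplicates.
transitiveDeps :: Component -> [Component]
transitiveDeps c = nub (concatMap (\d -> d : transitiveDeps d) (directDeps c))
```

Loaded in GHCi, `transitiveDeps HLS` lists the three other projects, which matches the "HLS is the umbrella" framing: a question about any of them can start at HLS.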

ProofOfKeags commented 4 years ago

My hero. *chef's kiss*

This is great. :pray: I'll watch the talk this weekend and compile the info I get from it into a doc that I'll PR into this repository

EDIT: can't watch the talk til it's up I guess (lol). That'll teach me to read more carefully

michaelpj commented 4 years ago

I think the ghcide/haskell-language-server split seems the least convincing to me these days, but we don't want to break DA if we can help it anyway.

alanz commented 4 years ago

@michaelpj I think it could be a good time to revisit the technical drivers behind the architecture, with a view to rebalancing. This project needs to work for the long haul, and so we need to make whatever changes are necessary to get it in a fit state for that.

My understanding is that from a DA perspective they pretty much have a solid product for their needs, will maintain their own fork, and we have freedom to change going forward, without being constrained by them. @cocreature do I understand correctly? If there is a lingering constraint, stating it clearly will help us in considering options when we periodically revisit this discussion. As we will.

And I actually think some sort of (semi-)formal documentation of the decision process, including reasons why alternative choices were not taken, would be a good thing. One possible route is Architecture Decision Records, but any lightweight approach allowing this could work.

ndmitchell commented 4 years ago

FWIW, I still think that having the core separate to the plugins and installer is quite a nice design. Whether I'd have put them in different repos with different (somewhat unrelated) branding is a different question - but the key technical distinction between ghcide vs HLS still seems sound, to me. But agreed, a nice debate would be great!

michaelpj commented 4 years ago

I do think that (the branding of) ghcide vs haskell-language-server is one of the top sources of confusion for new users (just judging by the questions I see on IRC etc.), so I think it would be of some value to improve the situation. As a straw proposal, if we had one repository with packages haskell-language-server and haskell-language-server-core (née ghcide) I think that would be less confusing.

Then you can write in the readme: "For a simpler experience, haskell-language-server-core can be used directly as a more minimal language server without plugin support."

hie-bios is also a source of confusion. I can think of two things:

The configuration for haskell-language-server is partially done with hie.yaml, which is documented in hie-bios, which forces users to know of its existence. If we put the documentation here, users need never know that it's handled by another tool.
hie-bios and hie.yaml both confusingly mention "hie". It's not at all obvious what this is, let alone that it's an abbreviation for the now-obsolete "haskell-ide-engine". (The hie-bios readme doesn't mention what "hie" stands for anywhere)

ndmitchell commented 4 years ago

ghcide née haskell-ide-core :). I think putting a link at the top of the Ghcide repo advising people to use haskell-language-server might be a good idea. It's only recently the code has converged, so the advice people were given only a month or so ago is already wrong. I'm a little concerned about more renaming things, since following so many name changes is tricky. I don't think I'd ever recommend people using ghcide now, unless they were developing on it directly.

For hie-bios, I think the name hie.yaml is a bad name - it should be haskell-ide.yaml - I raised https://github.com/mpickering/hie-bios/issues/248. I also think we should put all documentation under HLS, and respond to all support requests via HLS, and only mirror the issues in hie-bios once they have a concrete "this is the bug, described" ticket.

Anrock commented 4 years ago

> I think the name hie.yaml is a bad name - it should be haskell-ide.yaml

cradle.yaml maybe? Not sure why ide still comes up in naming - hls is language server and hie was ide engine, not ide.

ndmitchell commented 4 years ago

@Anrock - these things are all fundamentally about building an IDE. We are the Haskell IDE team. Our blog posts are posted at IDE 2020: https://mpickering.github.io/ide/index.html. We might have a bunch of names for individual components, but IDE is the common theme. If I am a Ruby programmer and see "cradle.yaml" in the repo it could do anything. If I see "haskell-ide.yaml" I can guess it probably configures the Haskell IDE.

alanz commented 4 years ago

We are at the start of a potentially very long journey, and we have not published much to Hackage yet, so if we do choose to rename anything, now is probably the best time to do it.

And the counter-argument is that it is initially confusing because there has been so much change recently. Once it settles, it becomes just one of those quirks that everyone knows about.

sir4ur0n commented 4 years ago

My 2 cents :smile:

Merging GHCIDE and HLS

I like this idea, it's one step closer to more clarity for newcomers. 2 separate projects make it seem like 2 possible solutions to use (= confusion). As I have no (0, none, niet, nada) insight of the technical impacts, I think at least merging in a single Git repo, even if it remains 2 components, would be a nice step forward.

Incidentally, wouldn't this simplify the Git sub-module "Oh shit we need to update the ref again" thing?

This would also solve some frustration for users who open an issue (usually on HLS) and who then need to open the same issue on GHCIDE because the problem is in the underlying component (which users have no real knowledge of).

Candid question: would there still be benefits to publishing Ghcide on Hackage?

Renaming GHCIDE to something else (e.g. `haskell-language-server-core`)

I actually like this idea, as it would make it even more explicit (dumb-proof?) that one relies on the other, and that you almost definitely want to use haskell-language-server if you don't know what to look for.

I reckon the impact may be significant in the short term (documentation would need changing in various places, many blog posts would become outdated, etc.), so I'd classify this more as "Nice to have". I agree with @alanz though that once the dust settles, this will be but a past quirk of the project(s).

Context/ecosystem documentation

Coming back to the original request of this issue :sweat_smile: I also 100% agree. I don't think videos are good enough as they fare well for "deep insight" but are annoying to get a quick overview of the ecosystem.

I think 1 or 2 images with some text is far better, e.g. (from Neil's slides above):

or Matt Pickering's blog article

I find Matt's image hard to read (colors, fonts, too dense) and Neil's image too restrictive (e.g. doesn't include editors) but I am confident we could quickly get a nice compromise for the sake of clarity :smile:

I lack knowledge on the whole interaction between all those components, but I would happily play the part of the newcomer to help write clear documentation :sweat_smile:

cocreature commented 4 years ago

I agree with @ndmitchell that the split is still useful for a few reasons:

Ease of debugging: if you can reproduce something with ghcide, you’ve already drastically reduced the amount of code you potentially need to look at.
I think it’s generally a good architectural split. Keep a relatively clean core with few deps and then a large thing with bells and whistles.
Some people like something lightweight and don’t actually want all those features. Yes, you can in theory configure the plugins used by hls, but in practice nobody wants to change the code and recompile the world. There is a reason why you distribute static binaries and try to include it in ghcup.
ghcide is much, much lighter in terms of dependencies, which allows it to move much quicker in terms of newer GHC versions and other things. HLS is already back to the point where it depends on several autoformatters, and it still cannot be published to Hackage, whereas ghcide has been on Hackage for literally months. A working ghcide for the GHC version you want is still better than an HLS that doesn’t work on your GHC version, even if you do like all the features from HLS.

As for renaming, I think at this point this would cause more confusion than it would help.

In terms of repositories, I don’t have particularly strong feelings. Keeping them separate makes things easier for us at DA but I don’t think that should be the deciding factor. CI and tests in general seem like the primary reason for keeping them separate. Otherwise, it becomes very tempting to only test HLS and only test the GHC versions supported by HLS which destroys a lot of the benefits I outlined above.

alanz commented 4 years ago

@cocreature I actually agree with you (and hence @ndmitchell) that the current split does make sense. But I think we need to clearly articulate somewhere why we believe this is so, and hence what the decision criteria should be for deciding if a feature should be implemented in the one or the other. I do think this is something that tends to work itself out over time, and does seem to have stabilised. But that boundary is only apparent to the few inside. We need to clarify it for all.

And the rest goes to sending a clear message, and whether a wholesale name-change across the entire suite of interacting libraries makes sense.

ndmitchell commented 4 years ago

I think part of the reason for an unclear message is any branding/promotion we did on Ghcide is now harming us - encouraging people to do the wrong thing. To try and counteract that I wrote a blog post (https://neilmitchell.blogspot.com/2020/09/dont-use-ghcide-anymore-directly.html) and Tweet (https://twitter.com/ndm_haskell/status/1308334856594755584). I think if we spread a consistent message things should (slowly) fix themselves.

alanz commented 4 years ago

I wonder if we should create an Organization-wide project board, as mentioned here

> Organization-wide project boards can contain issues and pull requests from any repository that belongs to an organization. You can link up to twenty-five repositories to your organization or user-owned project board. Linking repositories makes it easier to add issues and pull requests from those repositories to your project board using Add cards or from the issue or pull requests sidebar.

This might make having a coherent view of things easier. For my part I would happily move haskell-lsp into the haskell org if this would help matters.

ProofOfKeags commented 4 years ago

Org wide project boards can help, but as @Sir4ur0n pointed out, it won't necessarily help with outsiders cutting issues to the correct places unless it is accompanied by an arch diagram/issue template that gives users an intuition about where to cut the issue.

However, that's not the only solution here. If we had a fairly aggressive triage of issues, we can make sure that issues cut against hls that are about features implemented in ghcide can be propagated quickly. I anticipate pretty much all of the issues that get cut by outsiders are going to be dumped in hls unless they (we 😅) are handheld through the issue cutting process.

But even better would be guiding us adventurous users to be able to solve the problem ourselves without requiring a priori knowledge of how all these projects fit together. I think the idea of a "treasure map" is one that benefits any large OSS project because one of the largest barriers to contribution is to know where to even look for various bits of functionality. When all you know is the language it is implemented in and what the end product does, sometimes a couple nudges to where certain things are implemented can make the difference between someone making a contribution or not. All of this presupposes that hls would benefit from more contributors, but I'm hard pressed to believe that this isn't the case.

alanz commented 4 years ago

I think that within a few months hls is going to be the "brand" users interact with, and so the issue tracker here will be the primary reporting point. And I think we need to take it upon ourselves to internally route issues to the right place, but perhaps keep the original open to communicate with the issue reporter.
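A minimal sketch of that routing idea, assuming a purely keyword-based triage: the report always stays open on HLS, and is optionally mirrored to the component where the fix likely belongs. The keyword table, the `Routed` type and the `triage` function are all hypothetical illustrations, not an existing tool or an agreed process.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Repositories an issue might belong to.
data Repo = HLS | Ghcide | HieBios | HaskellLsp
  deriving (Eq, Show)

-- HYPOTHETICAL keyword table: real triage would need human judgement.
keywordRoutes :: [(String, Repo)]
keywordRoutes =
  [ ("hie.yaml", HieBios)
  , ("cradle", HieBios)
  , ("lsp protocol", HaskellLsp)
  ]

-- Where the report stays open, and where (if anywhere) it is mirrored.
data Routed = Routed { original :: Repo, mirroredTo :: Maybe Repo }
  deriving (Eq, Show)

-- Keep the original on HLS so the reporter has one place to follow,
-- and mirror to the first component whose keyword appears in the title.
triage :: String -> Routed
triage title =
  case [repo | (kw, repo) <- keywordRoutes, kw `isInfixOf` lowered] of
    (repo : _) -> Routed HLS (Just repo)
    []         -> Routed HLS Nothing
  where
    lowered = map toLower title
```

For example, `triage "Wrong hie.yaml is picked up"` keeps the issue on HLS and mirrors it to `HieBios`, while a title with no known keyword simply stays on HLS.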

ndmitchell commented 3 years ago

What is left to do here? Maybe a few links to things like @pepeiborra's tutorial and our IFL paper (https://ndmitchell.com/downloads/paper-building_an_ide_on_top_of_a_build_system-04_sep_2020.pdf - I'll upload the revised one which is more HLS centric once it gets officially published)? The brand story has changed sufficiently, and repos merged, that HLS is the one true thing and I think most users won't mind.

I also note that the Rust Analyzer has a page about architecture at https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/architecture.md, which says which boundaries are API boundaries, what each component should/shouldn't do. It might be nice if we had that.

Anton-Latukha commented 2 years ago

@phadej, you know this question better than all of us (what tooling currently exists to monitor dependencies, how it works, what you use and how it is done). Can you say how to produce a graph of the project, or give one? We would put it into the docs, and it would boost newcomer onboarding into the project.

fendor commented 2 years ago

I think such a graph is best created manually and maintained manually. Personally, I am not a fan of auto-generated graphs. But maybe, I am just used to generated Java class diagrams, which always look horrible and don't help in the slightest to understand the project architecture.

Anton-Latukha commented 2 years ago

That would require manually redoing the graph every N months.

I use https://github.com/haskell-nix/hnix/wiki/Design-of-the-HNix-code-base#module-connections-over-the-project for easy onboarding, and I frequently look at it myself. A module map is useful for understanding the project's internal dependencies and how things emerge from them, for placing functionality properly, grouping modules, refactoring, and avoiding orphan instances.

Christmas story of children reuniting with parents

:gift: :family: :christmas_tree: The module map was crucial for reuniting orphans with their parents in https://github.com/haskell-nix/hnix/pull/805/files; without a map, that is pretty hard to figure out. With a map, you can align the types with the project structure and find the module that is the meeting point of their scopes; in the vast majority of cases that module points to the proper solution and to a deeper understanding of how to place the instances properly and elegantly. Before there were 28 orphans; after, only 1, which produces specific IO effects. Properly reuniting 27 instances is a measurable improvement to the performance and mood of ecosystem compilations.

The module map also allowed me to understand how to improve the API and the module grouping, and to architect the future transformation of the project into a proper multipackage one.

A graph of the links between the main projects is laborious but realistic to do manually. A module map, however, is impossible to maintain manually.

I also did a map of ghcide, https://matrix-client.matrix.org/_matrix/media/r0/download/matrix.org/weTNfVnClANyGtvHgcHkkqev, but since it covers only ghcide I have not submitted it so far. graphmod required a lot of workarounds to generate it (mostly deleting everything except the module name & imports where the tooling did not manage to parse the file), and the reports I sent upstream have not been responded to so far.

It is pretty nice. SVG allows being zoomed-in infinitely.

I know that phadej has mastered these questions pretty well.
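To illustrate what an auto-generated module map boils down to, here is a naive sketch that pulls import lines out of source text and emits Graphviz DOT edges. This is not how graphmod works; it is a line-based approximation under stated assumptions: one import per line starting at column zero, and no handling of package imports, `safe` imports or `{-# SOURCE #-}` pragmas. The function names are hypothetical.

```haskell
import Data.Maybe (mapMaybe)

-- Extract the imported module name from one source line, if it is an import.
-- Strips a directly attached import list, e.g. "Data.Map(Map)" -> "Data.Map".
importedModule :: String -> Maybe String
importedModule line =
  case words line of
    ("import" : "qualified" : m : _) -> Just (clean m)
    ("import" : m : _)               -> Just (clean m)
    _                                -> Nothing
  where
    clean = takeWhile (/= '(')

-- One DOT edge per import found in the given module's source text.
dotEdges :: String -> String -> [String]
dotEdges name src =
  [ show name ++ " -> " ++ show m ++ ";"
  | m <- mapMaybe importedModule (lines src)
  ]

-- Render a whole graph from (module name, source text) pairs.
toDot :: [(String, String)] -> String
toDot mods =
  unlines (["digraph modules {"] ++ map ("  " ++) edges ++ ["}"])
  where
    edges = concatMap (uncurry dotEdges) mods
```

The resulting text can be fed to Graphviz (for instance to produce an SVG), which is why such a map can be regenerated on demand instead of being redrawn by hand.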
