Prepare ourselves for the Great Merge

thoughtpolice commented 8 years ago

The time for us to go upstream is "Real Soon Now", to get more hands and eyes on the build system from people. Here was the basic plan I outlined via email, particularly for migrating the tickets:

1. Merge the code into 'master'.
2. Create corresponding Trac bugs for the existing GitHub issues. Don't bother copy/pasting the whole discussion; just give a one-sentence description, and link to the GitHub issue to see the original discussion.
3. All new bugs should be filed in Trac and, if necessary, reference those (extant) GitHub issues.
4. Stop accepting new bugs in GitHub at the same time.
5. As we fix bugs, close issues. If we close an issue created as specified in point 2), then we also close the corresponding GH issue.
6. Eventually, the old GH repository is abandoned/historical, and all future glorious work proceeds in Trac etc.

I'm merely putting this here as a reminder, and as more of a meta ticket for any remaining outstanding issues that might block the upstream merge.

ndmitchell commented 8 years ago

I must say I vastly prefer the GitHub workflow to the existing GHC approach. In many ways it will get more difficult to work with post merge. I somewhat wonder if just being a GHC submodule and remaining on GitHub is workable...

Ericson2314 commented 8 years ago

If the repos are to be merged, may I suggest git subtree in order to preserve all history.

thoughtpolice commented 8 years ago

I must say I vastly prefer the GitHub workflow to the existing GHC approach. In many ways it will get more difficult to work with post merge. I somewhat wonder if just being a GHC submodule and remaining on GitHub is workable...

I don't know what you mean: on the long or short term? Because GitHub and submodules are not happening in the long term, you're at best buying yourself some time here. But it has to be merged 'eventually' and I'd like to have it in the hands of developers who are going to have to use it every single day (presumably, for the rest of forever) in a way they're familiar with.

I should have clipped a few other points out of my email, but I was being hasty:

The goal is mostly to get it in the hands of people and give them the lowest amount of friction to actually work on it. Right now, it's basically vaporware, and getting it into people's hands (because it is a lot nicer in the scenarios it handles) will hopefully spur people to work on it. I will almost certainly begin using it exclusively for development builds and to rattle bugs out of the stage2 compiler.
We don't need to enforce highly stringent code review once this code is broadly available to GHC HEAD, at first. There are still many issues that are in flux, and we still need to bring it to parity with the existing system. There's plenty of room left here, to keep us occupied for a while, while we sort out the details, and we can tell people to consider it tentatively unstable.

There are other factors to this, too. I mentioned this to Andrey, but the reality is that the main GHC HQ does not pay attention to any place except Trac and basically Phabricator (for patches only). Whenever we do bug triaging and scanning, we require a ticket upstream, even for 'upstream' libraries we have as submodules, if the issue manifests as a GHC issue (as I'm sure you know). If these don't exist, we are not going to see or pay attention to issues, period. There's little point for a 'shadow' bug tracker, and keeping it alive forever isn't ideal.

Second, once we keep this upstream, we want to keep it working, meaning that changes to the existing system that need parity in the new system can be blocked. (Submodules further complicate the code review process if you e.g. update the Make system and are required to update the Shake one, too). This is also one of the primary motivators, to keep development in lockstep. We can also issue consistent CI builds at that point, too.

A middle ground might be to merge it into a branch in GHC, (e.g. wip/shake), which regularly has master merged into it, and it can be a free for all for a while for the people involved, outside of the normal process. This would be a decent middle ground that gets most of the above benefits with a slightly higher cost (git checkout). Later on, we can move that into master, and we will move the build system to being reviewed "by the book" just like all other pieces of GHC.
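A rough sandbox sketch of that branch arrangement follows. It is only an illustration under assumptions: the throwaway repository, file names, and commit messages are made up, and only the branch name `wip/shake` comes from the suggestion above.

```shell
#!/bin/sh
# Sandbox sketch: keep a long-lived wip/shake branch current by merging master into it.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_MERGE_AUTOEDIT=no   # never open an editor for merge messages

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults

echo 'make rules' > Makefile
git add Makefile
git commit -q -m 'Existing Make-based build'

# Start the free-for-all branch and add the new build system there.
git checkout -q -b wip/shake
mkdir shake
echo 'shake rules' > shake/Build.hs
git add shake
git commit -q -m 'Add Shake-based build system'

# Meanwhile, master moves on.
git checkout -q master
echo 'more make rules' >> Makefile
git commit -q -am 'Update Make rules'

# Regularly bring master into the work-in-progress branch.
git checkout -q wip/shake
git merge -q --no-edit master
```

Trying the new build system then costs a developer one `git checkout wip/shake`, while master itself stays untouched until the branch is merged.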

If the concern is rapid development right now, the last option IMO seems like the best to me, so people can be more 'unhinged' from the regular GHC process, while it's much more easily accessible to developers to try out. But that will, too, eventually come to an end, so this GitHub is eventually doomed. (And it should go without saying, but I'm never going to use a submodule just for my build system, that's just not gonna fly.)

thoughtpolice commented 8 years ago

If the concern is rapid development right now, the last option IMO seems like the best to me,

And FWIW, I mean "given the idea this is going to go upstream soon". When I said "Real Soon Now" it was a bit facetious - @snowleopard is obviously working hard and has been for a few weeks since I messaged him about this, hence the bug. I actually want to know what the tentative plans are, if Andrey has any plans for an ETA - you're more than free to stick around here for the near short term, but it's going to move eventually.

thoughtpolice commented 8 years ago

If the repos are to be merged, may I suggest git subtree in order to preserve all history.

We've done this before for base (although not using git subtree directly) so this isn't out of the question. Personally, I consider anything outside master essentially ephemeral with all the caveats attached, so I don't really care very much (this is a hot button topic though that people feel very strongly about, so I'm surely not on any sort of clear winning side.)

snowleopard commented 8 years ago

@thoughtpolice Many thanks for creating this issue! It will be very helpful for the Great Merge.

My main priority is to support the standard GHC development workflow. My secondary priority is to allow contributions from people who are unfamiliar with or dislike the standard workflow, even if it means that I will have to do extra work.

Can we all try to figure out the best way to achieve both if possible? I don't want to go into any arguments on which workflow is better; all I want is that everyone feels welcome and productive.

So, yes, we absolutely need to migrate GitHub issues to GHC Trac and I will do that.

But why shall we stop accepting new issues, comments, pull requests, or anything else here? If anyone cares enough to contribute their time into making the build system better, why put barriers on the way? If a new issue is important, I will go ahead and create a copy at Trac. If, say, it is just a quick question or a duplicate issue I'll deal with it without the need to involve Trac. Everybody wins.

An ideal scenario to me looks like this:

I copy new GitHub issues to Trac, but not vice versa. I'm fine to do it manually, it's not a big deal really.
I keep this repository as an unofficial personal branch and regularly sync it with the official master branch, where it undergoes the usual review process.
We welcome and encourage contributions via the standard workflow.
We also accept contributions here, with a perhaps slower and less objective review process, as this is usually just me.

Now, I don't know the standard workflow well enough to comment on how viable the above proposal is. But I hope it is. It should be, really! To a very crude approximation, every developer has a GitHub account, and no developers have a Trac account. (And yes, in another crude approximation, 100% of work is done by developers with a Trac account :smiley:)

snowleopard commented 8 years ago

I actually want to know what the tentative plans are, if Andrey has any plans for an ETA - you're more than free to stick around here for the near short term, but it's going to move eventually.

@thoughtpolice The planned ETA is 1 May, see the tree-tremble milestone:

https://github.com/snowleopard/shaking-up-ghc/milestones

snowleopard commented 8 years ago

By the way, hope you like the new name of the build system! :-) Approved at today's Great Merge meeting. No more "new build system", "Shake-based build system" verbosity.

thoughtpolice commented 8 years ago

Note that nothing precludes you from using GitHub and GHC itself during the rapid iteration time. My goal is to get the tool closer to real developers so they can use it more, because they'll undoubtedly give lots of informed opinions about it. That means it needs to be in the main repository they work on, somewhere.

You could very well just merge it into a wip/shake branch in the GHC repository, create your own GHC repository here, and everyone who's actually interested can join the free for all and work on wip/shake. You can synchronize it as you wish.

Nothing is stopping that - in fact, that's how several other GHC developers do large, out of band work (although not everyone pushes it into 'master', many only keep it in their local forks until merge time.) However, I still don't recommend it unless you're careful (see further below).

However, ultimately, I do not want (and would be strongly, strongly opposed) to keeping this on GitHub or as a submodule for a large number of reasons. The main one is that having the build system separated from the thing it builds is totally, utterly pointless, and adds a bump for literally no reason. No other project in the universe does this, because it's so pointless. Merges suddenly get more complicated because you never merge content, only pointers. Actually having us review code we want in Phabricator then becomes pointless too, because it's always just a worthless submodule update that means nothing (but can at least be CI'd). In fact we moved base into GHC despite some complaints because 99% of the work was done by us anyway and keeping them synchronized was a huge, annoying effort.

Second, I appreciate your dedication to the craft! I can see your will to finish this is strong. But from experience, I have seen far, far too many people come along, dump thousands of lines of code into GHC, and then vanish into the void - never to return. It's understandable - life, kids, spouses, you decided to binge watch Game of Thrones. It happens.

That means we have to treat all of these things as if the primary developer is a huge flight risk who's going to leave the country tomorrow, honestly. Hinging development practices that will largely trip over and annoy people because one person is willing to 'pay the price' sounds noble, but leads to bottlenecks and disaster. It is not sustainable when things like this can easily occur. This isn't a value judgement on your work, or a doubt of your commitment - it's a risk assessment of what happens when the time comes, where we're the ones dealing with it forever.

This GitHub work also does not save as much time or effort as you would think. If we use an external repository, we are still on the hook for mirroring it. We have to keep our own synchronized copy and make sure it stays online in perpetuity, even if that repository later becomes obsolete - because to do otherwise destroys capabilities like bisection. As a result, committing to an external repository at any point in time is effectively a commitment to keep it around forever for infrastructural and historical purposes. It's not something we want to do unnecessarily.

It's also very unwise for you to spend your own singular effort to shuffle bugs back and forth. If you decide to go herd goats in the mountains for a month, will we ever know if major problems arise? Will we even know what has been going on in the intervening time? Finally, it means that we're collectively out of the loop, because you are the only channel we have. That again is risky and a completely unnecessary bottleneck. If someone is just working on a fork off in space somewhere, sure, we can choose not to pay attention. But we can't choose not to pay attention to things in our tree that we're directly responsible for and that impact almost every other component of the system.

There are technical concerns, too. Things like GitHub issues pollute the trac namespace. If you merge a commit into the GHC repository that says Fixes #123 and you're referring to the #123 in this repository, it doesn't matter. Not only will it not close your issue, it will also pollute GHC's history and cause pointless warnings on Trac because it will target 'GHC issue #123' instead. This is a very important feature and as far as me and @hvr have seen, the best way to stop it from happening is just to never allow any merges directly from GitHub into GHC without careful review. It's too risky. This goes even for a wip/shake branch, which is why I recommend you do not track bugs here, but instead develop iteratively on the branch itself. Or at least, you can use GitHub - but never refer to issues or any PRs, and scrub any and all commit messages of upstream issue references.
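One way to scrub such references is to turn every bare `#123` into a full GitHub URL before a commit message reaches the GHC repository. The filter below is a minimal sketch, not an existing tool: the function name `scrub_issue_refs` and the base URL are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: rewrite bare "#123" references in a commit message into full GitHub URLs,
# so that they cannot be mistaken for GHC Trac ticket numbers.
# ASSUMPTION: the base URL and the function name are illustrative only.
set -eu
GITHUB_ISSUES_URL='https://github.com/snowleopard/hadrian/issues'

scrub_issue_refs() {
    # Match "#<digits>" at the start of a line or after a character that is
    # neither alphanumeric nor "/". This leaves existing URLs and
    # "owner/repo#N" style references alone.
    sed -E "s~(^|[^[:alnum:]/])#([0-9]+)~\\1${GITHUB_ISSUES_URL}/\\2~g"
}

# Example: reads a message on stdin, writes the scrubbed message to stdout.
printf 'Fix paths in configure.ac. Fixes #123, see also #45.\n' | scrub_issue_refs
```

Because it reads stdin and writes stdout, a filter of this shape could also be handed to `git filter-branch --msg-filter` to rewrite existing history, though any rewritten messages would still deserve a manual look before merging.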

Finally, I am not convinced by the argument that "Everyone has a GitHub! It's so easy!", because I don't think that argument is very important to GHC as a project (aside from the fact I don't think it's good as an argument, anyway), and I've yet to be convinced otherwise, while remaining open :). First off, it's not like GHC is going to die any time soon because this component wasn't on GitHub. If it was, we have bigger sustainability problems than Make or Shake can solve. And second, I'm not convinced that having this singular component on GitHub at the expense of all the things I mentioned above is actually worth it, when there's probably no evidence of it actually attracting large numbers of developers or users independently, who wouldn't work on GHC anyway. Anyone who uses the build system is working on GHC, basically. Sure, you get drive by contributors or some advice - but why would you not directly work with the upstream, but instead silo yourself? We're the ones who will end up doing a lot of work too. There's no argument this is actually better, as far as I can tell. There is definitely an argument of "I like GitHub and it would be cool if I could do stuff here still", but that's not the same, and while accurate, not nearly convincing enough.

Basically, this is bending over backwards for an extremely small subset of use cases and workflows that isn't clearly a win - yet with real costs and cognitive overhead, as well as large bottleneck issues for development. Versus just putting up with what we do today and just "doing as the Romans do", which a large number of people already live with, and effectively work with, and are used to - and by whom it will be maintained in the future. Yes, it's not GitHub. No, I understand it's your free time in many cases and people hate learning new stuff (I do NOT mean that pejoratively - I have limited time too!), but that's just how it is sometimes if you want to play ball - you have to wear the uniform. And for a component as critical as the build system, there needs to be as little separation and friction between the project and that system as possible to have good developer feedback. I can see this as only adding substantial friction.

And as a final note - yes, I do like the name of the build :)

Ericson2314 commented 8 years ago

I don't mean to be a zealous git history partisan :). Part of my motivation for the subtree merge was that this way this repo could be force-pushed into becoming a GHC fork without any loss of information. But the issue of "fixes #xxx" being picked up by GitHub and Trac that you mention might preclude that.

angerman commented 8 years ago

By the way, hope you like the new name of the build system! :-) Approved at today's Great Merge meeting. No more "new build system", "Shake-based build system" verbosity.

This name change should probably be announced on reddit and canonical mailing lists. I fear people will not make the mental leap from shaking-up-ghc to hadrian.

angerman commented 8 years ago

My take on the Github/GHC topic is as follows, from very personal experience contributing to ghc and hadrian, so take this with a grain of salt.

Contributing through github is ridiculously easy. It's almost trivial to do so. Pull requests and Issues are all lined up with the code, it's really easy to browse the source code on github as well.

Working with GHC, I got to know phabricator, and while I initially struggled a bit to understand how to properly use it, I fell in love. Especially from the point of the reviewing person, my opinion is that phabricator shines. Trac however, I can not stand; I find it hideous to work with, hard to find things (issues or wiki pages), and slow.

I have certainly fallen victim to what @thoughtpolice said, not only once.

But from experience, I have seen far, far too many people come along, dump thousands of lines of code into GHC, and then vanish into the void - never to return. It's understandable - life, kids, spouses, you decided to binge watch Game of Thrones.

I'm completely on board with respect to a single approach to reduce any duplication and overhead as well as making sure the tooling doesn't feather out, and the same channels are used for everything.

snowleopard commented 8 years ago

Thanks everyone!

So, just to repeat my priorities:

My main priority is to support the standard GHC development workflow. My secondary priority is to allow contributions from people who are unfamiliar with or dislike the standard workflow, even if it means that I will have to do extra work.

Can we all try to figure out the best way to achieve both if possible? I don't want to go into any arguments on which workflow is better; all I want is that everyone feels welcome and productive.

I can see now that the answer is: "No, we can never achieve both and it's not just a matter of extra work".

So be it! It's sad, but I can surely live with that. I'll be using 1 May as my target to get things prepared for the Great Move, as described at the top by @thoughtpolice.

snowleopard commented 8 years ago

This name change should probably be announced on reddit and canonical mailing lists. I fear people will not make the mental leap from shaking-up-ghc to hadrian.

@angerman Sure, we will announce this together with the Great Move announcement. Things are currently broken anyway because of the rename -- I need to fix paths in configure.ac, which will be my first try at the standard workflow.

ndmitchell commented 8 years ago

So sounds like GitHub is unworkable - fair enough. As an infrequent contributor my preferences should certainly not be weighed equally to the existing GHC team or @snowleopard.

snowleopard commented 8 years ago

@ndmitchell My personal preferences are the same as yours, but well... :-) GHC team rightfully has the highest priority.

thoughtpolice commented 8 years ago

As an aside, I know one thing people get anxious about is Arcanist. If it makes y'all feel better: I'm hoping soon that upstream Phabricator will actually support 'git push' functionality directly, which should really close the biggest annoyance with Phab, which is that you need arcanist for most workflows. (I say this because it's on their prioritized roadmap, AFAIK, so someone is paying $$$ to them to build it.) Right now someone is paying them for some other support, but it's on there.

So while it's not exactly GitHub, the actual mechanics will hopefully be more familiar before too long. (And the review tools really are much better, for reviewers and developers alike, IMO.)

angerman commented 8 years ago

@thoughtpolice if that happens that would certainly remove the pain with arc; so that is great news.

So while it's not exactly GitHub, the actual mechanics will hopefully be more familiar before too long. (And the review tools really are much better, for reviewers and developers alike, IMO.)

This!

Yet, if I could have a github like service (code, issues) with phabricators differential on top of PRs I'd switch instantly.

snowleopard commented 8 years ago

Trying out fearful arc: https://phabricator.haskell.org/D2153.

snowleopard commented 8 years ago

OK, I'm almost ready for the Great Merge. Just a couple of things left, which I will hopefully fix in a day or two (ICFP response period may interfere with this a bit).

@thoughtpolice Shall I go via Phabricator with the merge? I don't mind either way. Direct push is easier, but on the other hand we might as well give a chance for people to comment on the commit in Phab, so this way seems more polite.

As soon as we merge, I'll start migrating major issues to Trac. Mostly, they will correspond to the list of current limitations: https://github.com/snowleopard/hadrian#current-limitations. We will also stop accepting new issues on GitHub.

Furthermore, to follow up our IRC chat (with @thoughtpolice and @thomie), the mode of work in the short term will probably be influenced by whether we have new active contributors after the merge. If we do, then this GitHub repo will become history very fast. If it's still mostly myself, Neil and other GitHub contributors, we could continue putting our everyday commits here (I tend to use a lot of small commits to benefit from CI) and periodically squash them into patches to be pushed to GHC, say, one per Trac issue. Yes, the benefit of using GitHub is reduced (we can no longer use GitHub issues integration), but the rest of the workflow remains familiar and less traumatising -- the arc experience was as dreadful as I expected :-(
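The "many small commits, one squashed patch" idea can be tried with plain `git merge --squash`. The sandbox below is a sketch under assumptions: the repository, the branch name `issue-work`, the file name, and the commit messages are all hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: collapse a run of small commits into one patch per issue.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
echo 'base' > Build.hs
git add Build.hs
git commit -q -m 'Initial import'

# Everyday work: several small commits on a topic branch (hypothetical name).
git checkout -q -b issue-work
for step in 1 2 3; do
    echo "step $step" >> Build.hs
    git commit -q -am "WIP step $step"
done

# Stage the combined change on master without recording the small commits,
# then record it as a single commit with a proper message.
git checkout -q master
git merge -q --squash issue-work
git commit -q -m 'Fix configure paths for Hadrian'
```

The three WIP commits stay on `issue-work`; master only ever sees the single squashed commit.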

thoughtpolice commented 8 years ago

@thoughtpolice Shall I go via Phabricator with the merge? I don't mind either way. Direct push is easier, but on the other hand we might as well give a chance for people to comment on the commit in Phab, so this way seems more polite.

No, I'd just say push directly. Realistically nobody can review thousands of lines of code at once like this as a blob, it's just impossible, so there's very little advantage to it. A direct push into the repository just adding hadrian as one big commit is the way to go.
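For reference, importing a tree as one big commit without its history can be done with `git read-tree --prefix`. The sandbox below is a sketch only: both repositories are throwaway stand-ins, and the file names and commit messages are made up.

```shell
#!/bin/sh
# Sandbox sketch: import another repository's tree under hadrian/ as ONE commit,
# without bringing its history (or its commit messages) along.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Stand-in for the GitHub repository; its message mentions a GitHub issue.
git init -q "$work/hadrian"
cd "$work/hadrian"
git symbolic-ref HEAD refs/heads/master
echo 'rules' > Rules.hs
git add Rules.hs
git commit -q -m 'Add rules (fixes #12)'

# Stand-in for the GHC repository.
git init -q "$work/ghc"
cd "$work/ghc"
git symbolic-ref HEAD refs/heads/master
echo 'compiler' > Main.hs
git add Main.hs
git commit -q -m 'Add compiler'

git remote add hadrian "$work/hadrian"
git fetch -q hadrian

# Read the fetched tree into the index and working tree under hadrian/,
# then record it as a single ordinary commit.
git read-tree --prefix=hadrian/ -u hadrian/master
git commit -q -m 'Add the Hadrian build system'
```

Since only the tree is imported, none of the original commit messages, and so none of their `#NNN` references, end up in the receiving repository's history.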

snowleopard commented 8 years ago

@thoughtpolice OK, thanks. Will do once #234 is finalised.

Ericson2314 commented 8 years ago

If it's still mostly myself, Neil and other GitHub contributors, we could continue putting our everyday commits here (I tend to use a lot of small commits to benefit from CI) and periodically squash them into patches to be pushed to GHC, say, one per Trac issue.

God I'm going to sound like a broken record here, but git subtree also automates that process exactly --- including the squashing.

bgamari commented 8 years ago

John Ericson notifications@github.com writes:

If it's still mostly myself, Neil and other GitHub contributors, we could continue putting our everyday commits here (I tend to use a lot of small commits to benefit from CI) and periodically squash them into patches to be pushed to GHC, say, one per Trac issue.

God I'm going to sound like a broken record here, but git subtree also automates that process (including the squashing).

I agree that it does seem like this would be a good application for git subtree.

thomie commented 8 years ago

If it's still mostly myself, Neil and other GitHub contributors, we could continue putting our everyday commits here (I tend to use a lot of small commits to benefit from CI) and periodically squash them into patches to be pushed to GHC, say, one per Trac issue.

Using this repository after the great merge will lead to some problems:

The ghc repository will now have a hadrian folder. How would you clone this repository into the ghc repository? Clone it into hadrian2? Delete ghc/hadrian first?
Once changes are made to the hadrian folder in the ghc repository (via phab, whatever), how would you keep ghc/hadrian and this hadrian in sync (both ways)? I suppose git-subtree helps here?
If ghc/hadrian and hadrian can be out-of-sync, it's just going to be confusing for contributors what the latest version is.

If you don't keep this repository, but work on a (Github) clone of the ghc repository, the ghc and hadrian .travis.yml files should get merged (slowing a Travis run down probably).

I still think it's better to withhold the merge until hadrian is closer to being finished, so @snowleopard and @ndmitchell can keep using the workflow they are used to. Running git clone git://github.com/snowleopard/hadrian for those who want to try out / help work on hadrian seems way easier (what's the problem really?) than this git-subtree/git-submodule/multiple-repositories/multiple-histories/multiple-places-for-code-review/arc stuff. You're just setting yourself up for extra work.

snowleopard commented 8 years ago

God I'm going to sound like a broken record here, but git subtree also automates that process exactly --- including the squashing.

@Ericson2314 My apologies, your git subtree comment somehow didn't register in my mind. The reason is very simple: I've no idea what git subtree is and have never used it! But now, after a quick google search, it indeed looks like what we want. Thank you for pointing it out. I need to rtfm, but if you could briefly share your understanding of how we should use git subtree that would be much appreciated.

snowleopard commented 8 years ago

You're just setting yourself up for extra work.

@thomie You are most probably right. And I don't yet have answers to most of your questions on how this will work in practice.

Why merge now? As I think I have mentioned before, the hope is that once we merge, we will get more users, more bug reports, and more contributors. There may be potential contributors who are not willing to contribute while the project lives in GitHub in a toy-like status, but will start contributing after the merge. There may be users who can't be bothered to run git clone git://github.com/snowleopard/hadrian, but will look into /hadrian once it is there out of curiosity and will give it a try.

So, let's merge and see. If the above hopes are false then we will just keep hacking in GitHub, occasionally synchronising via git subtree or whatever. Shouldn't be too much hassle. It only becomes a hassle if we do get new contributors, but then -- hurray -- mission accomplished!

Ericson2314 commented 8 years ago

Sure. It's only something I use from time to time myself :), but the basic idea is: add does the initial merge, split filters upstream changes that affect the hadrian subdir in a deterministic manner (to push to this repo), and merge gets changes from the hadrian branch back into the ghc branch.

Ericson2314 commented 8 years ago

I just did it to refresh my memory. For the initial merge:

```
cd ghc # the ghc repo root
# if hadrian is already in the ghc working dir
mv hadrian ../ # or remove it; it just can't be here while we merge
git remote add hadrian git@github.com:snowleopard/hadrian.git # can also use a local path like ../hadrian
git fetch hadrian
git branch hadrian hadrian/master # make a `hadrian` branch tracking hadrian/master
git subtree add --prefix=hadrian hadrian/master # important not to squash the first time, if one wants to merge future changes
```

snowleopard commented 8 years ago

@Ericson2314 Awesome, many thanks! These are pretty detailed instructions on how to do the initial merge of Hadrian into GHC tree.

There are two other important use-cases.

I work on a Trac issue, fire a series of commits into my Hadrian branch here until I am happy. How do I now squash these commits and merge the result into GHC tree? I presume I need to use merge.
Someone made a contribution to Hadrian in GHC tree and I need to update my branch accordingly. Here is where split comes into play, right?

If you could draft scripts corresponding to (1) and (2) above that would be great.

Ericson2314 commented 8 years ago

So you can squash with --squash in both cases. But you may not want to do so, because when splitting a squashed merge, the history won't include the tip of the prefix branch (hadrian in my example) because it's been squashed. Squashing works best when history only moves in one direction.

edit: there is an --onto flag that I assume works similarly to git rebase foo --onto bar that should be able to work around the issues with squashes.

Ericson2314 commented 8 years ago

Update master from hadrian

```
git checkout master
git subtree merge --prefix=hadrian hadrian
```

Update hadrian from master
```
git checkout master # not hadrian
git subtree split --prefix=hadrian -b hadrian --rejoin # edit: forgot `-b`
```
The rejoin helps subsequent merges understand that master and Hadrian have been "synchronized" by the split. But --onto can also be used to avoid that.

edit: actually there is no --squash variant for split, but git read-tree --prefix when the hadrian branch is checked out should be able to achieve something similar. [Pretty sure split basically desugars to that on a git filter-branch (fmap for commit tree if only the git devs knew Haskell :)).]
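To see the add / merge / split round trip end to end without touching a real checkout, the following sandbox sketch may help. It assumes `git subtree` is installed (it ships in git's contrib area and is not present everywhere), and all repositories, file names, commit messages, and the branch name `hadrian-split` are made up for illustration.

```shell
#!/bin/sh
# Sandbox sketch of the add / merge / split round trip with git subtree.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_MERGE_AUTOEDIT=no   # never open an editor for merge messages

work=$(mktemp -d)

# Stand-in for the stand-alone hadrian repository.
git init -q "$work/hadrian"
cd "$work/hadrian"
git symbolic-ref HEAD refs/heads/master
echo 'rules v1' > Rules.hs
git add Rules.hs
git commit -q -m 'Add rules'

# Stand-in for the ghc repository.
git init -q "$work/ghc"
cd "$work/ghc"
git symbolic-ref HEAD refs/heads/master
echo 'compiler' > Main.hs
git add Main.hs
git commit -q -m 'Add compiler'

# Initial merge, keeping history (no --squash).
git remote add hadrian "$work/hadrian"
git fetch -q hadrian
git subtree add --prefix=hadrian hadrian/master

# New work lands in the stand-alone repository...
( cd "$work/hadrian" && echo 'rules v2' >> Rules.hs && git commit -q -am 'Extend rules' )

# ...and is merged into the hadrian/ subdirectory of the big repository.
git fetch -q hadrian
git subtree merge --prefix=hadrian hadrian/master

# A change made directly inside hadrian/ in the big repository...
echo 'rules v3' >> hadrian/Rules.hs
git commit -q -am 'Tweak rules in tree'

# ...is extracted onto a branch whose root directory is the old hadrian/.
git subtree split --prefix=hadrian -b hadrian-split
```

The `hadrian-split` branch could then be pushed back to the stand-alone repository; whether to add `--rejoin` is the trade-off discussed above.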

snowleopard commented 8 years ago

@Ericson2314 Many thanks again! I will do a couple of experiments using your scripts.

There is one problem left to be solved. We need to make sure we do not pollute GHC history with pointers to GitHub issues and PRs, so we can't simply do git subtree add --prefix=hadrian hadrian/master. Or am I missing anything? You say it's important not to squash the first time. But I think we do want to squash, as otherwise we would need to rewrite the whole commit history somehow?

@thoughtpolice Do you see any pitfalls in the above approach proposed by @Ericson2314?

thomie commented 8 years ago

Let me start by reiterating that I'm very grateful that you're working on this project, and I'm looking forward to the day we can delete the old build system.

once we merge, we will get more users, more bug reports

That's not a good thing! GHC developers should be doing GHC development, not be bothered by build system bugs. As long as the issue queue here isn't empty, things like sdist/bindist are completely absent, and users are still running into complete build system failures every now and then, you don't need more input. I would really prefer GHC developers are shielded from this work in progress.

You're just setting yourself up for extra work.

Let me rephrase: merging prematurely sets everybody up for extra work.

What are the open issues with the new build system? Check two places.
What is the latest version of the new build system? Check two places.
Where do I submit patches for the new build system? Check two places. More questions on irc.
People reading all ghc-commits. More emails, more noise.
Build bots building and validating all GHC commits (https://perf.haskell.org/ghc/ https://phabricator.haskell.org/diffusion/GHC/history/). Longer wait queues. They certainly can't handle 20 [edit: commits] / day like you're pushing now.

If you were to use Phabricator exclusively, no more Github to avoid confusion, then you're just setting hadrian contributors up for arc annoyances and very flaky Continuous Integration (Phabricator hasn't been building patch submissions for the better part of this year).

In my opinion, we shouldn't be pushing unfinished work to the GHC repository. Code should be finished, documented, well tested, thoroughly reviewed etc, commits should have a proper commit message. It's going to be in the repository forever. If hadrian development were to follow the GHC process now, it would just slow its development down, which would be very bad.

Really, just keep working the way you are, it's going quite alright. You can move fast, make mistakes, break things, and nobody is too much bothered by it. Send out an email to ghc-devs every once in a while with a status update and asking for volunteers. @erikd said he would come to help with the cross-compiling part. @thoughtpolice was interested in using hadrian for day to day development already, so he can report bugs when he encounters them.

@ndmitchell My personal preferences are the same as yours, but well...

We don't have to follow @thoughtpolice's wishes. Stand up to the man!

ndmitchell commented 8 years ago

If I had the option, my roadmap would be:

Make hadrian a submodule of ghc, so every GHC checkout includes this github version of Hadrian. Keep everything at GitHub. The build system remains the old system.
A few interested users start playing, reporting, fixing, solving, improving.
A few GHC developers (not necessarily the main ones) are using it for their day-to-day work.
Then merge and flip the switch and make it the "preferred" system, through trac/Phab etc.

You end up at the same place, but it's easier to experiment as we head there.

thomie commented 8 years ago

@snowleopard: if my rant didn't convince you, no need to refute every point, I'll stop arguing

To get more contributors for hadrian, writing extensive documentation is probably your best bet. I found https://ghc.haskell.org/trac/ghc/wiki/Building/Architecture very helpful when starting out with the old build system. Then, make a plea for contributors to reddit.com/r/haskell. People say they want to contribute to open source Haskell all the time, but don't know where to start.

snowleopard commented 8 years ago

@thomie I don't think you fully convinced me yet, but you did force me to think deeper, so thank you!

I still prefer to respond to your individual points.

What are the open issues with the new build system? Check two places. What is the latest version of the new build system? Check two places.

Why? I disagree on these two points. After the merge there is only one place to check. Issues are on Trac. Latest official version is in GHC tree. Always.

Where do I submit patches for the new build system? Check two places.

This is a good point. Indeed, there would be two options. Not sure this is necessarily a problem though, as both should work.

People reading all ghc-commits. More emails, more noise.

Sorry, I don't get this one.

Build bots building and validating all GHC commits (https://perf.haskell.org/ghc/ https://phabricator.haskell.org/diffusion/GHC/history/). Longer wait queues. They certainly can't handle 20 [edit: commits] / day like you're pushing now.

Ah, that's a very strong point indeed. I rely on relatively fast CI a lot. And surely I wouldn't want to slow down everyone else by my commits.

OK. Let me think about it. I'll probably fire an email to SPJ & Co to also ask their opinion, CC-ing you.

thomie commented 8 years ago

What are the open issues with the new build system? Check two places.

I probably misunderstood your earlier comment: "I copy new GitHub issues to Trac, but not vice versa."

Maybe your idea is to close the GitHub issue after it has been copied, and refer the GitHub user to Trac for updates on their ticket. That actually seems less user friendly than not accepting GitHub tickets at all.

There would need to be a new "Build System new / Hadrian" component in Trac probably, to not mix up old (https://ghc.haskell.org/trac/ghc/query?status=!closed&component=Build+System&order=id&desc=1) and new build system tickets.

What is the latest version of the new build system? Check two places.

This refers to "we will just keep hacking in GitHub, occasionally synchronising via git subtree or whatever".

If development can take place in snowleopard/hadrian as well as in ghc/hadrian, then there will be two different masters (as that is how I understand git subtree works). Running into a bug in one that turns out to already have been fixed in the other would be quite frustrating.

People reading all ghc-commits. More emails, more noise.

For every GHC commit, an email is sent to https://mail.haskell.org/mailman/listinfo/ghc-commits. Some (most? I don't know) GHC developers subscribe to that, and read at least the commit message.

If I understand git subtree correctly, commits to ghc/hadrian would be just like normal ghc commits.

(I don't know what would happen on the initial import, but I hope it doesn't flood the mailing lists and build bots with > 1000 commits. I guess the squashing you talked about should prevent that?)

Everything would be so much simpler if there could be one point in time this year where you:

merge the code + open tickets + documentation/wiki
move development from github to phabricator
make hadrian the preferred system, used for validate, bindists, development, everything

Then, after a few (but only a few) hectic weeks of last-minute bug fixing and reduced productivity, kill all traces of the old system.

Ericson2314 commented 8 years ago

@snowleopard Ignore my --squash FUD :)! I forgot about --onto when I wrote it. Now using --onto does require manual intervention every time, negating the ease of use git subtree provides, but nevertheless should work. Note with that workflow, you may be interested in skipping git subtree and just using git read-tree for all 3 use-cases:

git checkout <dst-branch>
git read-tree <src-branch>:<src-prefix> --prefix=<dst-prefix>
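
A minimal sketch of the read-tree approach above, run in a throwaway repository. The repository layout, the branch name `hadrian-src`, and the `src` and `hadrian/` prefixes are hypothetical stand-ins, not the real GHC or Hadrian layout. `-u` is added here so that the working tree is updated along with the index.

```shell
set -eu

# Throwaway repository standing in for the destination (GHC-like) tree.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

echo "destination tree" > README
git add README
git commit -q -m "Initial destination commit"
dst=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Unrelated history standing in for the source (Hadrian-like) repository.
git checkout -q --orphan hadrian-src
git rm -rfq .
mkdir src
echo 'main = pure ()' > src/Main.hs
git add src/Main.hs
git commit -q -m "Source commit whose message should not be imported"

# Copy the tree at hadrian-src:src into hadrian/ on the destination branch.
# Only file contents are read; no source commits become ancestors.
git checkout -q "$dst"
git read-tree --prefix=hadrian/ -u hadrian-src:src
git commit -q -m "Import hadrian-src:src as hadrian/ in a single commit"

ls hadrian
git log --oneline
```

Because read-tree only copies a tree, the destination branch gains one ordinary commit, and the source branch's commit messages never enter its history.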

thoughtpolice commented 8 years ago

We also cannot simply use git subtree to merge things, due to the namespacing issue with GitHub/Trac I've mentioned previously. Unless I missed something and the previous recommendations scrub all references? If Trac identifies all these commits as actual GHC commits, it's also going to send a bunch of false emails notifying all of the tickets. We should really, really, really not do that.

Please just don't bother with any of this subtree nonsense. Just make one giant commit and call it a day. subtree is not used by most of the GHC developers and none of us really care to know how it works in the long term. I'm sure it has technical advantages in some dimensions but the benefits are really immaterial especially if the thing I've mentioned above does not take care of scrubbing issue references, and at the end of the day most of the other reasons to use it are just red herrings (see below).

Make hadrian a submodule of ghc, so every GHC checkout includes this github version of Hadrian. Keep everything at GitHub. The build system remains the old system.

Honestly, I hate submodules so much and want to avoid adding any more of them in the development path (they slow down clones, suck to work with, etc) where possible that I'd just rather wait like @thomie said if this is the only option. We'll have to support it forever after that, too. Yes, I really dislike submodules that much.

So as a meta note about this conversation - just to be clear, I actually do want the build system merged, but that wasn't the actual motivation for starting this whole discussion. I mean, I think it's a good idea to get it into people's hands, and now is a good time. But that wasn't the motivation either.

The motivation was, I was under the impression that it was decided Long Ago that this would be done relatively soon and merged when it was viable to use for development. I was well out of the loop on that one, it was just "what was going to happen". It was just the inevitability.

Therefore, I needed to keep tabs on it. I need to know what the plan is because this will impact people and I have to help them get it in. It is in my interests to be on top of things like this to make life easier for everyone.

Thus, the amount of extreme discussion along multiple dimensions here is, to me, incredibly odd at this point. I'm not even sure why we're having most of it, on reflection, and it's frankly completely draining my will to live to have this conversation go on so long. To be clear: it should have been apparent from day 1 when this project started that it would eventually be merged into GHC and play by its rules as I outlined earlier in this post.

As far as I can see, most of the talk about git submodules and subtrees and stuff is really just a red herring to run around the conflicting facts of the matter, that:

A) The primary current developers want to keep using GitHub.

B) Some developers think this is an immature move to merge it so early, and we should merge it later.

These are the root problems. Not submodules or subtrees, only this. Anything else is a workaround to these facts.

As I have explained before, A is just, in the long run, never going to happen.

On the other hand, no matter how you look at B, eventually, it is inevitably going to happen because you're going to have to merge it one day. And on that same day, A will cease to be.

At this point I essentially see all the workarounds about this subtree/module/whatever as a way of saying "let us add technical debt to the process in the short term in order to accommodate some things that will not be problems in the long run". On second thought I regret recommending any sort of method of synchronizing outside with GitHub using some other strategy. It's clearly led this conversation astray and into the weeds of a thousand technical tools, and is a tenuous short-term solution to a non-existent long term problem.

At this point I'm leaning towards erring on the side of @thomie purely so we can avoid all this talk about things that don't really matter that much and actually get back to working on the build system.

That means it doesn't get into the hands of people immediately, like I want. But it also means we stop faffing about things that don't actually matter, and avoid committing to short-term 'solutions' to weird problems we invent for ourselves. That's probably for the best in the long run.

At the end of the day, this is going to go into GHC and it will play by the GHC rules. We don't generally give special treatment to almost anyone else who spends significant time working on significant improvements to any part of the compiler, in retrospect, so I see very little reason why this case is necessarily exceptional. This ticket is, in essence, a lot of arguing about why we should make a big exception and do all this weirdly complex stuff, when it's not actually clear that's true.

Therefore, after thinking about it, I now rescind my earlier vote. I recommend we shelve this issue and do not merge now. What we should do instead is:

1) Decide what the absolute minimum viable product is, to get it merged into the tree and make everybody happy that it really is a replacement. As @thomie mentioned, this would probably at least include sdist and bindist, and probably some more work on fixing the WAYs. We don't need to care about replacing things like the testsuite or autoconf, that's just a waste of time right now. This would mean we can ship GHC releases. I speculate sdist and bindist is less work than we think but would take a few weeks probably to sort out the details.

2) Do that, merge it into GHC, and call it a day.

Because there is already a tree-tremble milestone, I suggest repurposing that to include #1 above, and shelve the merge until 1) is done (that is, whatever we decide the MVP is).

thoughtpolice commented 8 years ago

To be extra clear: any commitment to some weird, short term solution is a made up solution to a made up problem, by us. Literally all of these complaints can be solved by saying, as @thomie said, "Do not merge it right now, then".

Don't want to move off GitHub for now? Don't merge it, then.

Don't think it can do all the stuff the other one can? Don't merge it, then.

Want to have some super weird complex synchronization method we'll have to remember and explain for everyone? Don't merge it, then.

I would rather have it now. But I don't want it now at the expense of technical debt we made up for ourselves, in an attempt to make everyone 'happy'. It will almost certainly end up making people miserable, in fact.

ndmitchell commented 8 years ago

I have never encouraged any of the subtree stuff (my experience is that when I do weird git things it bites me, hard, repeatedly, until I cry).

But temporarily making it a submodule seems a beautifully "in the middle" solution. Technical debt is nearly zero. Ease of experimentation increases for existing devs. Migration is not harder. It lets us hold off on the merge a bit longer (which seems to be desirable). Of course, if you'd rather not, just do nothing.

Ericson2314 commented 8 years ago

Yeah the second time I brought up subtree I plain forgot about the commit message issues, @thoughtpolice, sorry about that. For the record, git subtree is just a short-hand for creating perfectly normal commits/trees/whatever, not a new sort of data like submodules. That with --squash, or read-tree, is basically a fancy (or overwrought :)) way to copy files around.

snowleopard commented 8 years ago

The motivation was, I was under the impression that it was decided Long Ago that this would be done relatively soon and merged when it was viable to use for development. I was well out of the loop on that one, it was just "what was going to happen". It was just the inevitability.

@thoughtpolice You are right. It was decided. Well, it wasn't a grand carefully thought-through decision, it was basically just a couple of conversations, concluding with something like "Indeed, why not merge it sooner? Maybe people will like it and start using it!" We have never actually thought about what such a merge would involve. And in this thread we finally started to look deeper and uncovered various pitfalls.

I also dislike spending much time on these discussions, I'm eager to get back to hacking instead. So, if we now have both @thomie and @thoughtpolice recommending not to rush with the merge, so be it. Less talking, more coding.

Figuring out the MVP is useful. To avoid confusing everyone, let me create a separate issue where I will list what should go into the MVP. I will then amend the tree-tremble milestone accordingly.

@thoughtpolice Feel free to close this issue if there is nothing more to discuss. Thank you for creating it; it surely was helpful for me to see how things should and will work after the merge.

snowleopard commented 7 years ago

I'm closing this issue. Once #239 is complete we can look up the Great Merge steps discussed above.

snowleopard / hadrian

Prepare ourselves for the Great Merge #232