leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
3.88k stars 329 forks source link

Transfer Lake into repo #2155

Closed Kha closed 10 months ago

Kha commented 1 year ago

Resolves #2139

Kha commented 1 year ago

The merge commit was created by

git subtree -P src/lake add https://github.com/leanprover/lake cloexec --squash

and subsequently reworded, which does not seem to break anything. A subsequent

git subtree -P src/lake pull https://github.com/leanprover/lake master --squash

for example, creates this structure: image It will be interesting to see what Velcom thinks of the merge commit; might be wise to keep squashing simply to prevent it from being confused.

Kha commented 1 year ago

Oh, right! image

But I think it still allows me to push the button...?

Note that the merge commit really is obligatory: the squashed commit has the Lake repo's original structure, it is only the merge that moves all files into src/lake

gebner commented 1 year ago

Am I understanding this right that we'll have merge commits (and nonlinear history) for every Lake update in the future?

Kha commented 1 year ago

Yes, it does seem that way. Though the history will be far from messy with at most two concurrent branches at any time (edit: but only if we push Lake updates directly instead of merging them from PRs themselves containing a merge commit). Are you particularly concerned about some aspect? Do you believe it would still be worth it?

gebner commented 1 year ago

I don't think that's a problem, I just wanted to confirm. We've always tried very hard to have a linear (or was it just "clean"?) history in the Lean repo, but I don't know the full story behind that decision. I'll defer to your judgement on this.

Kha commented 1 year ago

Yes, I believe cleanliness of the history was the goal. I would say it is still quite clean after this change, with at most one parallel branch, and only when you scroll down far enough to get past a Lake update commit.

* feat:
* feat:
...
* chore: update Lake
|\
| * Squashed 'src/lake/' changes from 18430cf941..f59328e3d9
* | feat:
* | feat:
...
* | chore: update Lake
|\|
| * Squashed 'src/lake/' ...
* | feat:
...

Tools like git bisect should behave as before if used with --first-parent.

But we should wait for @leodemoura to decide whether getting rid of the submodule is worth this change.

Kha commented 1 year ago

Yet another option would be to use https://github.com/ingydotnet/git-subrepo, which apparently addresses exactly this and other limitations of git-subtree. The downside is that it is not installed with git by default, but only people pulling changes from or pushing to Lake (i.e. @tydeu) would have to use it.

gebner commented 1 year ago

I have used git-subrepo in the past and I was not happy with it. Not only is it not bundled with git, it doesn't even come packaged for most Linux distributions. And it often ended up getting stuck when pulling subrepos with local changes. It is also effectively unmaintained. (Which is a shame because I really like the concept of git-subrepo.)

At least git-subtree is used by Rust; if anything doesn't work they've probably ran into it first and also have the manpower to fix it.

gebner commented 1 year ago

Another issue we should think about is migration. I just tried checking out the PR and got this:

$ git checkout lake-subtree
error: The following untracked working tree files would be overwritten by checkout:
        src/lake/.envrc
        src/lake/.gitattributes
[several hundred lines]
Kha commented 1 year ago

Yes, checkout -f should work

Kha commented 1 year ago

New plan: KISS

Pull from Lake into Lean master:

git fetch -f https://github.com/leanprover/lake master:lake/master lean4-master:lake/lean4-master
git format-patch lake/lean4-master..lake/master --stdout | git am --directory=src/lake
git push -f https://github.com/leanprover/lean4 master:lake-master

Pull from Lean into Lake master:

git fetch -f https://github.com/leanprover/lean4 master:lean4/master lake-master:lean4/lake-master
git format-patch lean4/lake-master..lean4/master --stdout -- src/lake | git am -p3
git push -f https://github.com/leanprover/lake master:lean4-master

No merge commits, no dependencies, no voodoo

(I'm reusing the existing lean4-master branch here, but note that in this setup, the branch lean4-master is not necessarily identical to the src/lake contents in lean4 but merely tags the last lake master commit where we pulled from lean4)

gebner commented 1 year ago

Pull from Lean into Lake master:

git fetch -f https://github.com/leanprover/lean4 master:lean4/master lake-master:lean4/lake-master
git format-patch lean4/lake-master..lean4/master --stdout -- src/lake | git am -p3
git push -f https://github.com/leanprover/lake master:lean4-master

It took me a couple of minutes to understand that these instructions are missing a cd ~/lake at the top. :smile:

If we have changes to both Lean and Lake, then this part will silently discard the Lake changes, right? (In the sense that they won't propagate to Lean.)

Kha commented 1 year ago

It took me a couple of minutes to understand that these instructions are missing a cd ~/lake at the top. smile

Yeah, you have to start on the specified repo and branch.

If we have changes to both Lean and Lake, then this part will silently discard the Lake changes, right? (In the sense that they won't propagate to Lean.)

True, we would need two helper branches in each repo to accurately track when we last pulled and pushed for each one. Honestly, I think we can live without the branches, it's really not that hard to figure out the correct ranges manually from a quick look at git log and it's not something we will do every day. We pull from Lean into Lake when updating the lean-toolchain, with the old and new nightly indicating the range. From Lake into Lean, we know the Lake tag of the previous state from the release notes.

gebner commented 1 year ago

it's not something we will do every day. We pull from Lean into Lake when updating the lean-toolchain, with the old and new nightly indicating the range. From Lake into Lean, we know the Lake tag of the previous state from the release notes.

That's not a rare situation. For example, right now we've got both pending changes in Lake master as well as lean4-master. If we have to manually check every time we sync I'm certain we will miss a commit at some point.

I had also hoped that getting rid of the submodule would make it easier to implement some changes which require modifying both Lean and Lake, e.g. #2061 or #2154. In other words, after this PR I expect a lot more changes to Lake on the Lean side than before.

Kha commented 1 year ago

I see, if you want to do Lake feature development inside lean4, making sure we transferred all commits to the Lake repo might not be as easy as checking that it compiles. I'm sure we can build a solution mixing the two above suggestions that involves neither guessing nor four helper branches, but now I'm wondering if it wouldn't be easier to move Lake development into lean4 wholesale and avoid all this synchronization overhead. It doesn't seem sensible to develop features for the same project in two different repos.

gebner commented 1 year ago

I'm wondering if it wouldn't be easier to move Lake development into lean4 wholesale and avoid all this synchronization overhead.

That has been my preferred solution since the beginning. Lake and Lean are intimately tied together anyhow. We'd only need to include the native code for Lake as well (so that LeanInk and co. continue to work).

Kha commented 1 year ago

@tydeu Thoughts on the above? I recall you preferred the repo split option, but of course that was in a quite different phase of the project.

tydeu commented 1 year ago

@Kha, @gebner Personally, I still like the split repos. It makes development on my side much easier as I do not have to merge upstream Lean 4 changes and rebuild the whole thing (which is quite time consuming on my end) every time I make some commits.

I am also not sure what strong motivation there is for such a change. With lean4-master in Lake, changes are rather straightforward from both sides. On the Lake side, I just merge lake/master into lean4-master, pull it into the lean4 repo and then create a PR here (I even have a script for it). On the Lean side, if pushing, one just makes the changes in the repo and pushes to lean4/master and lake/lean4-master. With a PR, one creates a new Lean and Lake branch and PRs changes to both lean4/master and lake/lean4-master (and then the Lake changes are merged when the Lean 4 changes are). I then cherry pick the commits from lake/lean4-master into master as needed.

gebner commented 1 year ago

With lean4-master in Lake, changes are rather straightforward from both sides.

No it is not. On the Lake side everything is peachy of course. Changes that go directly to Lean 4 master are also kind of okay if a bit annoying because you can immediately push to the lean4-master branch. But the real pain are PRs or slightly longer-lived branches which require changes to Lake. Not only do you have to create separate Lake branches (and push separately, with separate commit messages, etc.), you need to merge the Lake branch when merging the Lean 4 PR (i.e., it's no longer enough to click "rebase and merge" here), and you lose git rebase.

The separate repo was maybe a good idea when there were a lot more changes to Lake than to Lean, but these days it's the other way around.

I wouldn't be so harsh if Lake was just some random external project. But it's literally impossible to work on Lean core if Lake doesn't build (hell, even make update-stage0 fails in that case).

tydeu commented 1 year ago

@gebner

Changes that go directly to Lean 4 master are also kind of okay if a bit annoying because you can immediately push to the lean4-master branch.

I am not sure what you mean by "a bit annoying because you can immediately push to the lean4-master branch. " Do you mean the problem is that the branch is not protected? As far as I know, neither is the lean4/master branch.

But the real pain are PRs or slightly longer-lived branches which require changes to Lake.

I agree that PRs are the most difficult and it does add unfun duplicate work. This is also the case that seems most benefited by this subtree idea, which is why I was happy with this solution.

I would that it is my observation that, at least in the past, is that the direct commit to PR ratio in Lean was biased heavily towards direct commits. PRs were generally reserved for major feature changes that required discussion or very localized changes that would rarely effect Lake. For the major feature PRs, the duplicate administration work required to change Lake would be a really small amount of work added for those PRs. Thus I would expect that most relevant pain point would be direct commits, which are pretty straightforward with the current setup.

Not only do you have to create separate Lake branches (and push separately, with separate commit messages, etc.)

I am not sure what your metric of difficulty is, but my thought process was that doubling up the same command twice (i.e., either git checkout -b or git commit) is not that difficult. Furthermore, IIRC, IDEs like VS Code can even make the commit simultaneously on both branches.

you need to merge the Lake branch when merging the Lean 4 PR (i.e., it's no longer enough to click "rebase and merge" here), and you lose git rebase

Yeah, this is unfortunate. The problem here is that GitHub does not have fast-forward merge button, so a Lake PR doesn't help and you have to merge the feature branch manually. At the command line, though, this is just a git rebase on for lean4 branch and a git merge --ff-only for the lake branch.

The separate repo was maybe a good idea when there were a lot more changes to Lake than to Lean, but these days it's the other way around.

It is worth noting that Sebastian mentioned, Rust also used submodules (and now uses subtrees), so its not like this particularly rare. I also do plan on picking up on the Lake changes this summer (there is quite a backlog of issues and feature requests to address).

But it's literally impossible to work on Lean core if Lake doesn't build (hell, even make update-stage0 fails in that case).

Again, I sympathize with the problem. However, the inverse is also true for Lake development. If we merge Lake into the Lean core, that I can't synchronize Lake without rebuilding Lean. I see that as a much, much larger time sink than the few extra commands required to setup a PR.

I had also hoped that getting rid of the submodule would make it easier to implement some changes which require modifying both Lean and Lake, e.g. https://github.com/leanprover/lean4/issues/2061 or https://github.com/leanprover/lean4/issues/2154.

You mentioned these as example of cases were need to update both Lake and Lean simultaneously. As far as I can tell, #2061 doesn't need this. The change can be made to Lean and then to Lake (avoiding the problem of synchronizing the PRs). I mention this because this is is something I have often found to be the case -- joint changes to Lake and Lean generally only need to made on large refactors. Many features can be implement by changing one and then the other (e.g., like Seth did in https://github.com/leanprover/lake/pull/168).

TL;DR: I sympathize with your pain and I can understand why you would advocate for the solution that is works best for you. However, putting Lake directly into the Lean 4 repository has a lot of cost and no benefit for me (that I can see), so it should come as no surprise that I am personally against it. But, it is not my decision how Lake is integrated into the Lean 4 repository so this is, in the end, just my opinion on the topic.

digama0 commented 1 year ago

But it's literally impossible to work on Lean core if Lake doesn't build (hell, even make update-stage0 fails in that case).

Again, I sympathize with the problem. However, the inverse is also true for Lake development. If we merge Lake into the Lean core, that I can't synchronize Lake without rebuilding Lean. I see that as a much, much larger time sink than the few extra commands required to setup a PR.

If Gabriel is saying "Lean is coupled with Lake" and you are saying "oh, but also Lake is coupled with Lean" then it comes out the same, doesn't it? I agree that these two projects are coupled and the last thing we need is a bootstrap build that spans multiple repos.

You already can't "synchronize lake without rebuilding lean", because if you make a change to lake that breaks the lean build process then the change is not long-term acceptable anyway. That's just like making a change to stage1 without testing that it works in stage0.

EDIT: Here's another approach that might satisfy all requirements: the lake repo becomes an "unofficial fork", while the official version moves to the appropriate location in lean4 repo. The lake repo can be used for testing out changes and so on without the cost of bootstrapping, but at the end of the day to distribute the changes via the official lean downloads they need to also be PR'd to lean4 repo.

TL;DR: I sympathize with your pain and I can understand why you would advocate for the solution that is works best for you. However, putting Lake directly into the Lean 4 repository has a lot of cost and no benefit for me (that I can see), so it should come as no surprise that I am personally against it. But, it is not my decision how Lake is integrated into the Lean 4 repository so this is, in the end, just my opinion on the topic.

You clearly are not sympathizing enough if you are only making recommendations from your own perspective, based on the personal cost to you. Sympathy is by definition looking at things from the other person's perspective.

From my perspective, the bootstrap process is already complicated enough and confusing to many, e.g. https://github.com/leanprover/lake/pull/168 that you linked even has a comment about the complexity of making coordinated changes in the description. If Lake is a required part of the lean build process (and it is), then it should be developed in the main repo.

tydeu commented 1 year ago

@digama0

You already can't "synchronize lake without rebuilding lean", because if you make a change to lake that breaks the lean build process then the change is not long-term acceptable anyway.

That is not the problem -- changes to Lake rarely break the Lean build process. The only cases I can recall where that has happen thus far is when some the of the Lake-based tests in the Lean 4 repository needed updates when the syntax changed. The problem is the other way around. That is, in order to make changes to Lake in Lean, I would need to frequently rebase and rebuild changes onto Lean master, which is a very time consuming process on Windows (~30 minutes each time). It also makes testing just more difficult due to unique way the Lake executable gets built in the Lean repository (and if I were to just run all the tests that also takes 30+ minutes).

The lake repo can be used for testing out changes and so on without the cost of bootstrapping, but at the end of the day to distribute the changes via the official lean downloads they need to also be PR'd to lean4 repo.

That is currently what is currently done.

You clearly are not sympathizing enough if you are only making recommendations from your own perspective, based on the personal cost to you. Sympathy is by definition looking at things from the other person's perspective.

All the comments I made in my post (before the TL;DR) were attempts to analyze the problem from Gabriel's perspectives (especially since I was a bit confused with some of his rational). Apparently, I did not do a good job, as it does not seem to have come off that way. You are also correct, I mixed up sympathy and empathy. I empathize with the pain (I can understand it) but do not sympathize with it (I do not feel the same way),

I am also not sure what you want from me. I understand the some people here find the submodule to be intolerable. If a subtree would makes people happier, which it originally seemed poised to do, I was happy to accept that (as it had its benefits and costs for both sides). Furthermore, even if Lake ends up embedded in Lean 4 by executive decision, it is not like I will rage quit over that. However, when asked for my opinion (as Sebastian did), I will note that such an embedded Lake is obviously not my preference since it simply makes my life as Lake's developer more difficult at no benefit to me.

If Lake is a required part of the lean build process (and it is), then it should be developed in the main repo.

Again, submodules or subtrees are not a rare or unheard concept in this situation. Rust uses them in its repo. In fact, its build system Cargo, which is is used in its bootstrap build just like Lake, is a submodule or subtree (don't know how to tell the difference in GitHub) of the rust repository just like Lake (see https://github.com/rust-lang/rust/tree/master/src/tools). Thus, I would argue that as Lake can function as independent Lean package (like cargo) it should be a separate repository (like Cargo) because that is state-of-the-art way of doing it (since Rust is usually our go-to point of comparison for things like this).

digama0 commented 1 year ago

The lake repo can be used for testing out changes and so on without the cost of bootstrapping, but at the end of the day to distribute the changes via the official lean downloads they need to also be PR'd to lean4 repo.

That is currently what is currently done.

That is why I suggested it, it is not much of a procedural change so much as a change in perspective. The main effect would be that people could make changes to lake simply by making a lean4 PR, without an obligation to make a lake PR as well. It should also be possible to wean third party projects that depend on the lake repo onto the copy of lake inside lean4 (they could of course also just continue to depend on lake repo just like any other project, but that should be just an option and not a requirement).

In the end I would imagine that lake repo would follow along with lean4 lake, pulling updates whenever you feel like working on it rather than making it the problem of everyone who wants to make a lake modification.

If Lake is a required part of the lean build process (and it is), then it should be developed in the main repo.

Again, submodules or subtrees are not a rare or unheard concept in this situation. Rust uses them in its repo. In fact, its build system Cargo, which is is used in its bootstrap build just like Lake, is a submodule or subtree (don't know how to tell the difference in GitHub) of the rust repository just like Lake (see https://github.com/rust-lang/rust/tree/master/src/tools). Thus, I would argue that as Lake can function as independent Lean package (like cargo) it should be a separate repository (like Cargo) because that is state-of-the-art way of doing it (since Rust is usually our go-to point of comparison for things like this).

Rust and Cargo are also both past 1.0 and have strong stability promises, which helps a lot in making them more loosely coupled. I don't think we are anywhere near ready for that. I'm pretty sure Cargo didn't even exist when Rust was in a similar situation as us. The Rust bootstrap build process is also fantastically complicated, with some additional tools just to manage the whole thing. (You can't use cargo to build rustc BTW, there is a specialized wrapper x.py which does the bootstrap stuff.) With enough elbow grease we could make literally any solution ergonomic, so I'm not sure the current state of rust package organization is a strong signal one way or another.

Kha commented 1 year ago

That is, in order to make changes to Lake in Lean, I would need to frequently rebase and rebuild changes onto Lean master, which is a very time consuming process on Windows (~30 minutes each time). It also makes testing just more difficult due to unique way the Lake executable gets built in the Lean repository (and if I were to just run all the tests that also takes 30+ minutes).

Is this necessarily the case? It seems to me that ensuring that src/lake remains a valid Lake package that can independently be built using its lean-toolchain should be strictly less maintenance overhead than the current setup. When changing src/lake because of a backwards-incompatible change in Lean, we would merely need to bump lean-toolchain to the nightly of the next day. In that window of <24h, master src/lake would not be buildable as an independent package, but that really is not the case all that often and one can still work on a different branch or not pull during that timespan. We could have a separate branch that points to the latest valid lean-toolchain bump, but I don't think that would be necessary.

digama0 commented 1 year ago

In that window of <24h, master src/lake would not be buildable as an independent package, but that really is not the case all that often and one can still work on a different branch or not pull during that timespan. We could have a separate branch that points to the latest valid lean-toolchain bump, but I don't think that would be necessary.

I think that branch would just be "the last nightly" though. For external users (which only use nightly), lake would never appear broken or inconsistent with the lean-toolchain. Nightly works with nightly, master works with master, makes sense to me. (Lake master also usually works with nightly lean, which I guess is your point.)

tydeu commented 1 year ago

@digama0

That is why I suggested it, it is not much of a procedural change so much as a change in perspective ,.. In the end I would imagine that lake repo would follow along with lean4 lake, pulling updates whenever you feel like working on it rather than making it the problem of everyone who wants to make a lake modification.

As far as I am aware, without a submodule or subtree, it would be impossible to pull, push changes directly to Lake. Instead, I would have to fork the the entire Lean 4 repository and copy/paste files from my Lake repository into there (assuming the Lake repository is some weird separate repository that is not actually a fork of the Lake in Lean). That is, unless it is possible to just have Lake sit as separate repository within the Lean source tree (with a committed .git directory) without making it submodule or subtree. Is that possible @Kha?

Rust and Cargo are also both past 1.0 and have strong stability promises

Cargo is still not at version 1,.0. Master is currently 0.71.

I'm pretty sure Cargo didn't even exist when Rust was in a similar situation as us.

Cargo has existed from the beginning of Rust (i.e. 0.1). Like the old leanpkg, it was a simple build system (~750) that was part of Rust's 0.1 core. It was then pulled out of the rust core around version 0.10 and split into a separate package in 2014. It was also presumably built separate (since by version 1.0.0 the README said that the standard rust build does not include Cargo). Both Rust and Cargo appear to have been built through Make without bootstrapping. The bootstrap appears to have been introduced in version 1.8 in 2016 as essentially a way to build Rust with Cargo. Cargo itself appears to have still required a a separate build. Cargo then become a submodule in the Rust repository around version 1.19 in 2017. But, even at this point, the bootstrap process apparently downloaded pre-built cargo binaries for the stage0 bootstrap.

You can't use cargo to build rustc BTW, there is a specialized wrapper x.py which does the bootstrap stuff.

While it is true that x.py manages the bootstrap, the bootstrap does use Cargo to build Rust as much as possible.

gebner commented 1 year ago

I am not sure what you mean by "a bit annoying because you can immediately push to the lean4-master branch. " Do you mean the problem is that the branch is not protected? As far as I know, neither is the lean4/master branch.

Sorry for the confusion. I meant "okay (if a bit annoying) because you can immediately push".

I would that it is my observation that, at least in the past, is that the direct commit to PR ratio in Lean was biased heavily towards direct commits.

This depends on the person. Leo pushes most things directly, whereas Sebastian, I, and everyone else mostly do PRs. As you might have noticed, the second group is vocal about hating the submodule.

I am not going to address the multiple pages of "everything's great on my end, it's only a really small amount of work for you."

It also makes testing just more difficult due to unique way the Lake executable gets built in the Lean repository (and if I were to just run all the tests that also takes 30+ minutes).

Can you elaborate on this? It is built differently (using the makefile), but we still get a lake executable in the end which can be tested, no?

Also you should know that you can run a subset of the tests using ctest -R 'leanlaketest_.*'.

For external users (which only use nightly), lake would never appear broken or inconsistent with the lean-toolchain.

@digama0 External users (you're thinking of doc-gen4 and co., right?) should IMHO link to the Lake bundled with Lean. The only piece missing for this to work is to ship the Lake native code, we already ship the oleans.

Is this necessarily the case? It seems to me that ensuring that src/lake remains a valid Lake package that can independently be built using its lean-toolchain should be strictly less maintenance overhead than the current setup.

:+1: If we keep a src/lake/lakefile.lean, then we can still build Lake without having to build Lean.

tydeu commented 1 year ago

@gebner

This depends on the person. Leo pushes most things directly, whereas Sebastian, I, and everyone else mostly do PRs. As you might have noticed, the second group is vocal about hating the submodule.

Sebastian does a mix, He pushes directly quite a lot too. He seems to mostly use PRs when he wants a discussion or a benchmark.

I am not going to address the multiple pages of "everything's great on my end, it's only a really small amount of work for you."

You wish to saddle me with 30-60 minute builds on every pull (plus the many other hurdles I mention below), so I would appreciate it if you would explain how the submodule or subtree is that comparably bad for you. Currently, you have mostly just stated you do not like it and given examples of what appear to be, as far as I can tell (since you have yet to explain otherwise) minor inconveniences quantitatively (i.e., things that can be done in a few minutes at most per PR).

It is built differently (using the makefile), but we still get a lake executable in the end which can be tested, no?

There are a number of complications with the different build structure.

Most important, the Lean build is bootstrapped. If I break Lake, it can break all the interactive features of the server (making it a pain to fix the bug). Furthermore, changes to dependent files require a full rebuild of the Lean repository (not a simple 'Refresh File Dependencies') to be reflected properly in editing (since the object files are sourced from the full toolchain build and not a local lake print-paths build). The build itself is also slower because even when the rest of Lean hasn't changed, the cmake build still has to go through check that it hasn't, and the whole of Lake has to be built (instead of just the dependencies for the file I am editing). Finally, on Windows, rebuilding Lean requires closing VS Code and then running the build from the command line because Windows cannot overwrite the executables while the Lean server (and Lake) is in use by VS Code. This means any change I make to Lake that I want to see reflected in the interactive editors a good 3-5 minutes to appear even with no Lean changes (close VS Code, run make in the build directory from the terminal, wait for it check that only Lake needs to be built, build all of Lake, reopen VS Code).

Also, at least on Windows, many interactive features just do not work properly within the bootstrapped Lean repository. For example, go-to definition will take you to the source files in the toolchain folder which has a different path than the one in the repository (even though they are the same file). This breaks some part of VS Code (e.g., it will open what it is technically the same file in two different tabs instead of just switching to the currently open one).

Maybe you do not encounter these problems because you use Linux and Nix, but it is without a doubt much easier and convenient to work on an independent Lean package than it is to work on anything in the core on Windows (and that is true even beyond Lake).

Finally, as a minor annoyance, due to the nonstandard build directory setup for Lean, testing Lake using the test scripts requires explicitly specifying the paths to the Lake (via LAKE). The ctest suite is also insufficient because it does not run some of the Lake tests (e.g., those in test).

External users (you're thinking of doc-gen4 and co., right?) should IMHO link to the Lake bundled with Lean. The only piece missing for this to work is to ship the Lake native code, we already ship the oleans.

This is a point I agree with you whole-heartedly on! 😄

Also you should know that you can run a subset of the tests using ctest -R 'leanlaketest_.*'.

Oh, neat. Did not know that. Thanks!

digama0 commented 1 year ago

You wish to saddle me with 30-60 minute builds on every pull (plus the many other hurdles I mention below),

Wasn't Sebastian's point that rather than assuming this must necessarily be the case we could instead try to solve this problem by making lake be a standalone lake project despite its residence in the lean4 repo? Those don't seem like innately linked things, and no one is suggesting that you have to bootstrap literally every change you make.

My suggestion also helps here: if you do most of your development on an external project then you can just wrap everything up in a biggish PR to lean4 when you are done, and suffer a minimum of bootstrapping pain.

Most important, the Lean build is bootstrapped. If I break Lake, it can break all the interactive features of the server (making it a pain to fix the bug).

This is surprising to me. I almost never do this: when I am developing for lean core I use a separate project for testing purposes, hooked up to the stage1 build, and it would be the same for testing lake changes. The server and vscode instance used to view the lake sources themselves will be running the stage0 build, and I only run update-stage0 once I am reasonably confident that all the bugs are worked out (and only if the bootstrap itself is relevant, e.g. for some parser changes). The majority of changes are not like this (and I think this also holds for lake changes except for ones that change syntax), so they would be exclusively stage1 changes with no bootstrap required.

Beyond this, most of the issues you mention sound like issues with lake :stuck_out_tongue: , or configuration issues (which may indicate that our documentation needs updating). What you describe doesn't sound normal to me.

tydeu commented 1 year ago

@digama0

Wasn't Sebastian's point that rather than assuming this must necessarily be the case we could instead try to solve this problem by making lake be a standalone lake project despite its residence in the lean4 repo? Those don't seem like innately linked things

Admittedly, I did not exactly understand what Sebastian's suggestion was. Was the idea that when working on just Lake within the Lean core, one should open it as independent project and develop, build, and test it like it was an independent project? If so, it sounds like that could resolve most of my development problems with the Lean core. However, it complicates Git management (as one would still need to branch, checkout, etc. on the whole Lean 4 repository). and also means I could no longer rely on Lake's CI/CD for quick testing of changes. However, such Git issues seem like minor squabbles, so this idea could work. It is probably worth trying out to see if functions well in practice.

If you do most of your development on an external project then you can just wrap everything up in a biggish PR to lean4 when you are done, and suffer a minimum of bootstrapping pain.

Yeah, I guess developing in the Lake repo and copy/pasting to and from the Lean core is always a possible solution (not sure how that connects to PRs, though).

This is surprising to me. I almost never do this: when I am developing for lean core I use a separate project for testing purposes, hooked up to the stage1 build, and it would be the same for testing lake changes. The server and vscode instance used to view the lake sources themselves will be running the stage0 build

My setup was following the Lean 4 Development Guide which states that the root of the project should be set to the stage1 build. As VS Code uses the toolchain from the root, it uses the stage1 build not stage0. I imagine this is because, otherwise, tests would be hard to debug and changes you make to the source tree will simply never show up interactively.

What you describe doesn't sound normal to me.

Do you use Windows? That might be part of it. If there is something I am doing wrong that I could fix, that would be lovely! It would be wonderful if working on the Lean 4 core was not always such a hassle for me. 🙏

gebner commented 1 year ago

If I break Lake, it can break all the interactive features of the server (making it a pain to fix the bug).

This is a real pain point. Not just for you, I constantly run into this during development when the stage1 Lake executable doesn't run for whatever reason.

Finally, on Windows, rebuilding Lean requires closing VS Code and then running the build from the command line because Windows cannot overwrite the executables while the Lean server (and Lake) is in use by VS Code.

If you change a file in src/lake then the cmake build will only rebuild Lake and not Lean, at least that's how it works here.

A very practical workaround is to use WSL2. I use it, and I don't get any of the issues you describe.

tydeu commented 1 year ago

@gebner

This is a real pain point. Not just for you, I constantly run into this during development when the stage1 Lake executable doesn't run for whatever reason.

If we are talking generally, it is also worth noting that the same thing happens if during development one breaks the Lean server for whatever reason. Maybe, like @digama0 suggested, we should be using the stage0 build at root during development (and switch to stage1 if we wish to debug tests)?

If you change a file in src/lake then the cmake build will only rebuild Lake and not Lean, at least that's how it works here.

True, but the Lake rebuild has the same problem as VS Code uses lake serve and thus locks the Lake executable as well. However, now that I think about, I could probably just disable Lake in the VS Code settings for the Lean 4 core. More principally, this could be solved for both the Lean server and Lake if the build process built a temporary executable and it copied to the standard one (since I have found cp to still work while the executable is locked).

A very practical workaround is to use WSL2. I use it, and I don't get any of the issues you describe.

My objection to WSL2 is that it does not build real Windows binaries, so everything must then always go through WSL2 in order to utilize Lean. Furthermore, when debugging, I would like to make sure things actual work on Windows not just WSL2.

Kha commented 1 year ago

and also means I could no longer rely on Lake's CI/CD for quick testing of changes

I don't usually do that, is CI really faster than local testing for you? In any case, the Nix CI could likely be faster at building Lake than Lake's current CI because of caching, just restricted to one platform.

As VS Code uses the toolchain from the root

That's not how it is intended to be used, please see https://leanprover.github.io/lean4/doc/dev/index.html#development-setup

digama0 commented 1 year ago

As VS Code uses the toolchain from the root

That's not how it is intended to be used, please see https://leanprover.github.io/lean4/doc/dev/index.html#development-setup

Mac is correct, the setup described there doesn't work for vscode, because vscode will only run one instance of the server and uses the override at the root of the project, which means that if you set the root to stage1 then you will get a stage1 server for the sources, which is bad.

I double-checked my elan override setup and I have a stage0 override at the root and a stage1 override in my lean4-test external project (which is just a basic standalone package which I can use for testing). I have one vscode instance open to lean4 and one to lean4-test; if I want to run some of the tests in the stage1 server I open the test files directly from my lean4-test vscode instance. (As an added bonus, you can also open the test files in the lean4 instance to get the file checked by both stage0 and stage1, which is useful for differential testing.)

tydeu commented 1 year ago

@Kha

Is CI really faster than local testing for you?

No, my local Windows testing is definitely faster. However, the CI tests all three 3 OSes at once, which helps me know if I accidently broke something on MacOS (which I have done sometimes and which I otherwise do not have an convenient way to test).

Kha commented 11 months ago

I think that was the wrong issue to close :smile: . And I am still interested in resolving this issue, as I again experience the death from a thousand submodules papercuts now that both lean4 and lake have frequent commits. Is this still our best understanding of the most promising plan and potential downsides?

Was the idea that when working on just Lake within the Lean core, one should open it as independent project and develop, build, and test it like it was an independent project? If so, it sounds like that could resolve most of my development problems with the Lean core. However, it complicate Git management (as one would still need to branch, checkout, etc. on the whole Lean 4 repository). and also means I could no longer rely on Lake's CI/CD for quick testing of changes. However, such Git issues seem like minor squabbles, so this idea could work. It is probably worth trying out to see if functions well in practice.

Then I agree that we should give it a try.

tydeu commented 10 months ago

@Kha I see you choose to merge master rather than lean4-master. As a result, you might also want to delete the .github directory. Otherwise, LGTM.

tydeu commented 10 months ago

@Kha Upon looking through the commits some more, there is also the unfortunate side effect that the links now refer to issues/PRs from this repository instead of the Lake repository, but I imagine that is GitHub issue we cannot fix.

Kha commented 10 months ago

I see you choose to merge master rather than lean4-master. As a result, you might also want to delete the .github directory.

Yes, master is almost linear. I thought it might be good to keep the .github directory around until you've decided on a CI workflow.

Upon looking through the commits some more, there is also the unfortunate side effect that the links now refer to issues/PRs from this repository instead of the Lake repository, but I imagine that is GitHub issue we cannot fix.

It depends on whether preserving commit hashes is more important to us than the links; if it is not, we could also use that opportunity to linearize the Lake history.

tydeu commented 10 months ago

@Kha

It depends on whether preserving commit hashes is more important to us than the links; if it is not, we could also use that opportunity to linearize the Lake history.

I would think that the exact commit hashes are likely unimportant. We will be keeping the old Lake repository up (but frozen), so Lean programs which import Lake on old Leans should not break, and the old hashes will still be there for visual reference.

Kha commented 10 months ago

There doesn't seem to be a way to rebase away the merge commits in Lake's history without resetting the committer and committer date, so I merely rewrote all issue references with filter-branch. I dearly hope no further changes are needed :smile: .

JasonGross commented 10 months ago

There doesn't seem to be a way to rebase away the merge commits in Lake's history without resetting the committer and committer date

Does git rebase -i (and then don't change the buffer before exiting) not do the right thing?

Kha commented 10 months ago

Does git rebase -i (and then don't change the buffer before exiting) not do the right thing?

No, unfortunately there are actual rebase conflicts in the history