Open roystgnr opened 9 years ago
On Wed, Dec 17, 2014 at 1:26 PM, roystgnr notifications@github.com wrote:
There are a few components in libMesh that ought to be suitable for non-FEM-based third-party projects.
We're already using the GetPot libMesh-fork in other codes. This is as simple as it gets since that's a single header file, but even still it's a headache to keep changes synched.
I'd like to reduce that headache, and to use the NumericVector class hierarchy and Parallel:: stuff in other codes. This is probably hopeless unless we use something like git submodules to handle the modularization.
Any thoughts?
My first thought is: no way!
I definitely don't want to start splitting libmesh up into separate repos, we have enough trouble integrating changes and PRs into a single repo.
John
It will definitely be a headache, but we have created a workable system with submodules with the MOOSE "herd" repositories. It's certainly an option if you really want to move that direction.
In the better late than never category (hopefully), I'd be very much in favor of this, admittedly probably for the same reasons as @roystgnr. Parallel:: and NumericVector would be no-brainers to reuse in at least 3 other codes I've played with that don't necessarily use libMesh. Having to reinvent wrappers for vectors for PETSc, Trilinos, etc. is annoying and stupid when there are perfectly good ones around. Plus, it would give a central place to update without having to fork what's done in libMesh.
I think submodules would make the integration tolerable.
Just my two cents. I acknowledge this would be a transition pain, but I think the payoff would be worth it.
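For readers unfamiliar with the NumericVector idea, the kind of backend-agnostic vector wrapper being discussed can be sketched as follows. This is a minimal illustration under stated assumptions: AbstractVector, SerialVector, make_vector and residual_norm_squared are hypothetical names, not libMesh's API, and only a plain in-memory backend is shown where a real hierarchy would also wrap PETSc or Trilinos vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-in for a backend-agnostic vector
// hierarchy. Names and methods are illustrative only, not libMesh's API.
template <typename T>
class AbstractVector {
public:
  virtual ~AbstractVector() = default;
  virtual std::size_t size() const = 0;
  virtual T get(std::size_t i) const = 0;
  virtual void set(std::size_t i, T value) = 0;
  // this += a * x; each backend (PETSc, Trilinos, ...) implements it once.
  virtual void add_scaled(T a, const AbstractVector<T> & x) = 0;
  virtual T dot(const AbstractVector<T> & x) const = 0;
};

// Plain in-memory backend. A PETSc- or Trilinos-backed class would wrap
// that library's vector type behind the same interface.
template <typename T>
class SerialVector : public AbstractVector<T> {
public:
  explicit SerialVector(std::size_t n, T init = T(0)) : data_(n, init) {}
  std::size_t size() const override { return data_.size(); }
  T get(std::size_t i) const override { return data_.at(i); }
  void set(std::size_t i, T value) override { data_.at(i) = value; }
  void add_scaled(T a, const AbstractVector<T> & x) override {
    assert(x.size() == size());
    for (std::size_t i = 0; i < data_.size(); ++i)
      data_[i] += a * x.get(i);
  }
  T dot(const AbstractVector<T> & x) const override {
    assert(x.size() == size());
    T sum = T(0);
    for (std::size_t i = 0; i < data_.size(); ++i)
      sum += data_[i] * x.get(i);
    return sum;
  }
private:
  std::vector<T> data_;
};

// Non-FEM client code (e.g. a chemistry solver) sees only the base class.
template <typename T>
T residual_norm_squared(const AbstractVector<T> & r) { return r.dot(r); }

// The single place that would choose a backend.
template <typename T>
std::unique_ptr<AbstractVector<T>> make_vector(std::size_t n) {
  return std::make_unique<SerialVector<T>>(n);
}
```

Client code written against the base class (here residual_norm_squared) would not change if the factory returned a PETSc- or Trilinos-backed vector instead, which is the kind of reuse being argued for.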
I vote against this.
I think it's a lot of headache when you can just link in libMesh and use the relevant parts and ignore the rest just as it is (I've done it in my side projects).
Further: you guys already destroyed the wonderfully simple build system we used to have (I still use special scripts to do in-tree builds because I refuse to "make install" every time I barely modify libMesh!). How much more crazy (and fragile) is the build system going to get if you do this??
Oh: one more thing: we use submodules for READ ONLY access to dependent libraries. Development in submodules is a REAL pain in the ass...
> Oh: one more thing: we use submodules for READ ONLY access to dependent libraries. Development in submodules is a REAL pain in the ass...
The way I thought this would play out (and maybe I'm wrong) was the "common" things have a separate repo - you develop in that. Then, libMesh pulls that in as a submodule, which can suck up changes easily. Maybe I don't understand what you mean, but aren't you guys using libMesh exactly in this way in MOOSE?
> I think it's a lot of headache when you can just link in libMesh and use the relevant parts and ignore the rest just as it is (I've done it in my side projects).
Building and linking to 100,000+ lines of libMesh to get access to a PetscVector in something that has nothing to do with FEM (like a chemistry library, for example) is really just silly IMO, even sillier than just forking that part and reimplementing it (which is what will happen if we keep the common stuff in libMesh). I don't expect a chemist to care about FEM, or to go to the trouble of grabbing libMesh and building it just to get access to a PetscVector and a nonlinear solver (which is all we'd need). I can imagine the responses in a journal review were we to suggest that.
Yes and no. Our users use libMesh this way... but all developers have their own clone of libMesh that they modify (and they ignore the submodule because developing in a submodule is a pain).
If we had to do the same for libMesh and vecLib and parallelLib, etc., then modifying libMesh would become a real chore.
But: I'll stop here because it sounds like you guys have use cases so you're probably going to do it anyway (like you did with the build system). Note: I'm not unhappy: just being a realist... no reason to fight against the tide! :-)
Whatever you guys do we'll wrap a few scripts around to make sure it doesn't suck for our users (just like we did for the build system)...
I'll leave you to it now...
One more piece of info... we've found that submodules can be problematic because you have to choose the URL ("path") for the remote repo. You don't want to require people to have ssh keys set up, so you go with https... but people might be behind firewalls without their proxy settings configured properly... which can leave them unable to pull down the submodules.
Also: since the submodule is an https checkout directly from the master repo it makes it difficult for new people to edit in there and get their changes back up to a fork and into a PR.
All possible to overcome: I just wanted to mention a few of the gotchas we've run into since we started using submodules... there are real downsides.
That's good info, thanks. That is definitely more of a headache than I was envisioning.
And note there are no immediate plans or conversations happening offline or anything. I'm just getting my head above water from the proposal and chiming in on this was on my todo list.
Last thing, I swear ;-)
Instead of just whining: I should offer up an alternative.
Why not just allow turning off all of the pieces of libMesh you don't need at configure time?
This is basically what PETSc does. No one rejects your paper for using PETSc vectors in an explicit finite difference code and linking against PETSc, even though there are MILLIONS of lines of code in PETSc that you're not using...
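A configure-time approach like this usually reaches the C++ code as preprocessor macros written into a generated config header (with autoconf, via AC_DEFINE). The sketch below assumes that setup; the MYLIB_ENABLE_* macro names, the --disable-fem flag mentioned in the comments, and the enabled_components() function are all hypothetical, and a "core only" configuration is hard-coded so the snippet compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for macros a configure script could write into a
// generated config header (e.g. in response to a hypothetical --disable-fem).
// Hard-coded here to a "core only" build so the snippet is self-contained.
#define MYLIB_ENABLE_PARALLEL 1
#define MYLIB_ENABLE_NUMERICS 1
#define MYLIB_ENABLE_FEM 0

// Encode inter-component dependencies so invalid combinations fail early.
#if MYLIB_ENABLE_NUMERICS && !MYLIB_ENABLE_PARALLEL
#error "numerics depends on parallel; configure must reject this combination"
#endif

// Report which components this build was configured with.
inline std::vector<std::string> enabled_components() {
  std::vector<std::string> out;
#if MYLIB_ENABLE_PARALLEL
  out.push_back("parallel");
#endif
#if MYLIB_ENABLE_NUMERICS
  out.push_back("numerics");
#endif
#if MYLIB_ENABLE_FEM
  out.push_back("fem");  // compiled only when FEM support was not disabled
#endif
  return out;
}
```

The #error guard encodes the parallel-before-numerics dependency raised in this thread, so an invalid combination fails at compile time rather than later at link time.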
(I lied!)
When you go down this path you also have to deal with cross-repo integration issues. You can have incompatible changes that need to go into BOTH libMesh and dependent repos simultaneously... And neither libMesh nor the dependent libraries will build or work properly until it's all merged into all repos.
Testing in that environment is real trouble... and often there is nothing you can do but break one or the other.
See our recent paper on how we deal with this: http://figshare.com/articles/Continuous_Integration_for_Concurrent_MOOSE_Framework_and_Application_Development_on_GitHub/1112585
> Why not just allow turning off all of the pieces of libMesh you don't need at configure time?
Hmmm, that's not a bad idea. I could live with telling someone to grab a tarball and giving them the build script and tell them to ignore the man behind the curtain.
Along those lines, what about packaging the common stuff separately? That is, we still package libMesh as we do now, but we additionally package the common stuff on its own (it would still be in the libMesh package too). It's really easy to build "convenience" libraries in the build system; I bet we could package it separately too. That stuff won't change very often... well, about as often as a PETSc minor release anyway, and grabbing a self-contained tarball would be one step better than grabbing a libMesh clone or tarball plus a build script.
Use an automake subpackage but not a git submodule?
I actually like that idea. Doing additional per-submodule tarballs would then be easy so we'd have something to provide to newbie users, and for any users advanced enough to handle git I've got no objections to saying "clone the whole libMesh repo, just work in subdirectories A and B". Moving to separate repos could be postponed indefinitely, or at least until git makes submodules more friendly to work with.
I think that it just becomes a maintenance problem to keep the two synced.
I also think that this stuff changes more often than you're thinking. There have been 24 commits in include/parallel in the last year and 51 in include/numerics, for instance. That's not huge... but if each one of those requires you to commit to one repository and then manually pull the changes over, that would suck. I really vote against tarballs for this stuff... I would actually start voting for submodules if those were my choices ;-)
Finally, what about cross-library stuff? vecLib definitely depends on parallelLib. Are you going to have submodules like this:
parallelLib <- vecLib <- libMesh
So now we have multilevel submodules?
> Use an automake subpackage but not a git submodule?
Oh god please no. This is what I get for opening my mouth...
Oh, I'm not talking about tarballs for libMesh users; just for other consumers of the subpackages. They'd treat the libMesh subpackages the way we treat PETSc or ExodusII, say - updating with major releases or bugfixes, but not trying to track every commit.
> But: I'll stop here because it sounds like you guys have use cases so you're probably going to do it anyway
I certainly hope not, as I'm definitely still against it. Just out of curiosity, are @roystgnr and @pbauman currently using libraries that split up their components among multiple git submodules on a daily basis? If yes, I'd love to hear what your day-to-day workflow is. Here are some additional points I'd like to raise:
I could maybe get behind this idea if we could come up with a way to "invert" the entire process. That is, the libMesh repo is still a single git repo with no submodules, while all the subsidiary parts that you have planned exist as automatically updated mirrors of subsets (with history) of the relevant parts of the library.
Using git subtree it's pretty easy to peel off a subdirectory of a repo to make a new Git repo (or update a repo). So it could be easy to keep an external "mirror" of numerics or parallel out there for people to use...
The question is how to take in contributions. If someone wants to modify parallelLib... and they put in a PR to the parallelLib repo... what are you going to do? Manually transition their changes over to being against the libMesh repo? Doable... but not ideal...
> The question is how to take in contributions.
I guess at that point, if the person wants to make a PR, they have to fork the main repo... Otherwise they could always send in patches... I didn't envision forks of the automatically maintained mirrors, but of course they could exist.
What I'm trying to suggest is not changing the git tree at all. All that would change is the guts of the build system. Note that this would be invisible to you. The only difference is that every time we run make dist on libMesh, in addition to the tarball we usually get, we'd get an extra tarball for the core stuff (by running make dist in the subpackage part). Again, from your standpoint, nothing would change.
If folks want to make contributions, they do the usual thing. Perhaps we could put the common bits in their own namespace to facilitate reading the code for the common bits?
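Putting the common bits in their own namespace could look roughly like the sketch below. It assumes a hypothetical outer namespace meshlib standing in for libMesh, with a nested meshlib::common holding the reusable parallel and numerics pieces; SerialCommunicator, global_dot and fem_energy are illustrative names, and the communicator is a single-process stand-in rather than an MPI wrapper.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical layout: "meshlib" stands in for libMesh. Everything under
// meshlib::common must never refer to FEM code, so readers (and a separate
// make dist of that subpackage) can see at a glance what is self-contained.
namespace meshlib {
namespace common {

namespace parallel {
// Single-process stand-in for a communicator; a real one would wrap MPI.
struct SerialCommunicator {
  int size() const { return 1; }
  // Global sum; with one process it is just the local value.
  double sum(double local) const { return local; }
};
} // namespace parallel

namespace numerics {
// Depends only on common::parallel, never on the FEM layer.
inline double global_dot(const std::vector<double> & a,
                         const std::vector<double> & b,
                         const parallel::SerialCommunicator & comm) {
  assert(a.size() == b.size());
  double local = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
  return comm.sum(local);
}
} // namespace numerics

} // namespace common

// FEM-specific code lives outside meshlib::common and may use all of it.
inline double fem_energy(const std::vector<double> & u) {
  common::parallel::SerialCommunicator comm;
  return 0.5 * common::numerics::global_dot(u, u, comm);
}

} // namespace meshlib
```

The dependency the namespace makes visible is one-directional: code inside common never names anything outside it, which is the same property a separately packaged tarball of the core pieces would need.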