libMesh / libmesh

libMesh github repository
http://libmesh.github.io
GNU Lesser General Public License v2.1

Breaking out libMesh components into submodules? #423

Open roystgnr opened 9 years ago

roystgnr commented 9 years ago

There are a few components in libMesh that ought to be suitable for non-FEM-based third-party projects.

We're already using the GetPot libMesh-fork in other codes. This is as simple as it gets since that's a single header file, but even still it's a headache to keep changes synched.

I'd like to reduce that headache, and to use the NumericVector class hierarchy and Parallel:: stuff in other codes. This is probably hopeless unless we use something like git submodules to handle the modularization.

Any thoughts?

jwpeterson commented 9 years ago

My first thought is: no way!

I definitely don't want to start splitting libmesh up into separate repos; we have enough trouble integrating changes and PRs into a single repo.

John

permcody commented 9 years ago

It will definitely be a headache, but we have created a workable system with submodules with the MOOSE "herd" repositories. It's certainly an option if you really want to move that direction.

pbauman commented 9 years ago

In the better late than never category (hopefully), I'd be very much in favor of this, admittedly probably for the same reasons as @roystgnr. Parallel:: and NumericVector would be no-brainers to reuse in at least 3 other codes I've played with that don't necessarily use libMesh. Having to reinvent wrappers for vectors for PETSc, Trilinos, etc. is annoying and stupid when there are perfectly good ones around. Plus, it would give a central place to update without having to fork what's done in libMesh.

I think submodules would make the integration tolerable.

Just my two cents. I acknowledge this would be a transition pain, but I think the payoff would be worth it.

friedmud commented 9 years ago

I vote against this.

I think it's a lot of headache when you can just link in libMesh and use the relevant parts and ignore the rest just as it is (I've done it in my side projects).

Further: you guys already destroyed the wonderfully simple build system we used to have (I still use special scripts to do intree builds because I refuse to "make install" every time I barely modify libMesh!). How much more crazy (and fragile) is the build system going to get if you do this??

friedmud commented 9 years ago

Oh: one more thing: we use submodules for READ ONLY access to dependent libraries. Development in submodules is a REAL pain in the ass...

pbauman commented 9 years ago

Oh: one more thing: we use submodules for READ ONLY access to dependent libraries. Development in submodules is a REAL pain in the ass...

The way I thought this would play out (and maybe I'm wrong) was the "common" things have a separate repo - you develop in that. Then, libMesh pulls that in as a submodule, which can suck up changes easily. Maybe I don't understand what you mean, but aren't you guys using libMesh exactly in this way in MOOSE?

pbauman commented 9 years ago

I think it's a lot of headache when you can just link in libMesh and use the relevant parts and ignore the rest just as it is (I've done it in my side projects).

Building and linking to 100,000+ lines of libMesh to get access to a PetscVector in something that has nothing to do with FEM (like a chemistry library for example) is really just silly IMO, even sillier than just forking that part and reimplementing it (which is what will happen if we keep the common stuff in libMesh). I don't expect a chemist to care about FEM nor go to the trouble to grab libMesh and build it just so they can get access to a PetscVector and nonlinear solver (which is all we'd need). I can imagine the responses in a journal review were we to suggest that.

friedmud commented 9 years ago

Yes and no. Our users use libMesh this way... but all developers have their own clone of libMesh that they modify (and they ignore the submodule because developing in a submodule is a pain).

If we had to do the same for libMesh and vecLib and parallelLib, etc.... Then modifying libMesh will become a real chore.

But: I'll stop here because it sounds like you guys have use cases so you're probably going to do it anyway (like you did with the build system). Note: I'm not unhappy: just being a realist... no reason to fight against the tide! :-)

Whatever you guys do we'll wrap a few scripts around to make sure it doesn't suck for our users (just like we did for the build system)...

I'll leave you to it now...

friedmud commented 9 years ago

One more piece of info... we've found out that submodules can be problematic because you have to choose the "path" for the remote repo. You don't want to require people to have ssh keys set up so then you go to https... but people might be behind firewalls and not have proxy stuff set up properly... which can lead to not being able to pull down the submodules.

Also: since the submodule is an https checkout directly from the master repo it makes it difficult for new people to edit in there and get their changes back up to a fork and into a PR.

All possible to overcome: I just wanted to mention a few of the gotchas we've run into since we started using submodules... there are real downsides...
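A minimal sketch of the ssh-versus-https "path" issue described above. The helper name `to_https` is hypothetical (it is not part of libMesh or MOOSE tooling); it only shows how an scp-like ssh URL maps onto the equivalent https URL. Git itself offers a per-user equivalent through the `url.<base>.insteadOf` configuration setting.

```python
import re

# Hypothetical helper for illustration only.
# Matches scp-like ssh URLs of the form git@host:owner/repo.git
_SCP_LIKE = re.compile(r"^git@(?P<host>[^:]+):(?P<path>.+)$")


def to_https(url: str) -> str:
    """Rewrite an scp-like ssh URL as https; leave anything else untouched."""
    match = _SCP_LIKE.match(url)
    if match is None:
        return url  # already https (or some other scheme)
    return f"https://{match.group('host')}/{match.group('path')}"


print(to_https("git@github.com:libMesh/libmesh.git"))
```

A rewrite like this only changes which transport is used; it does not help someone whose proxy blocks https, which is the failure mode mentioned above.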

pbauman commented 9 years ago

That's good info, thanks. That is definitely more of a headache than I was envisioning.

And note there are no immediate plans or conversations happening offline or anything. I'm just getting my head above water from the proposal and chiming in on this was on my todo list.

friedmud commented 9 years ago

Last thing, I swear ;-)

Instead of just whining: I should offer up an alternative.

Why not just allow turning off all of the pieces of libMesh you don't need at configure time?

This is basically what PETSc does. No one stops your papers on using PETSc vectors to do an explicit finite difference code and linking against PETSc even though there are MILLIONS of lines of code in PETSc that you're not using...

friedmud commented 9 years ago

(I lied!)

When you go down this path you also have to deal with cross-repo integration issues. You can have incompatible changes that need to go into BOTH libMesh and dependent repos simultaneously... And neither libMesh nor the dependent libraries will build or work properly until it's all merged into all repos.

Testing in that environment is real trouble... and often there is nothing you can do but break one or the other.

See our recent paper on how we deal with this: http://figshare.com/articles/Continuous_Integration_for_Concurrent_MOOSE_Framework_and_Application_Development_on_GitHub/1112585

pbauman commented 9 years ago

Why not just allow turning off all of the pieces of libMesh you don't need at configure time?

Hmmm, that's not a bad idea. I could live with telling someone to grab a tarball, giving them the build script, and telling them to ignore the man behind the curtain.

Along those lines, what about packaging the common stuff separately? That is, we still package libMesh as we do now, but we additionally package the common stuff separately (it would still be in the libMesh package too). It's really easy to build "convenience" libraries in the build system - I bet we could package it separately too. That stuff won't change very often... well... as often as a PETSc minor release anyway and grabbing a self-contained tarball would be one step better than cloning libMesh/libMesh tarball + build script.

roystgnr commented 9 years ago

Use an automake subpackage but not a git submodule?

I actually like that idea. Doing additional per-submodule tarballs would then be easy so we'd have something to provide to newbie users, and for any users advanced enough to handle git I've got no objections to saying "clone the whole libMesh repo, just work in subdirectories A and B". Moving to separate repos could be postponed indefinitely, or at least until git makes submodules more friendly to work with.

friedmud commented 9 years ago

I think that it just becomes a maintenance problem to keep the two synced.

I also think that this stuff changes more often than you're thinking. There have been 24 commits in include/parallel in the last year and 51 in include/numerics for instance. That's not huge... but if each one of those requires you to commit into one repository then manually pull those changes over that would suck. I really vote against tarballs for this stuff... I would actually start voting for submodules if those were my choices ;-)

Finally, what about cross-library stuff? vecLib definitely depends on parallelLib. Are you going to have submodules like this:

parallelLib <- vecLib <- libMesh

So now we have multilevel submodules?
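To make the chain above concrete, here is a small sketch (Python 3.9+ for `graphlib`) that computes the order in which such nested repositories would have to be updated. `parallelLib` and `vecLib` are the hypothetical names used in this thread, not existing repositories.

```python
from graphlib import TopologicalSorter

# Each key depends on the repos in its set (hypothetical split of libMesh).
deps = {
    "vecLib": {"parallelLib"},
    "libMesh": {"vecLib"},
}

# Dependencies come first: a change in parallelLib has to land before
# vecLib can pick it up, and only then can libMesh move its own pin.
update_order = list(TopologicalSorter(deps).static_order())
print(update_order)  # ['parallelLib', 'vecLib', 'libMesh']
```

Every extra level adds one more pin that has to be bumped before a change at the bottom of the chain becomes visible at the top.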

friedmud commented 9 years ago

Use an automake subpackage but not a git submodule?

Oh god please no. This is what I get for opening my mouth...

roystgnr commented 9 years ago

Oh, I'm not talking about tarballs for libMesh users; just for other consumers of the subpackages. They'd treat the libMesh subpackages the way we treat PETSc or ExodusII, say - updating with major releases or bugfixes, but not trying to track every commit.

jwpeterson commented 9 years ago

But: I'll stop here because it sounds like you guys have use cases so you're probably going to do it anyway

I certainly hope not as I'm definitely still against it. Just out of curiosity, are @roystgnr and @pbauman currently using libraries that split up their components among multiple git submodules on a daily basis? If yes, I'd love to hear what your day-to-day workflow is. Here are some additional points I'd like to raise:

  1. Multiple submodules means multiple different versions of each submodule can be checked out. Not all of these will work together, and not everyone will want to use the same versions at the same time. When someone files a bug report, we have to ask them which hash of every submodule they are using.
  2. Who is going to maintain, integrate, and test all the different submodules and ensure that they work together with different configure options, METHODs, etc.? None of us is devoting 100% of our time to libmesh these days. I feel that if we construct a Rube Goldberg machine for developing the code, people will just... stop developing it.
  3. Is there a simple way to commit across all submodules simultaneously? I don't think so, and this means that any changes that need to touch all the submodules have to happen in separate pushes/PRs which will be tested asynchronously, etc. We have been dealing with making API breaking changes simultaneously in applications and framework, and while it is possible, I wouldn't call it a very fun or smooth process. Even if there was a way to commit across all the submodules in a single command it wouldn't matter, because people can update them or not at their whim.
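For point 1, a rough sketch of how the per-submodule hashes could be collected for a bug report. It parses the output of `git submodule status --recursive`, where each line starts with a one-character state flag followed by the commit hash and the path. The function names and the sample paths (`contrib/parallelLib`, `contrib/vecLib`) are assumptions for illustration, and paths containing spaces are not handled.

```python
import subprocess

# Meaning of the leading flag character printed by `git submodule status`.
STATE = {
    " ": "ok",
    "-": "not initialized",
    "+": "differs from recorded commit",
    "U": "merge conflict",
}


def parse_submodule_status(text):
    """Turn `git submodule status` output into (path, sha, state) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        sha, path = rest.split()[:2]
        rows.append((path, sha, STATE.get(flag, "unknown")))
    return rows


def current_submodule_report():
    """Run git in the current directory and parse its report (needs a checkout)."""
    out = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_submodule_status(out)


# Made-up sample output with hypothetical submodule paths.
sample = (
    " " + "a" * 40 + " contrib/parallelLib (v1.0)\n"
    "+" + "b" * 40 + " contrib/vecLib (heads/master)\n"
)
for row in parse_submodule_status(sample):
    print(row)
```

Even with a report like this attached to every bug, someone still has to work out which combinations of hashes are supposed to work together.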

I could maybe get behind this idea if we could come up with a way to "invert" the entire process. That is, the libmesh repo is still a single git repo with no submodules, while all the subsidiary parts that you have planned exist as automatically updated mirrors of subsets (with history) of the relevant parts of the library.

friedmud commented 9 years ago

Using git subtree it's pretty easy to peel off a subdirectory of a repo to make a new Git repo (or update a repo). So it could be easy to keep an external "mirror" of numerics or parallel out there for people to use...
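A sketch of what that mirroring step could look like when scripted, assuming `git subtree` is installed (it ships in git's contrib area and is not always packaged). The branch name `parallel-mirror` and the remote name `parallel-mirror-remote` are hypothetical, and by default the commands are only printed, not executed.

```python
import subprocess


def mirror_commands(prefix, branch, remote, remote_branch="master"):
    """Build the git commands that split one subdirectory and publish it."""
    return [
        # Extract the history of `prefix` onto a synthetic branch.
        ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch],
        # Publish that branch to the external mirror repository.
        ["git", "push", remote, f"{branch}:{remote_branch}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical names: mirror include/parallel to a remote called
# "parallel-mirror-remote" via a local branch "parallel-mirror".
run(mirror_commands("include/parallel", "parallel-mirror", "parallel-mirror-remote"))
```

Note that a split works on a single prefix, so headers and sources living in different directories would need separate handling.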

The question is how to take in contributions. If someone wants to modify parallelLib... and they put in a PR to the parallelLib repo... what are you going to do? Manually transition their changes over to being against the libMesh repo? Doable... but not ideal...

jwpeterson commented 9 years ago

The question is how to take in contributions.

I guess at that point if the person wants to make a PR they have to fork the main repo.... Otherwise they could always send in patches.... I didn't envision forks of the automatically maintained mirrors, but of course they could exist.

pbauman commented 9 years ago

What I'm trying to suggest is not changing the git tree at all. All that would change is the guts of the build system. Note that this would be invisible to you. The only difference is that every time we would make dist libMesh, in addition to the tarball we usually get, we get an extra tarball for the core stuff (by running make dist in the subpackage part). Again, from your standpoint, nothing would change.

If folks want to make contributions, they do the usual thing. Perhaps we could put the common bits in their own namespace to facilitate reading the code for the common bits?