visit-dav / visit

VisIt - Visualization and Data Analysis for Mesh-based Scientific Data
https://visit.llnl.gov
BSD 3-Clause "New" or "Revised" License
406 stars 110 forks source link

This is alarming...Engine and mdserver linked with DB plugin libs! #19565

Open markcmiller86 opened 1 month ago

markcmiller86 commented 1 month ago

Go to build dir for engine and do a make clean; make VERBOSE=1 >& junk.out and then grep junk.out for hdf5. You will get hits. But, you should NOT get hits for hdf5. hdf5 is used only in a database plugin lib. If I look at a link of the libengine_ser.dylib, I get all the items listed below.

We are also getting hits for conduit, condiut_relay and blueprint. See below. Again, those are only used in DB plugins and should not be being linked into the engine.

I believe the reason this is happening is the use of MFEM in the engine. That is fine. But, we cannot use an MFEM in the engine that depends on I/O libs needed only in the plugins. We need to build MFEM differently for use in the engine.

Taggiging @iulian787 and @vijaysm because this is impacting MOAB plugin which uses HDF5 in either serial or parallel and the fact that engine is loading hdf5 serial prevents parallel MOAB plugin from operating correctly.

/usr/bin/clang++
-fno-common
-fexceptions
-g
-isysroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk
-dynamiclib
-Wl,-headerpad_max_install_names
-o
../../lib/libengine_ser.dylib
-install_name
@rpath/libengine_ser.dylib
CMakeFiles/engine_ser.dir/ProgrammableCompositer.C.o
CMakeFiles/engine_ser.dir/DataNetwork.C.o
CMakeFiles/engine_ser.dir/ClonedDataNetwork.C.o
CMakeFiles/engine_ser.dir/CumulativeQueryNamedSelectionExtension.C.o
CMakeFiles/engine_ser.dir/Engine.C.o
CMakeFiles/engine_ser.dir/EngineBase.C.o
CMakeFiles/engine_ser.dir/LoadBalancer.C.o
CMakeFiles/engine_ser.dir/MesaDisplay.C.o
CMakeFiles/engine_ser.dir/Netnodes.C.o
CMakeFiles/engine_ser.dir/NetworkManager.C.o
CMakeFiles/engine_ser.dir/VisItDisplay.C.o
-L/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/conduit/v0.9.1/darwin-x86_64/lib
-L/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/hdf5/1.8.14/darwin-x86_64/lib
-L/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/zlib/1.2.13/darwin-x86_64/lib
-L/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/mfem/4.6/darwin-x86_64/lib
-Wl,-rpath,/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/conduit/v0.9.1/darwin-x86_64/lib
-Wl,-rpath,/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/hdf5/1.8.14/darwin-x86_64/lib
-Wl,-rpath,/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/zlib/1.2.13/darwin-x86_64/lib
-Wl,-rpath,/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/mfem/4.6/darwin-x86_64/lib
-Wl,-rpath,/Users/miller86/visit/visit/34rc/build/lib
../../lib/libenginerpc.dylib
../../lib/libavtwriter_ser.dylib
../../lib/libavtquery_ser.dylib
../../lib/libavtviswindow_ser.dylib
-lmfem
-lhdf5
-lz
-lhdf5
-lz
-lz
-lz
../../lib/libavtblueprint.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkIOPLY.9.2.6.dylib
../../lib/libavtshapelets.dylib
../../lib/libavtexpressions_ser.dylib
../../lib/libavtdbin_ser.dylib
../../lib/libavtpythonfilters_ser.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/python/3.9.18/darwin-x86_64/lib/libpython3.9.dylib
../../lib/libavtplotter_ser.dylib
../../lib/libavtfilters_ser.dylib
../../lib/libavtview.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingFreeType.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkfreetype.9.2.6.dylib
../../lib/libavtdatabase_ser.dylib
../../lib/libavtmir_ser.dylib
../../lib/libavtpipeline_ser.dylib
../../lib/libavtdbatts.dylib
../../lib/libvisit_verdict.dylib
../../lib/libtess2.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingRayTracing.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkDomainsChemistry.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingSceneGraph.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingOpenGL2.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingHyperTreeGrid.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersHybrid.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingUI.9.2.6.dylib
-framework
Cocoa
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkglew.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingVolume.9.2.6.dylib
-Xlinker
-framework
-Xlinker
OpenGL
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkInteractionStyle.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkRenderingCore.9.2.6.dylib
../../lib/libvisit_vtk.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersExtraction.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkIOXMLParser.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersFlowPaths.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersGeometry.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkImagingHybrid.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkIOImage.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/zlib/1.2.13/darwin-x86_64/lib/libz.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtktiff.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkImagingCore.9.2.6.dylib
../../lib/libavtmath.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersModeling.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersSources.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersGeneral.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonComputationalGeometry.9.2.6.dylib
-lmfem
../../lib/liblightweight_visit_vtk.dylib
../../lib/libvisitcommon.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/zlib/1.2.13/darwin-x86_64/lib/libz.dylib
-ldl
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkIOLegacy.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkIOCore.9.2.6.dylib
-lconduit
-lconduit_relay
-lconduit_blueprint
-lhdf5
-lz
-lhdf5
-lz
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkFiltersCore.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonExecutionModel.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonDataModel.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonTransforms.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonMisc.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonMath.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkkissfft.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtkCommonCore.9.2.6.dylib
/Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/vtk/9.2.6/darwin-x86_64/lib/libvtksys.9.2.6.dylib
markcmiller86 commented 1 month ago

If I disable mfem, I get the same problem with conduit. The engine includes and links with conduit so any dependencies for conduit get linked into the engine.

markcmiller86 commented 1 month ago

@iulian787 and @vijaysm if I disable conduit, mfem and fsm in my build, I can get the engine to run and use MOAB in parallel correctly...

Screen Shot 2024-05-26 at 8 01 15 PM

vijaysm commented 1 month ago

That is excellent news! Thanks for checking this thoroughly @markcmiller86

When you say MOAB parallel engine is working correctly, I assume this means there are no HDF5 property table errors that we were seeing before? I am still puzzled as to why the Ubuntu 22 version of engine_par is still failing (eventhough ldd verification showed that it was not linked against HDF5-serial).

markcmiller86 commented 1 month ago

When you say MOAB parallel engine is working correctly, I assume this means there are no HDF5 property table errors that we were seeing before?

Yes, correct.

I am still puzzled as to why the Ubuntu 22 version of engine_par is still failing (eventhough ldd verification showed that it was not linked against HDF5-serial).

I am too but I believe linux is more lenient about shared lib dependencies and so think the engine is still loading the serial hdf5 library. An strace would confirm that.

markcmiller86 commented 1 month ago

Was chatting with @brugger1 about this. One idea he has...build engine_par against hdf5_mpi instead of hdf5. That would fix the collision with MOAB parallel database plugin.

But, it would mean we continue to link the engine (and mdsever) against hdf5 and that would mean nobody would be allowed to build a custom plugin using a different version of hdf5 than what VisIt was built with.

If VisIt was not on such an ancient version of hdf5, that is probably ok. But, because we are on such an ancient version of hdf5, it would likely conflict with any plugin developer using many of the newer features in newer versions of hdf5.

So, we would need to upgrade hdf5 in VisIt.

That said, I still think it would be best to get hdf5 out of the engine and mdserver if possible.

biagas commented 1 month ago

@markcmiller86 conduit and mfem were added as dependencies when avt/Blueprint and avt/MFEM were added. With those additions, we also get whatever dependencies conduit and mfem have.

markcmiller86 commented 1 month ago

@markcmiller86 conduit and mfem were added as dependencies when avt/Blueprint and avt/MFEM were added. With those additions, we also get whatever dependencies conduit and mfem have.

Yes, that is right. So, it may mean we have to build those libs two different ways...one way for VisIt components and another way for database plugins...only the latter of which can depend on things like hdf5, netcdf, etc.

cyrush commented 1 month ago

@markcmiller86 Both Conduit and MFEM are used outside of the database plugins. We have avt libs that provide MFEM to VTK and Conduit to VTK as a general service for the engine.

While Conduit's I/O that uses HDF5 does not need to link HDF5, I think that MFEM actually links Conduit relay, which does link HDF5.

In general unless we have a fully name mangled serial hdf5 and mpi hdf5, I think we have an issue regardless if it's DB only vs in the engine. Yes we can disable plugins, but that approach won't work for an install that can be widely used

markcmiller86 commented 1 month ago

Both Conduit and MFEM are used outside of the database plugins.

Sure...but I guess my question is...what do conduit and MFEM need to do in the way of I/O with HDF5 in the engine and mdserver? I don't think the engine or mdserver need to do any I/O with or without HDF5 and so the question remains...why do business this way? It can't be for the convenience of a 3rd party lib dependency that isn't designed to build without HDF5?

As an aside, as things are designed now, no one building a custom HDF5 plugin would be able to build against an hdf5 version other than 1.8.XX (1.8.14 is over 10 years old now). So, someone wishing to use newer features in HDF5 would simply not be able to use a newer HDF5. That paricular issue is solved, of course, by updating to newer HDF5 in VisIt. But, that only minimizes that particular issue. It doesn't fix it.

cyrush commented 1 month ago

The answer is simple: MFEM does not have multiple libraries that partition features based on dependencies.

Those features are either on or off for a build of mfem.

Conduit has multiple libraries - (relay is the one with all the i/o deps), but if MFEM is using Conduit, it will also link those I/O libs.

cyrush commented 1 month ago

We are going to explore building MPI enabled HDF5 will work for all cases (engine_ser and engine_par)

markcmiller86 commented 1 month ago

Ok, I tried a simple test with my build of VisIt 3.4.1. on macOS where I have disabled MFEM and conduit. But, I do have things like Silo (which is using HDF5 in serial) and MOAB (using HDF5 in serial in a serial engine and in parallel in a parallel engine).

  1. Start VisIt ./bin/visit -np 4 (confirm I've got a parallel engine)
  2. Open Silo's multi_ucd3d.silo in the silo_hdf5_test_data directory (confirm it is indeed Silo/HDF5 data...it is)
  3. Put up a mesh and PC plot...works fine!
  4. Delete the plots and close the database...or not...it doesn't change the outcome
  5. Open a MOAB database (which will trigger the MOAB parallel plugin)
  6. Put up a PC plot of ELEM_GLOBAL_ID...it plots fine...in particular, no hdf5 api trace errors on stdout/stderr...which we have seen when its wound up confusing parallel HDF5 with serial HDF5

So, this works. And, that is because we DO NOT load plugin shared libraries using RTLD_GLOBAL which would make the symbols from the loaded library visible in the global namespace of the calling executable. We DO NOT specify RTLD_LOCAL either but it turns out if neither is specified, RTLD_LOCAL is the default behavior (on both macOS and Linux). See related ChatGPT discussion about this.

Below, I use macOS lsof to report which libraries are actually loaded into the running engine_par process using one of the PIDs...You can see it has BOTH serial and parallel HDF5 libraries loaded.

ps -ef | grep engine_par
 3640 89593 89592   0  4:24PM ??         0:02.03 /Users/miller86/visit/visit/34rc/build/exe/engine_par -plugindir /Users/miller86/.visit/3.4.1/darwin-x86_64/plugins:/Users/miller86/visit/visit/34rc/build/plugins -visithome /Users/miller86/visit/visit/34rc/build -visitarchhome /Users/miller86/visit/visit/34rc/build/ -dv -host 127.0.0.1 -port 5600 -key 1859f91da393c08335dc
 3640 89594 89592   0  4:24PM ??         0:11.80 /Users/miller86/visit/visit/34rc/build/exe/engine_par -plugindir /Users/miller86/.visit/3.4.1/darwin-x86_64/plugins:/Users/miller86/visit/visit/34rc/build/plugins -visithome /Users/miller86/visit/visit/34rc/build -visitarchhome /Users/miller86/visit/visit/34rc/build/ -dv -host 127.0.0.1 -port 5600 -key 1859f91da393c08335dc
 3640 89595 89592   0  4:24PM ??         0:11.84 /Users/miller86/visit/visit/34rc/build/exe/engine_par -plugindir /Users/miller86/.visit/3.4.1/darwin-x86_64/plugins:/Users/miller86/visit/visit/34rc/build/plugins -visithome /Users/miller86/visit/visit/34rc/build -visitarchhome /Users/miller86/visit/visit/34rc/build/ -dv -host 127.0.0.1 -port 5600 -key 1859f91da393c08335dc
 3640 89596 89592   0  4:24PM ??         0:11.82 /Users/miller86/visit/visit/34rc/build/exe/engine_par -plugindir /Users/miller86/.visit/3.4.1/darwin-x86_64/plugins:/Users/miller86/visit/visit/34rc/build/plugins -visithome /Users/miller86/visit/visit/34rc/build -visitarchhome /Users/miller86/visit/visit/34rc/build/ -dv -host 127.0.0.1 -port 5600 -key 1859f91da393c08335dc
[scratlantis:5.5.0/darwin-x86_64/lib] miller86% lsof -p 89593 | grep hdf
engine_pa 89593 miller86  txt    REG                1,4  3910464          1282989090 /Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/hdf5/1.8.14/darwin-x86_64/lib/libhdf5.9.dylib
engine_pa 89593 miller86  txt    REG                1,4  5073584          1300738963 /Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/moab_mpi/5.5.0-hdf5-1.14.3/darwin-x86_64/lib/libMOAB_mpi.5.dylib
engine_pa 89593 miller86  txt    REG                1,4  9282624          1300717741 /Users/miller86/visit/visit/34rc/release/build-mb-3.4.1-darwin-21-x86_64-release/thirdparty_shared/third_party/hdf5_mpi/1.14.3/darwin-x86_64/lib/libhdf5_mpi.310.dylib
markcmiller86 commented 1 month ago

@iulian787 and @vijaysm one option we're considering here is to do away with serial/parallel builds of HDF5. We would build only parallel HDF5 and everything in VisIt that depended on HDF5 would be linked to that one, single parallel HDF5. The "serial" tools would be have to be linked with -lmpi for example, but would, in theory anyway, never reference the MPI symbols in them.

What would you think of this?

markcmiller86 commented 1 month ago

@cyrush if we can do that with HDF5, why can't we do that with all of VisIt and get away from building all of VisIt with _par and _ser variants of everything. We just build everything parallel and agree we never reference the mpi symbols when running in serial?

vijaysm commented 1 month ago

The "serial" tools would be have to be linked with -lmpi for example, but would, in theory anyway, never reference the MPI symbols in them.

What would you think of this?

Excellent! This was my suggestion long time back. I do this all the time in my workflows using MPI wrappers to build every library in my systems, whether it is serial code or MPI aware one.

Then we would just build MOAB+HDF5 without worrying about serial builds, with a guarantee that only MPI aware HDF5 will ever be loaded by Visit. Would also simplify builds in general and reduce distribution size :-)

cyrush commented 1 month ago

@cyrush if we can do that with HDF5, why can't we do that with all of VisIt and get away from building all of VisIt with _par and _ser variants of everything. We just build everything parallel and agree we never reference the mpi symbols when running in serial?

Hi Mark, In VisIt itself - we use a single source to produce both serial and MPI libs. Things aren't partitioned.

This is convenient for many filters that share logic and then add extra communication for the MPI case.

MPI support is controlled by compiler defines, which yield the serial and parallel libs. It would require refactoring and a runtime (instead of compile time) switch to be added.

It's possible to do, but would be a major change.

markcmiller86 commented 1 month ago

MPI support is controlled by compiler defines, which yield the serial and parallel libs.

@cyrush...right...we simply adjust those #if PARALLEL code blocks to all be run-time conditionals and then we build only MPI-enabled object files and libs. This would probably halve our compile time, halve our distribution sizes and just generally simplify a lot of things in our CMake logic, build logic, plugin logic, etc.

markcmiller86 commented 1 month ago

Summarizing...

Ok, so inputs from @cyrush, @qkoziol, and @vijaysm all suggest the right way to proceed is to do away with building dependencies in different ways (e.g. with and without MPI) and just know that running a serial VisIt will never reference any MPI enabled code blocks in VisIt itself or any dependencies. It means a serial VisIt engine is still linked with -lmpi for example to satisfy the linker.

Above, @cyrush mentioned another issue I hadn't really appreciated before digging into this in detail. In VisIt, we have a lot of code blocks of the form...

#if PARALLEL 
    // do something for parallel with MPI_Xxx() calls
#else 
   // do it the serial way
#endif

This really does mean you have to compile two different ways to get two different behaviors. To do the same thing in VisIt proper with various libavtThisOrThat.so as we are aiming with TPL libs, we would need all those to blocks to be chosen at run time, not compile time.

We're kinda forced into this situation because libraries we want to use in VisIt proper such as MFEM have an indirect dependency on HDF5. So, the engine is going to wind up getting linked with an HDF5 regardless.

But, because VisIt's HDF5 is ancient (1.8.14), we really must upgrade that asap to latest HDF5 on develop. We are ok leaving it a 1.8.14 on the RC.

Here is the work to complete for this ticket then...

qkoziol commented 1 month ago

I suggest mov8ng all the way up to HDF5 1.14.4 🙂