Hi all,
I compared the model results and CPU time for UW2.10.1b and UW2.7.1b with a simple visco-plastic slab subduction model. The model has a resolution of 400x120 elements, 10 cores are used, and data are saved every 20 steps (two snapshots of the viscosity field, at step zero and step 200, shown below). Although the two UW versions gave extremely close results after running for 200 steps, UW2.10.1b seems to be about 0.6 times slower than UW2.7.1b (see the CPU times below). Do you know what may have caused this computational efficiency difference? Please let me know if more information is needed. Thanks a lot.
Best regards, Ting
Hi Ting, do you have the config.cfg files available for each build? If they achieve similar results, I'm guessing it's an installation inefficiency. For version 2.10 the config.cfg is located at /underworld/libUnderworld/.
Another idea is that the solver tolerances may be different between the versions (@jmansour, did we tweak them at some stage?). Do you have the output logs available for each run?
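For reference, the relevant tolerances can be pinned explicitly from the script rather than relying on version defaults; a minimal sketch, assuming the standard UW2 `Solver` options interface (the values here are placeholders):

```python
import underworld as uw

# `stokes` is an existing uw.systems.Stokes system, assumed set up elsewhere.
solver = uw.systems.Solver(stokes)

# Pin the outer (SCR / pressure) and inner (A11 / velocity-block) Krylov
# tolerances so both UW versions solve to the same accuracy.
solver.options.scr.ksp_rtol = 1.0e-5
solver.options.A11.ksp_rtol = 1.0e-6
```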
Thanks, Julian. The comparison was done on my own desktop, using the Docker images:
`docker run -v $PWD:/home/jovyan/ --rm underworldcode/underworld2:2.10.1b mpirun -np 10 python 4ASlabSubduction.py`
vs
`docker run -v $PWD:/workspace/ --rm underworldcode/underworld2:2.7.1b mpirun -np 10 python 4SlabSubduction.py`
Here is the Stokes solver setup:
The log files seem to suggest that the SCR RHS setup time in UW2.10.1b is longer than that in UW2.7.1b.
Here is the setup for the nonlinear part: `solver.solve(nonLinearIterate=True, nonLinearMaxIterations=200, nonLinearTolerance=0.003)`
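For context, a minimal sketch of how such a solve is typically driven in UW2; the field, function, and condition names are placeholders, and the `set_inner_method` call is an assumption based on the scripts using a direct inner solve:

```python
import underworld as uw

# velocityField, pressureField, viscosityFn, buoyancyFn and freeslipBC
# are assumed to be defined earlier in the script.
stokes = uw.systems.Stokes(velocityField=velocityField,
                           pressureField=pressureField,
                           conditions=[freeslipBC],
                           fn_viscosity=viscosityFn,
                           fn_bodyforce=buoyancyFn)

solver = uw.systems.Solver(stokes)
solver.set_inner_method("mumps")  # direct solve on the velocity block

# Picard iteration, exactly as in the line quoted above.
solver.solve(nonLinearIterate=True,
             nonLinearMaxIterations=200,
             nonLinearTolerance=0.003)
```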
Is there any progress on solving the convergence efficiency issue?
Hi Ting.
I'm unable to reproduce this. Running the standard slab subduction model against UW 2.7 & 2.10, the timings are relatively close for me.
Can you post UW 2.7 & 2.10 compatible versions of your script?
I'm able to reproduce this in serial, though I'm unclear what the cause could be. I'm thinking it's the number of integration points (Gauss points) used; see the sketch below.
This was using the 06_SlabSubduction.ipynb notebook in 2.10.1b and 2.7.1b.
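If integration points are the suspect, one concrete thing to check is whether the Stokes system integrates on the particle swarm or falls back to Gauss quadrature; a sketch, assuming the 2.x `voronoi_swarm` keyword (other names are placeholders):

```python
import underworld as uw

# With voronoi_swarm supplied, the constitutive term is integrated on the
# particles; without it, standard Gauss quadrature is used. If 2.7 and 2.10
# end up on different code paths here, assembly cost (and hence the SCR RHS
# setup time) could differ without changing the solution.
stokes = uw.systems.Stokes(velocityField=velocityField,
                           pressureField=pressureField,
                           conditions=[freeslipBC],
                           fn_viscosity=viscosityFn,
                           fn_bodyforce=buoyancyFn,
                           voronoi_swarm=swarm)  # omit to use Gauss points
```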
Which model are you running @julesghub? Note that in 2.10, the slab subduction model defaults to `mumps` in serial, while in 2.7 it'll use `lu`.
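To rule the differing defaults out, both versions can be pinned to the same inner method explicitly; a minimal sketch (`stokes` assumed built as above):

```python
import underworld as uw

solver = uw.systems.Solver(stokes)
# Force the same velocity-block solver in both versions, so the comparison
# isn't skewed by the different serial defaults ("mumps" in 2.10, "lu" in 2.7).
solver.set_inner_method("mumps")  # or "lu"
```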
Yeah, I have seen that, but I'm not sure it's significant here; I'll test that now. I'm running the model as mentioned above.
Thanks, both.
Below are my Python scripts for 2.10.1b (4SlabSubduction.py.txt) and 2.7.1b (4ASlabSubduction.py.txt): 4ASlabSubduction.py.txt 4SlabSubduction.py.txt
I observed this issue both on my own desktop (Docker) and on my university's HPC.
Thanks for the models, Ting. I'm still investigating what's going on with the example 06_SlabSubduction.ipynb model. It shows the same behaviour in 2.10 & 2.7 when the model's inner solve method is set to `mumps` (like your models), i.e. 2.7 is quicker!
Interesting: the RHS setup time presumably includes building the SCR preconditioner, which seems to have suddenly become more expensive but not more effective (iteration counts have not changed at all). That could be something to do with Gauss points vs. particles, as the default preconditioner is built by finding the average viscosity in an element.
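For readers following the preconditioner point, here is a sketch of the usual viscosity-scaled pressure mass-matrix approximation to the Schur complement; the notation is mine and the per-element averaging is as described above, not lifted from the UW source:

```latex
% K: velocity stiffness matrix, G: discrete gradient,
% \phi_i: pressure shape functions, \bar{\eta}_e: element-average viscosity.
% The SCR pressure solve targets the Schur complement
S = G^{\mathsf{T}} K^{-1} G ,
% which is preconditioned by a viscosity-scaled pressure mass matrix
\hat{S}_{ij} = \sum_{e} \frac{1}{\bar{\eta}_e} \int_{\Omega_e} \phi_i \,\phi_j \, d\Omega .
```

If the element-average viscosity is gathered from particles rather than Gauss points, the setup cost can change while iteration counts stay the same, which would be consistent with the logs above.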
Using `mumps` on 06_SlabSubduction: 2.10 on the right, 2.7 on the left. Pressure solve time is the difference.
Interestingly, when using `lu` for the inner solver (rather than `mumps`), the timings are very similar.
Is it possibly a difference in versions of `mumps`? Although I'll note that @tingyang2004 observed this issue on an HPC system as well, for which I'd assume he was using the same version of `mumps` in both the UW2.7 & UW2.10 tests.
Potentially. I'm not sure how to check the version of `mumps` that `petsc` pulls down.
@tingyang2004, did you use the Docker images on HPC or compiled code? If compiled code, do the two versions use consistent PETSc/MUMPS versions?
I did not check the versions of PETSc I used on the HPC, but I assume they are different, considering the year-and-a-half interval between the installations. Is there a convenient way to check the PETSc version?
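For anyone else looking: if petsc4py is importable in the environment, the version can be read directly; a minimal sketch:

```python
from petsc4py import PETSc

# Returns (major, minor, subminor), e.g. (3, 10, 5).
print(PETSc.Sys.getVersion())

# Fuller detail (release flag, etc.) as a dict.
print(PETSc.Sys.getVersionInfo())
```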
After checking the PETSc make.log, the versions should be 3.10.5 (uw2.7.1b) and 3.12.4 (uw2.10.1b) respectively.
It looks to me like the newer version of PETSc (3.12.4) has slowed down the Stokes solver: using different versions of UW with the same version of PETSc gives similar solver times. So, is there a convenient way to make UW use the older version of PETSc in Docker?
Since PETSc 3.12.4 is newer, I would have assumed it should be faster than, or at least of similar speed to, the older versions (e.g., 3.10.5 here), given deliberate tuning?
I ran some tests using PETSc 3.10.5 against UW 2.7 & 2.10. While the results were identical for `lu`, for `mumps` there were definite differences, with the older UW being faster at times and the newer UW at others. It's somewhat strange, but it does appear to be due to a change in how we use PETSc.
Unfortunately I don't think we can spend more time on this, as it's somewhat of a niche issue and very difficult to debug. So if the performance hit is too much, I'd suggest sticking with the older Underworld, or perhaps you might investigate using `superludist`. This relatively recent publication suggests it does better than `mumps` for their testing configuration:
https://cug.org/proceedings/cug2016_proceedings/includes/files/pap121s2-file1.pdf
Strange. How much is the MUMPS time difference between UW 2.7 and 2.10? I will stick with PETSc 3.10.5 on the HPC and UW 2.7 in Docker for the moment, then.
It wasn't usually dramatic: around 20%, give or take, from memory.
I'd suggest you at least try `superludist`. It should be installed in your Docker image, and possibly on your HPC too, depending on how PETSc was configured. You'd simply invoke `solver.set_inner_method("superludist")`.
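A quick way to compare the inner methods on the same model is to time one solve with each; a sketch, assuming `solver` is configured as in the scripts above (the tolerance value is just carried over from there):

```python
import time

# Hypothetical timing loop: one non-linear solve per inner method.
for method in ("mumps", "superludist"):
    solver.set_inner_method(method)
    t0 = time.time()
    solver.solve(nonLinearIterate=True, nonLinearTolerance=0.003)
    print(f"{method}: {time.time() - t0:.1f} s")
```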
Thanks, John. I will check if `superludist` is faster in the next few weeks.
Traditionally people have generally had better luck with `mumps`, but `superlu_dist` seems to be under active development, so it's definitely worth a try.
Let us know what you find.
Definitely, will report back when it's done.
I did not check the details, but changing MUMPS to superludist directly in 2.7.1 shows little influence on the convergence speed. However, changing MUMPS to superludist in 2.10.1 slows convergence significantly (by around 50 times). So UW2.7.1 with MUMPS looks like the best choice at present.
Tests were done in my Docker.
Closing this ticket. @tingyang2004, thanks for raising this issue. We are planning to implement performance metrics because of this kind of issue. Cheers!