Closed HanyMKhalil closed 5 years ago
@HanyMKhalil what is your model resolution? I'd suggest for 2d simulations, you aim for around 128x128 elements per process, while for 3d simulations around 32x32x32 elements per process.
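As a rough illustration of that guideline, the sketch below turns an element resolution into a suggested MPI process count. The function name `suggested_process_count` is hypothetical and not part of UWGeodynamics; the only input taken from this thread is the per-process target (about 128x128 elements in 2D, 32x32x32 in 3D), and the result is a starting point rather than a rule.

```python
import math

# Guideline quoted above: ~128x128 elements per process in 2D,
# ~32x32x32 elements per process in 3D.
TARGET_ELEMENTS_PER_PROCESS = {2: 128 * 128, 3: 32 * 32 * 32}


def suggested_process_count(resolution):
    """Return a rough MPI process count for a mesh with this element resolution.

    Hypothetical helper: it only divides the total element count by the
    per-process target and rounds up.
    """
    dim = len(resolution)
    if dim not in TARGET_ELEMENTS_PER_PROCESS:
        raise ValueError("resolution must have 2 or 3 entries")
    total_elements = math.prod(resolution)
    return max(1, math.ceil(total_elements / TARGET_ELEMENTS_PER_PROCESS[dim]))


# A (200, 200, 32) mesh has 1,280,000 elements, i.e. about 39 blocks of 32^3.
print(suggested_process_count((200, 200, 32)))
```

By this estimate a (200, 200, 32) mesh would want roughly 40 processes, so 24 processes is somewhat above the suggested load per process.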
Perhaps post your ModelC.py file up here.
Dear John, it's a 3D model with dimensions (250 km, 250 km, 40 km) and I am aiming for 1.25 km resolution, so my element resolution is (200, 200, 32). That's why I run it with 24 cores (I thought it would speed up the process).
I tried different element resolutions, even low ones like 64, but still got the same issue.
Here is my model. I added .txt at the end so that I can upload it here.
The problem is that it runs very slowly without giving any error messages, and at some point it hangs, usually at the model coordinates or model initiation stage. No outputs come out, yet it keeps running until the time limit is reached.
@HanyMKhalil, try 2.5 km resolution. I am running models with dimensions 384x256x128km, at a grid resolution of 2 km it is slow but I manage to get 10 myr over 96 hours. At 1.6 km it is five times slower... Patrice
Thanks Patrice, I will try that. It's a relief that someone is doing something similar to mine. Can I ask what configuration you use, e.g. how many cores and how much memory?
These UWGeodynamics (v2.7.7) models run on 128 cpu, and mem=700GB. You can check an example on instagram bghatlas.
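To compare the two setups mentioned in this thread, the sketch below computes the element count per process from the model extent, cell size and core count. It assumes uniform cubic cells and an even split across processes, which is a simplification; `elements_per_process` is a hypothetical helper, not an UWGeodynamics function.

```python
def elements_per_process(extent_km, cell_km, n_processes):
    """Return (elements per axis, average elements per process).

    Assumes uniform cells of size cell_km and an even decomposition.
    """
    counts = [round(length / cell_km) for length in extent_km]
    total = 1
    for count in counts:
        total *= count
    return counts, total / n_processes


# 250 x 250 x 40 km at 1.25 km on 24 cores (the MonARCH job)
print(elements_per_process((250, 250, 40), 1.25, 24))

# 384 x 256 x 128 km at 2 km on 128 cores (Patrice's setup)
print(elements_per_process((384, 256, 128), 2.0, 128))
```

Under these assumptions the MonARCH job carries about 53,000 elements per process, against about 12,000 for the 128-core runs and the 32x32x32 = 32,768 suggested earlier in the thread.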
Thanks a lot. I certainly cannot get that number of cores on MonARCH, but I will lower my resolution to something very coarse to see if it works and then go up from there. I suspect the problem is in how I match the number of cores to my resolution grid, or in the way the model is submitted to MonARCH, i.e. something about parallel computing that I do not know.
Is there any chance you could run my model, even for 0.1 m.y., just to test whether the problem is in the model itself or in the submission to MonARCH? That would be very helpful, because I guess you are not using MonARCH.
Sure, I can test your model on Raijin. Send your input file to patrice.rey@sydney.edu.au.
Dear Romain, I attached the errors file and the output file from running the model on MonARCH, with the associated outputs from the model.
Looks like an issue with the way python is compiled in the docker image. The model runs fine until this point so must be an UW problem, not a docker problem.
So it ran to 100000 years. Was that the stop point? Kinda looks like it completed, but then failed to tear down cleanly, which is still not ideal, but not really a problem either.
Dear John, yes, this is the end. However, with high resolution it keeps running forever without producing any output, and the error file is empty.
Yep, but they're two separate issues.
Hi Hany,
Have you tried running your HR job with mycode.py > test.log? That would generate a proper log file.
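Alongside redirecting output to a log file, it can help to print flushed, timestamped progress messages from inside the model script, so a stalled job shows the last stage it reached. This is a minimal sketch using only the Python standard library; `log_progress` is a hypothetical helper and the stage names are placeholders. Under mpirun every process would print its own copy of each message.

```python
import sys
import time


def log_progress(message, stream=sys.stdout):
    """Write a timestamped message and flush so it appears in the log immediately."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = f"[{stamp}] {message}"
    stream.write(line + "\n")
    stream.flush()  # without flushing, buffered output can be lost when a job is killed
    return line


# Placeholder stages: call this between the real steps of the model script.
log_progress("model object created")
log_progress("materials assigned")
log_progress("starting run")
```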
Hi @HanyMKhalil,
Sorry I was away for a while.
The error is related to python not stopping cleanly at the end of the model. This is a known issue. This should disappear in the next version of UW (2.9). It is annoying but should not affect your model results.
I am closing this.
Dear Romain, I constructed a simple model in UWGeodynamics with two layers: a viscoplastic layer on top of a viscous layer, with a seed initially in the viscous layer. It runs fine at low resolution, but when I increase the resolution the model gets stuck, either while initialising or at the very beginning. It does not give any errors and just keeps running until the time limit is reached. What is the recommended number of CPU cores (ntasks) for a given resolution, and how does the model decompose the domain? I use Singularity to run on MonARCH. This is a copy of my slurm file:
#!/bin/bash
#SBATCH --job-name=ModelC
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1
#SBATCH --partition=short
#SBATCH --mem=72G
#SBATCH --time=20:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=Hany.Khalil@monash.edu
#SBATCH --error=%j.errors
#SBATCH --output=%j.output
module purge
Xvfb :0 -screen 0 1600x1200x16& export DISPLAY=:0
module load python
module load singularity
module load singularity/3.0.2
# run underworld in docker
singularity exec --cleanenv /usr/local/underworld/2.8.0b/uwsingularity.simg mpirun -np ${SLURM_CPUS_ON_NODE} python ModelC.py