siesta-project / aiida_siesta_plugin

Source code for the AiiDA-Siesta package (plugin and workflows).

SiestaBaseWorkChain, handle "too many nodes" error #86

Open pfebrer opened 3 years ago

pfebrer commented 3 years ago

This one is easy to solve, so I guess it could be incorporated into the error handling of SiestaBaseWorkChain.

The exact error in SIESTA is:

Sparse pattern is oversubscribed with nodes, please reduce number of nodes.
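
A minimal sketch of the recovery this message asks for, written as plain Python rather than as an actual SiestaBaseWorkChain handler. The function name `reduce_parallel_resources` is hypothetical, and the assumption is that the job resources are a dictionary using the `num_machines` and `num_mpiprocs_per_machine` keys; the halving strategy is only one possible choice.

```python
import copy


def reduce_parallel_resources(resources):
    """Return a copy of `resources` with fewer MPI processes.

    Hypothetical helper: halves the number of machines first, then the
    number of MPI processes per machine. Returns None when the job is
    already running on a single process and cannot be reduced further.
    """
    new = copy.deepcopy(resources)
    machines = new.get("num_machines", 1)
    procs = new.get("num_mpiprocs_per_machine", 1)

    if machines > 1:
        new["num_machines"] = machines // 2
    elif procs > 1:
        new["num_mpiprocs_per_machine"] = procs // 2
    else:
        # Nothing left to reduce; the caller should give up.
        return None
    return new
```

An error handler could call this on the failed calculation's resources and resubmit with the result, aborting when it returns `None`.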

I would have done it myself, but after looking at the code I'm not sure how to add a new error to SiestaCalculation. How does a CalcJob know which error has occurred?

pfebrer commented 3 years ago

OK, I have now found that this is done in the parser, which looks at the MESSAGES file. Unfortunately, this error does not write anything to MESSAGES.

bosonie commented 3 years ago

It's OK to use the output file (see for instance here). Usually something is written to MESSAGES, but even if nothing is there, the absence of "INFO: Job completed" signals that an error occurred. Anyway, I can implement the logic if you want, but you have to describe to me exactly what happens, and we should also check all versions of Siesta supported by the plugin (Siesta 4.0, 4.1, MaX). Do they all behave the same way for this error?
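
The detection logic described above could look roughly like the following. This is only a sketch, not the plugin's parser: the function name `classify_run` and the returned labels are made up for illustration, and it assumes that "INFO: Job completed" is written to MESSAGES on success and that the oversubscription message appears verbatim in the main output file.

```python
# Strings taken from this thread; whether every Siesta version prints them
# identically still has to be checked.
OVERSUBSCRIBED_MSG = "Sparse pattern is oversubscribed with nodes"
COMPLETED_MSG = "INFO: Job completed"


def classify_run(output_text, messages_text=""):
    """Classify a Siesta run from the text of its output and MESSAGES files.

    Hypothetical helper returning one of three illustrative labels:
    "too_many_nodes", "unknown_failure" or "ok".
    """
    # The specific error is only visible in the main output file.
    if OVERSUBSCRIBED_MSG in output_text:
        return "too_many_nodes"
    # No completion marker in MESSAGES means something went wrong.
    if COMPLETED_MSG not in messages_text:
        return "unknown_failure"
    return "ok"
```

In the real parser, each label would presumably map to a dedicated exit code that the work chain can react to.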

pfebrer commented 3 years ago

I'm not sure if all versions show the error in the same way.

The error is very easy to reproduce, so you can check it for yourself: submit a calculation of a single atom on a cluster with a "normal" number of processors. I don't know the limit, but if you submit it on hpcq-farm5 with 24 cores you will see it :)

I don't have builds of the three versions. If you have them, maybe you can use the Iterator :sweat_smile: