libAtoms / workflow

python workflow toolkit
GNU General Public License v2.0

Castep ft iterable_loop don't skip failed calculations #38

gelzinyte closed 2 years ago

gelzinyte commented 2 years ago

Just making a note that one mechanism seems to be broken for Castep: parallelising Castep single-point energy/force calculations across multiple configs so that, if one of them fails, no energy/force results are returned for that config but the others don't all fail. Relevant to @imagdau

bernstei commented 2 years ago

We'd need to know exactly what the ase.calculators.castep.Castep object does when the calculation fails. If it's raising TypeError or CalculationFailed, the mechanism should work, so if it doesn't, it's a wfl bug. If it raises some other exception, we could add it to the list of exceptions that are caught by the wrapper.
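For illustration, the catch-and-skip behaviour described above can be sketched roughly as follows. This is not the wfl wrapper itself: evaluate_all, toy_single_point and the dict-based configs are hypothetical names made up for the example, and CalculationFailed is defined locally as a stand-in for the ASE exception so that the snippet runs without ASE installed.

```python
class CalculationFailed(Exception):
    """Local stand-in for the ASE exception of the same name (assumption)."""


def evaluate_all(configs, single_point, caught=(TypeError, CalculationFailed)):
    """Run single_point on each config; a caught failure yields None for that config."""
    results = []
    for config in configs:
        try:
            results.append(single_point(config))
        except caught:
            # Listed exception: record "no results" and carry on with the rest.
            results.append(None)
    return results


def toy_single_point(config):
    # Hypothetical calculator: fails for configs flagged as bad.
    if config.get("bad"):
        raise CalculationFailed("calculation did not finish")
    return {"energy": -1.0 * config["n_atoms"]}


configs = [{"n_atoms": 2}, {"n_atoms": 3, "bad": True}, {"n_atoms": 4}]
results = evaluate_all(configs, toy_single_point)
```

An exception that is not in the caught tuple still propagates and stops the whole loop, which is the kind of behaviour reported in this issue if the calculator raises something unexpected.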

bernstei commented 2 years ago

This turns out to be a fairly general problem with how the workflow handles remote jobs that fail (with something explicit like a Python exception) and/or die (without a Python error, e.g. being killed by the queuing system for running out of time). See #40 (and the corresponding ExPyRe PR https://github.com/libAtoms/ExPyRe/pull/10).

bernstei commented 2 years ago

Fixed in #40. DFT calculators now skip output for any exception, and remote job processing can optionally skip failed jobs via the new skip_failures field in the remote info structure.
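As a rough sketch of what an opt-in skip for failed remote jobs could look like: only the name skip_failures comes from this thread. gather_results, ok_job and dead_job are hypothetical, and the real wfl/ExPyRe interface may differ.

```python
def gather_results(fetchers, skip_failures=False):
    """Collect each job's result; optionally replace failed jobs with None."""
    gathered = []
    for fetch in fetchers:
        try:
            gathered.append(fetch())
        except Exception:
            if not skip_failures:
                # Default behaviour: one failed job aborts the whole gather.
                raise
            # Opt-in behaviour: keep a placeholder and continue with other jobs.
            gathered.append(None)
    return gathered


def ok_job():
    # Hypothetical job that finished and returned some numbers.
    return [1.0, 2.0]


def dead_job():
    # Hypothetical job that produced no results, e.g. killed by the queuing system.
    raise RuntimeError("no results available for job")


outputs = gather_results([ok_job, dead_job, ok_job], skip_failures=True)
```

Keeping the flag off by default means a failure stays loud unless the caller explicitly asks for partial results.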