Closed birnstiel closed 1 month ago
Never seen this, but this is very likely a bug in the serial I/O routine, possibly related to the filesystem (or not). Could you try enabling MPI? (no need to run on several GPUs, but enabling MPI will use another output procedure, possibly more reliable for large datasets).
Thanks for the quick answer! I will try that. That would be adding the Idefix_MPI option and running with mpirun with just one process? This is on a Lustre file system, in case this matters.
Yes, Idefix_MPI=ON in cmake, and then run with one process. Lustre can be touchy for large files, and the serial routines are pretty basic...
There was another run that was able to write full files only every now and then. This was very puzzling. The MPI option turned out to help because it now gave an error message: I was running out of disk space! Turns out writing files every now and then worked because in the background some old data was moved away. This is embarrassing, sorry for bothering you! 🤦♂️
Well, it still points to a defect, which is that the code is unable to detect unsuccessful outputs when using serial I/O, so I'd say there is something to be fixed here!
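To illustrate the kind of check that is missing, here is a minimal sketch of a serial write path that reports failures such as a full disk. This is not the Idefix implementation: the helper names `CheckedWrite` and `CheckedClose` are hypothetical, and the sketch assumes the serial routines are built on C stdio (`fwrite`/`fclose`) on a POSIX system where `errno` is set on failure.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Idefix): write `count` elements of `size`
// bytes each and throw if fewer elements than requested were written.
void CheckedWrite(const void* data, std::size_t size, std::size_t count,
                  std::FILE* file, const std::string& what) {
  const std::size_t written = std::fwrite(data, size, count, file);
  if (written != count) {
    throw std::runtime_error("short write for " + what + ": " +
                             std::strerror(errno));
  }
}

// Hypothetical helper: flush and close the stream. With buffered I/O, an
// out-of-space error often surfaces only here, when the buffer is pushed to
// the file system, so both return values are checked.
void CheckedClose(std::FILE* file, const std::string& what) {
  const bool flushFailed = (std::fflush(file) != 0);
  const int flushErrno = errno;
  const bool closeFailed = (std::fclose(file) != 0);
  if (flushFailed || closeFailed) {
    throw std::runtime_error("could not finalise " + what + ": " +
                             std::strerror(flushFailed ? flushErrno : errno));
  }
}
```

With helpers like these, a dump that runs out of disk space would abort with an error message instead of silently leaving a truncated file behind.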
Describe the issue:
I am running a slightly modified version of the VSI test (in 3D, different resolution, lower scale height, dumps+outputs every 10 orbits). The outputs are written fine according to the log file:
However, this is the output directory listing; note the very small sizes of the vtk and dmp files after the first output.
I checked the dump file, and it seems the header and coordinates were written fine, but reading fails at the data of the first field, Vc-RHO.
Is this something you have seen before or an issue with the file system?
Error message:
No response
runtime information:
This is how the code is built in the slurm script:
Dump file header is: Idefix 2.1.01-2f15373c Dump Data little endian.
Below I add the beginning of the log file: