Open freyso opened 6 months ago
Hm there's not a single line coming from Sphinx in the error message. Do you have a small code to reproduce the error?
Could it be that there's a stray entry in the database from a time when you deleted the job files manually outside of pyiron?
Can you also maybe try to see whether a different version of pyiron helps? It might help us figure out which changes could have caused the problem.
Changing to pyiron/2024-05-20 seemed to help. I was on pyiron/latest before, which apparently is NOT latest. Is it possible that the pyiron version used on the cluster is incompatible with the pyiron/latest on the login node?
This is a VERY frustrating experience I am having here: loads of incomprehensible warnings, and error messages with zero information value. 'Objects can be only recovered from hdf5 if TYPE is given' is essentially a 'Something error occurred'.
I'm closing the ticket; there's nothing more to gain here.
@niklassiemer Can you comment on this?
Hmmm, to my taste the PR got closed a bit too early. If there are updates, I would appreciate it if you posted them here.
pyiron/latest is indeed, after all, the hand-updated version with Python 3.10, which was somewhat older than the docker-stack build from yesterday. However, the version on the cluster and the one on the login node should not differ! Actually, the kernel chosen in the notebook should also be loaded on the compute node, since the environment is preserved. If this is not the case, I need to know and find a solution!
Got the problem again, with the new kernel. So it's not about the python kernel.
I solved the problem again, this time by avoiding the minus sign in the job name. I may have done the same last time, too.
Is it possible that a minus sign in the job name causes issues? It seems reproducible: E20Vnm-test fails in hdfio, while E20Vnm_neutral runs.
Another thought: it could be some inconsistency in the name normalization. In the hdf5 file name, '-' seems to be replaced by 'm'; in the job table, the '-' is still there; and the working directory becomes E20Vnmmtest_hdf/E20Vnm-test/, i.e. a mixture of both.
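The suspected mismatch can be sketched in plain Python. This is a hypothetical illustration, not pyiron's actual code: assume one routine sanitizes non-word characters when building the HDF5 file name, while another keeps the raw job name for the subdirectory, producing exactly the mixed path reported above.

```python
import re

def hdf5_name(job_name: str) -> str:
    # hypothetical sanitizer: replace every non-word character
    # (such as '-') with the letter 'm', then append '_hdf'
    return re.sub(r"\W", "m", job_name) + "_hdf"

def working_directory(job_name: str) -> str:
    # the subdirectory keeps the raw job name, so the two
    # conventions end up mixed in one path
    return f"{hdf5_name(job_name)}/{job_name}/"

print(working_directory("E20Vnm-test"))  # E20Vnmmtest_hdf/E20Vnm-test/
```

If two code paths really do apply different normalizations like this, any lookup that sanitizes in one place but not the other would fail only for names containing '-', which would match the observed behavior.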
I got confused by this at some point; that's why I had changed from minus to underscore. Yet for me the minus is more convenient to type, so there's a high chance I'll do it again.
Also, when I remove the job via pr.remove_job, the _hdf5 directory stays in place.
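Until that is fixed upstream, a manual cleanup is a possible workaround. The snippet below is a sketch under the assumption that the leftover directory is named `<job_name>_hdf` inside the project directory; the function name and path layout are illustrative, not part of the pyiron API.

```python
import shutil
from pathlib import Path

def remove_leftover_hdf(project_path: str, job_name: str) -> None:
    """Delete a leftover '<job_name>_hdf' directory, if present.

    Assumes the directory layout described in this thread; adjust the
    name pattern if your jobs use sanitized names (e.g. '-' -> 'm').
    """
    leftover = Path(project_path) / f"{job_name}_hdf"
    if leftover.is_dir():
        shutil.rmtree(leftover)
```

Calling this after `pr.remove_job(...)` would clear the directory that remove_job leaves behind.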
Thanks for coming back to this! This could indeed be the reason! I opened an issue on pyiron_base.
Summary
A SPHInX (restart) job fails to run due to failures in hdf5io. The error message is "ValueError: Objects can be only recovered from hdf5 if TYPE is given".
I cannot tell if this is related to restart.
pyiron Version and Platform
cmti
Expected Behavior
Job runs.
Actual Behavior
Job execution crashes with the following error.out:
Steps to Reproduce
Unclear; deleting and setting up the job again reproduces the error.