Open oraveczcsaba opened 5 years ago
Can you please try 1.7.8? We also have a merge in progress that will completely change the way resuming works, so this might just fix itself.
Sorry for the late reply, I was away for a while. I've just tried the latest marian-dev, which now reports version 1.7.12, but unfortunately we get the same error.
Noted.
The issue is now resolved. We had gperftools 2.6.1 as the system default (CentOS 7.6), and when I replaced it with the current 2.7 compiled from source, the error was gone. I don't know what changed between the two versions, but resuming now works.
When trying to resume a previous training run, we get this strange error:
The only thing that helps is removing the model.npz.orig.npz file. I attach a logfile of the resume command in the hope that someone can help figure out why this might happen. resume-error.log
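For anyone who wants to try the same workaround without deleting anything, here is a minimal sketch that renames the `.orig.npz` file to a backup before resuming. The helper name `set_aside_orig_checkpoint` and the `.bak` suffix are made up for illustration and are not part of Marian; the only thing taken from this report is that the file sits next to the model as `model.npz.orig.npz`.

```python
from pathlib import Path
from typing import Optional


def set_aside_orig_checkpoint(model_path: str) -> Optional[Path]:
    """Rename '<model_path>.orig.npz' to '<model_path>.orig.npz.bak'.

    Hypothetical helper, not part of Marian. Returns the backup path,
    or None if there was no '.orig.npz' file to move.
    """
    orig = Path(model_path + ".orig.npz")
    if not orig.exists():
        return None
    backup = orig.with_name(orig.name + ".bak")
    # Path.replace overwrites an existing backup of the same name.
    orig.replace(backup)
    return backup


if __name__ == "__main__":
    moved = set_aside_orig_checkpoint("model.npz")
    print(f"moved to {moved}" if moved else "no .orig.npz file found")
```

Run it in the model directory before the resume command. Keeping the backup means the file can be restored later if it turns out to be needed for debugging.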