marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Cannot resume training: Attempt to free invalid pointer error #286

Open oraveczcsaba opened 5 years ago

oraveczcsaba commented 5 years ago

When we try to resume a previous training run, we get this strange error:

src/tcmalloc.cc:284] Attempt to free invalid pointer 0x7f670fd8a100 
Aborted

The only thing that helps is removing the model.npz.orig.npz file. I've attached a log file of the resume command (resume-error.log) in the hope that someone can help figure out why this happens.

emjotde commented 5 years ago

Can you please try 1.7.8? We also have a merge in progress that will completely change the way resuming works, so this might just fix itself.

oraveczcsaba commented 5 years ago

Sorry for the late reply, I was away for a while. I've just tried the latest marian-dev, which now reports version 1.7.12, but unfortunately we still get the same error.

emjotde commented 5 years ago

Noted.

oraveczcsaba commented 4 years ago

The issue is now resolved. We had gperftools 2.6.1 as the system default (CentOS 7.6), and when I replaced it with the current 2.7, compiled from source, the error was gone. I don't know what changed between the two versions, but in any case, resuming now works.