MicaelCarvalho closed this issue 7 years ago
I've never seen this error before. It seems related to this issue in the torch-hdf5 repo, although that issue does not appear to have been resolved.
We are using HDF5 1.8.11. Regarding torch, this is the latest commit in the version we are using:
commit a58889e5289ca16b78ec7223dd8bbc2e01ef97e0
Merge: cb3ad52 8abc4ba
Author: Soumith Chintala soumith@gmail.com
Date: Thu Oct 27 11:45:31 2016 -0400

    Merge pull request #170 from howard0su/winbuild

    Support build torch7 on windows
Thanks for your help.
Micael and I are 99% sure that we've found a fix. We can't be fully certain, because the error takes a long time to occur (training usually crashes after 24 hours). In any case, since https://github.com/torralba-lab/im2recipe/commit/67da133916d51cecbebd7b46b2947fc8ea1a71f2, we have not encountered the HDF5 error again.
Hello,
We successfully ran the code to reproduce your results. However, we're facing an intermittent problem with HDF5. The program runs normally for a few hours, but after some time HDF5 crashes, apparently due to a memory leak. We can usually relaunch it and continue training later. I'm sending the full logs below.
We have tried different versions of HDF5 1.8 without success, and we weren't able to update HDF5 to 1.10, since that version is incompatible with torch-hdf5.
I believe this issue is not coming from your code, but rather from a bad HDF5 integration or version. Could you please tell us whether you faced the same problem and, if so, whether you managed to solve it? If not, would you be able to share the HDF5 version you're using, as well as the torch version?
Thanks in advance! :-)