Open chillenzer opened 1 year ago
I remember seeing this problem already and briefly investigating it (it was some years ago, though). There are obvious suspects, like the allocate calls here and there in the code. I think I checked those but could not find anything wrong with them. I did not investigate further because simply increasing the memory limit solved the problem.
Okay. Thanks. Might have a look later. What memory needs should I expect?
I think that on Sunbird and on the Cambridge system, requesting (n_used_cores_on_node/n_node_cores)*node_memory was working fine. By the way, I do agree that the situation is not nice and the issue should be fixed; it was just a matter of priorities.
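For anyone wanting to plug numbers into that rule of thumb, here is a minimal sketch of the arithmetic. The function name and the node figures (40 cores, 384 GB) are made up for illustration and are not the actual specifications of Sunbird or any other machine.

```python
def memory_request(n_used_cores_on_node: int, n_node_cores: int, node_memory: float) -> float:
    """Proportional share of a node's memory for the cores a job uses on it.

    Implements (n_used_cores_on_node / n_node_cores) * node_memory.
    The result is in whatever unit node_memory is given in.
    """
    if n_node_cores <= 0:
        raise ValueError("n_node_cores must be positive")
    if not 0 < n_used_cores_on_node <= n_node_cores:
        raise ValueError("n_used_cores_on_node must be between 1 and n_node_cores")
    return (n_used_cores_on_node / n_node_cores) * node_memory


if __name__ == "__main__":
    # Hypothetical node: 40 cores and 384 GB, with the job using 10 cores on it.
    print(memory_request(10, 40, 384.0), "GB")
```

The result would then go into whatever memory option the batch system takes, in the unit that option expects.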
Hi, is there any part of the program that is expected to accumulate a significant amount of memory during a run? I had my recent jobs on Sunbird killed by oom-kill after a few hundred configurations despite starting off fine. What particularly bothered me was that simulations on very different lattices (implying very different memory consumption) started off fine with the same memory requested, and all were killed after some time. This suggests that it is indeed an accumulation over time and not just insufficient resources for jobs that are too big. If no such accumulation is expected, I'm afraid there's a memory leak and I will have to debug that at some point.
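One way to check the "accumulation over time" reading, assuming the resident memory of the job can be logged once per configuration, is to fit a straight line to those samples and look at the slope. This is only a sketch: the function name is hypothetical, the sample values are invented, and how the samples are collected (scheduler accounting, a monitoring tool, or a print inside the program) is left open.

```python
def growth_per_configuration(rss_samples_mb):
    """Least-squares slope of memory usage versus configuration index.

    rss_samples_mb: one memory reading (in MB) per configuration, in order.
    A slope near zero suggests a flat footprint; a clearly positive slope
    suggests memory accumulating from one configuration to the next.
    """
    n = len(rss_samples_mb)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(rss_samples_mb) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(rss_samples_mb))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Invented readings: usage creeping up by about 2 MB per configuration.
    samples = [100.0, 102.0, 104.0, 106.0]
    print(growth_per_configuration(samples), "MB per configuration")
```

If the slope is roughly the same for small and large lattices, that would point to a fixed-size allocation leaking once per configuration rather than something that scales with the lattice volume.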