slimgroup / JUDI.jl

Julia Devito inversion.
https://slimgroup.github.io/JUDI.jl
MIT License

Memory consumption #262

Open kerim371 opened 1 week ago

kerim371 commented 1 week ago

Hi,

Recently I ran a computation on a single Ubuntu node with 64 GB RAM and it finished successfully.

Then I tried to run the same computation on a small cluster (5 CentOS 7 nodes) with 128 GB RAM. At some point I started seeing a warning along the lines of "not enough memory, starting swapping", with about 110 GB of RAM in use. After some time I always get an error saying the connection was lost (or something similar), so I can't complete even a single iteration of FWI.

That means 64 GB RAM was enough on a single Ubuntu node, without any swapping, while 128 GB is not enough on the small CentOS 7 cluster.

Any ideas about the possible reasons?

Julia's cluster manager is SSH-based. Versions: Julia 1.9.3, JUDI v3.3.10.

mloubout commented 6 days ago

I think there might be an issue with the parallel scheduler that leaves multiple workers alive on a node. I'll try to have a look.
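
A quick way to check whether several workers really do end up on the same node is to ask each worker for its hostname and the free memory on that host. The snippet below is a minimal sketch that uses only Julia's `Distributed` standard library, not JUDI's scheduler. The variable names (`info`, `counts`) are illustrative, and it assumes the workers were already added, for example through the SSH cluster manager.

```julia
using Distributed

# Assumption: workers were already added (e.g. with addprocs over SSH).
# If none were added, workers() returns [1], the master process itself.

# Ask every worker which host it runs on and how much memory that host has free.
info = [(id = w,
         host = remotecall_fetch(gethostname, w),
         free_gb = remotecall_fetch(Sys.free_memory, w) / 2^30)
        for w in workers()]

# Count workers per host. More workers than expected on one host would be
# consistent with the hypothesis that extra workers are left alive there.
counts = Dict{String,Int}()
for i in info
    counts[i.host] = get(counts, i.host, 0) + 1
end

for (host, n) in counts
    println(host, ": ", n, " worker(s)")
end
```

Running this before and after an FWI iteration and comparing the per-host counts with the number of workers requested per node may help narrow down where the memory goes.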