newmanne / open_spiel

Apache License 2.0

Tell if experiments failed -> needs a better solution #40

Open · newmanne opened this issue 2 years ago

newmanne commented 2 years ago

Some runs in the last batch of experiments (mar22_4, mar24) seem to say "Killed" in their logs. See, e.g., large_game_3b_pricing05-mar17lstm-104. What happened to everything after t=3 million?

I believe the log message is different when a job is killed by the user, though it's not impossible that I killed it myself. Need to investigate more...

See the log at /shared/outputs/mar22_4/slurm-15582-large_game_3b_pricing05-mar17lstm-104_mar22_4.err

newmanne commented 2 years ago

My theory is that this wasn't killed by SLURM (which would have left a message saying WHY it killed the job), so I guess it was an OOM error or something similar?
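A quick way to find the affected runs would be to scan every SLURM `.err` file in an output directory for failure signs. The sketch below is hypothetical and not part of the repo: `classify_log` and `scan_logs` are illustrative names, and the regex patterns are assumptions. It looks for three things: a bare "Killed" (what a shell typically prints when a child process dies from SIGKILL, e.g. via the kernel OOM killer), SLURM's "CANCELLED AT" notice, and an "oom-kill" notice. The exact wording varies by SLURM version and cgroup configuration, so the patterns may need adjusting against real logs. The default glob only assumes the `slurm-*.err` naming seen in the path above.

```python
import re
from pathlib import Path

# ASSUMED failure signatures; the exact wording depends on the shell,
# the SLURM version and the cluster's cgroup/OOM configuration.
PATTERNS = {
    # Shell notice for a child process terminated by SIGKILL.
    "killed": re.compile(r"\bKilled\b"),
    # SLURM's own cancellation notice (e.g. scancel or time limit).
    "slurm_cancelled": re.compile(r"CANCELLED AT"),
    # OOM notices such as "Detected 1 oom-kill event(s)".
    "oom": re.compile(r"oom[-_ ]kill", re.IGNORECASE),
}


def classify_log(text: str) -> list[str]:
    """Return the names of all failure patterns found in one log's text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]


def scan_logs(log_dir: str, glob: str = "slurm-*.err") -> dict[str, list[str]]:
    """Map each matching log file under log_dir to the failure signs it contains.

    Files with no matches are omitted, so an empty dict means no log
    showed any of the assumed failure signatures.
    """
    results: dict[str, list[str]] = {}
    for path in sorted(Path(log_dir).glob(glob)):
        hits = classify_log(path.read_text(errors="replace"))
        if hits:
            results[path.name] = hits
    return results
```

Usage would be something like `scan_logs("/shared/outputs/mar22_4")`, which returns a dict keyed by log file name. Grepping logs only catches failures that leave a trace in them. If SLURM accounting is enabled, `sacct -j <jobid> --format=JobID,State,ExitCode` may be a more reliable signal, since it can report states such as `CANCELLED` or `OUT_OF_MEMORY` directly.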