plesager / ece3-postproc

Suite of processing tools for EC-Earth3 output

remove temporary files after crash. #43

Open etiennesky opened 5 years ago

etiennesky commented 5 years ago

Hi, Pablo from BSC reports that lots of temporary files remain after running ece3-postproc. Some of the folders are quite large, which happens when the scripts crash. It seems that folders created by mktemp are not automatically deleted.

As a workaround, we could put them in a unique folder (e.g. $SCRATCH/tmp_ecearth3/$expid) and delete that folder later on.

Probably a better solution is to create the folder in $TMPDIR instead of $SCRATCH, so that the files are deleted when the job finishes. The drawback is that after an error the files are no longer around for debugging. This behaviour may also be platform dependent, so we could define the top-level folder in the conf/ files.

Any thoughts @plesager @pabretonniere @mcastril @aearamos ?

bsc32130@login3:/gpfs/scratch/bsc32/bsc32130/tmp_ecearth3/tmp> du -sh ./*
1.0K    ./ecmean_t034_0qIhHA
1.0K    ./ecmean_t034_5AVzcc
1.0K    ./ecmean_t034_7hlXAh
1.0K    ./ecmean_t034_aBTQwg
1.0K    ./ecmean_t034_eu1TYj
1.0K    ./ecmean_t034_h8JcGP
1.0K    ./ecmean_t034_Kp21yp
1.0K    ./ecmean_t034_NCECNC
1.0K    ./ecmean_t034_nMvzky
1.0K    ./ecmean_t034_tLcSlM
1.0K    ./ecmean_t036_9DaDrU
12M     ./ecmean_t036_9Dj2kD
1.0K    ./ecmean_t036_aDGc1c
1.0K    ./ecmean_t036_j2ILmS
1.0K    ./ecmean_t036_K9GEaU
1.0K    ./ecmean_t036_moPODc
1.0K    ./ecmean_t036_PfZi18
1.0K    ./ecmean_t036_QlHJzb
1.0K    ./ecmean_t036_SLHQMb
1.0K    ./ecmean_t036_TMC2Ml
1.0K    ./ecmean_t036_wyegWO
1.0K    ./ecmean_t036_XFszDW
1.0K    ./ecmean_t036_zcbSUr
1.0K    ./ecmean_t037_Kvp64O
1.0K    ./ecmean_t037_PaWzMK
1.0K    ./ecmean_t037_pcoGjx
1.0K    ./ecmean_t037_psVNmT
1.0K    ./ecmean_t037_RhDaIq
1.0K    ./ecmean_t037_rSDgjw
1.0K    ./ecmean_t037_uOMjIr
1.0K    ./ecmean_t037_UyaO23
1.0K    ./ecmean_t037_xKNQ6Z
1.0K    ./ecmean_t038_C2kjvM
1.0K    ./ecmean_t038_gGbr4D
1.0K    ./ecmean_t038_MGIaP6
1.0K    ./ecmean_t038_OcpdnF
1.0K    ./ecmean_t039_2Bu4Lk
1.0K    ./ecmean_t039_3UUYUb
1.0K    ./ecmean_t039_AVLCMI
1.0K    ./ecmean_t039_bNGqsY
1.0K    ./ecmean_t039_eOPqkL
1.0K    ./ecmean_t039_jszxmS
1.0K    ./ecmean_t039_RtxAe4
1.0K    ./ecmean_t039_WF2IaJ
1.0K    ./ecmean_t03b_3CbPFV
1.0K    ./ecmean_t03b_7KPrwv
1.0K    ./ecmean_t03b_8oi9WQ
1.0K    ./ecmean_t03b_G9KKbE
1.0K    ./ecmean_t03b_rqTzpX
1.0K    ./ecmean_t03b_SLhIu2
1.0K    ./ecmean_t03b_UpQaN1
1.0K    ./ecmean_t03d_7otzNM
1.0K    ./ecmean_t03d_bmolP1
1.0K    ./ecmean_t03d_FogxJU
1.0K    ./ecmean_t03d_hbemTS
1.0K    ./ecmean_t03d_ihXyHO
1.0K    ./ecmean_t03d_janpqU
1.0K    ./ecmean_t03d_KzHd1V
1.0K    ./ecmean_t03d_onzYq2
1.0K    ./ecmean_t03d_sc94YQ
1.0K    ./ecmean_t03d_t7RuV2
1.0K    ./ecmean_t03d_yP492b
1.0K    ./ecmean_t03d_Z04SFu
1.0K    ./ecmean_t03w_dWgOGi
1.0K    ./ecmean_t03w_gfiTWZ
1.0K    ./ecmean_t03w_Iklu7s
1.0K    ./ecmean_t03w_lODpQD
1.0K    ./ecmean_t03w_QEGFUP
712M    ./hireclim2_t036_aECpbh
5.9G    ./hireclim2_t04h_Cpy43d
5.9G    ./hireclim2_t04h_DQIRye
1.0K    ./ts_t034_0KOyGt
1.0K    ./ts_t034_0phAMu
1.0K    ./ts_t034_18uqjz
1.0K    ./ts_t034_1tQK9T
1.0K    ./ts_t034_53ye5H
1.0K    ./ts_t034_5Oufv4
1.0K    ./ts_t034_9f7azq
1.0K    ./ts_t034_9MqHz7
1.0K    ./ts_t034_aHdHS6
1.0K    ./ts_t034_Ax1NnU
1.0K    ./ts_t034_BiGQbp
1.0K    ./ts_t034_bWUbzd
1.0K    ./ts_t034_cg4u2K
1.0K    ./ts_t034_dYodFo
1.0K    ./ts_t034_E23G8h
1.0K    ./ts_t034_fKsSxh
1.0K    ./ts_t034_FV33Xt
1.0K    ./ts_t034_GkCJ9s
1.0K    ./ts_t034_GyVqgZ
1.0K    ./ts_t034_H4ADt0
1.0K    ./ts_t034_h5Atqy
1.0K    ./ts_t034_hEWadV
1.0K    ./ts_t034_J7XxLW
1.0K    ./ts_t034_jPtAWS
1.0K    ./ts_t034_JZAtdC
1.0K    ./ts_t034_KfeRz0
1.0K    ./ts_t034_L1Nc6f
1.0K    ./ts_t034_LQ81Du
1.0K    ./ts_t034_LwfbZv
1.0K    ./ts_t034_m1xo1v
1.0K    ./ts_t034_NcPEHs
1.0K    ./ts_t034_O9pjLY
1.0K    ./ts_t034_RdYIgk
1.0K    ./ts_t034_SbBYZy
1.0K    ./ts_t034_tYl5nT
1.0K    ./ts_t034_vh4E3t
1.0K    ./ts_t034_Y6bOmc
1.0K    ./ts_t034_yIPgVg
1.0K    ./ts_t034_YMHr1F
1.0K    ./ts_t034_zV8Mvh
1.0K    ./ts_t036_0p0LTe
1.0K    ./ts_t036_5OwDj9
1.0K    ./ts_t036_5SEfdQ
1.0K    ./ts_t036_7kIrY2
1.0K    ./ts_t036_7OTw7z
1.0K    ./ts_t036_7tfCeQ
1.0K    ./ts_t036_9QOq2A
1.0K    ./ts_t036_a1ZA7D
1.0K    ./ts_t036_alfU9m
1.0K    ./ts_t036_cncSax
1.0K    ./ts_t036_CZY0Jq
1.0K    ./ts_t036_D20zEb
1.0K    ./ts_t036_ddYIYB
1.0K    ./ts_t036_DmtKuy
1.0K    ./ts_t036_DUbv2b
1.0K    ./ts_t036_E8M0iW
1.0K    ./ts_t036_H1tFYJ
1.0K    ./ts_t036_h86GtI
1.0K    ./ts_t036_hlrwlx
1.0K    ./ts_t036_ifI4Pz
1.0K    ./ts_t036_KLKOI3
1.0K    ./ts_t036_LUcN5l
1.0K    ./ts_t036_McbcRN
1.0K    ./ts_t036_n5U5Tx
1.0K    ./ts_t036_NV5Fwy
1.0K    ./ts_t036_nYy4ny
1.0K    ./ts_t036_O99ceI
1.0K    ./ts_t036_oSQtQV
1.0K    ./ts_t036_OU5R0l
1.0K    ./ts_t036_P2EAUi
1.0K    ./ts_t036_R50CSJ
1.0K    ./ts_t036_RuSmfS
1.0K    ./ts_t036_si3xB1
1.0K    ./ts_t036_T0GYDQ
1.0K    ./ts_t036_tQOovK
1.0K    ./ts_t036_W4hnWA
1.0K    ./ts_t036_WeMJWM
1.0K    ./ts_t036_WyVph7
1.0K    ./ts_t036_X4A8OI
1.0K    ./ts_t036_xbJlfZ
1.0K    ./ts_t036_xHvk8T
1.0K    ./ts_t036_xqCxMf
1.0K    ./ts_t036_XTXrXp
1.0K    ./ts_t036_Y8kt7s
1.0K    ./ts_t037_43juK2
1.0K    ./ts_t037_9Wtd3M
1.0K    ./ts_t037_cR6siS
1.0K    ./ts_t037_EI0gQ0
1.0K    ./ts_t037_iLAvQq
1.0K    ./ts_t037_IohtuW
1.0K    ./ts_t037_j5RqFq
1.0K    ./ts_t037_JBMDPd
1.0K    ./ts_t037_Jgxxlc
1.0K    ./ts_t037_jpcZte
1.0K    ./ts_t037_L0AU9C
1.0K    ./ts_t037_l38WQH
1.0K    ./ts_t037_LDjVhm
1.0K    ./ts_t037_o21jMU
1.0K    ./ts_t037_OoFKkM
1.0K    ./ts_t037_PiQeLT
1.0K    ./ts_t037_Q4ZrQk
1.0K    ./ts_t037_qQpel4
1.0K    ./ts_t037_RabU5s
1.0K    ./ts_t037_Rx452j
1.0K    ./ts_t037_s4LTMb
1.0K    ./ts_t037_sdA5Y0
1.0K    ./ts_t037_ThOBg6
1.0K    ./ts_t037_TT6IgE
1.0K    ./ts_t037_ttqQBM
1.0K    ./ts_t037_TziMbP
1.0K    ./ts_t037_uAm3Bp
1.0K    ./ts_t037_Uyw9ze
1.0K    ./ts_t037_uZIWtb
1.0K    ./ts_t037_vioNJE
1.0K    ./ts_t037_VjAGfZ
1.0K    ./ts_t037_wG2zYu
1.0K    ./ts_t037_X1aGjP
1.0K    ./ts_t037_xGIy5Q
1.0K    ./ts_t037_XkRbRv
1.0K    ./ts_t037_z4wMwz
1.0K    ./ts_t038_3PTDJw
1.0K    ./ts_t038_9wHcXc
1.0K    ./ts_t038_bB66Fc
1.0K    ./ts_t038_dd8r8I
1.0K    ./ts_t038_ddLxL5
1.0K    ./ts_t038_fJImSD
1.0K    ./ts_t038_i48hti
1.0K    ./ts_t038_j5AhJX
1.0K    ./ts_t038_JihTQQ
1.0K    ./ts_t038_q4j5Up
1.0K    ./ts_t038_Rj8ieI
1.0K    ./ts_t038_s9IAZw
1.0K    ./ts_t038_v6we9m
1.0K    ./ts_t038_wk3sN2
1.0K    ./ts_t038_YszGGi
1.0K    ./ts_t038_Zc4XpY
1.0K    ./ts_t039_1ELJ53
1.0K    ./ts_t039_4B810v
1.0K    ./ts_t039_7CWsEv
1.0K    ./ts_t039_7qKh2c
1.0K    ./ts_t039_8TSpWr
1.0K    ./ts_t039_b6came
1.0K    ./ts_t039_Dh0O8J
1.0K    ./ts_t039_E3wdu1
1.0K    ./ts_t039_faChXh
1.0K    ./ts_t039_H2qNU9
1.0K    ./ts_t039_hCknqZ
1.0K    ./ts_t039_hEkicd
1.0K    ./ts_t039_iJO5Bw
1.0K    ./ts_t039_IVw9BY
1.0K    ./ts_t039_jYQytU
1.0K    ./ts_t039_M9svFx
1.0K    ./ts_t039_qEEymd
1.0K    ./ts_t039_qRSHxj
1.0K    ./ts_t039_qvo7If
1.0K    ./ts_t039_rb2SCb
1.0K    ./ts_t039_rtFrCq
1.0K    ./ts_t039_s2PGZm
1.0K    ./ts_t039_SRGsr2
1.0K    ./ts_t039_u2TvsT
1.0K    ./ts_t039_u8hrwG
1.0K    ./ts_t039_vOn96z
1.0K    ./ts_t039_w9d7Ak
1.0K    ./ts_t039_woqX8r
1.0K    ./ts_t039_wvBoyP
1.0K    ./ts_t039_xgmrLr
1.0K    ./ts_t039_xo5IYM
1.0K    ./ts_t039_YL1p7z
1.0K    ./ts_t03b_3qRhWr
1.0K    ./ts_t03b_4prEXv
1.0K    ./ts_t03b_5FJkKH
1.0K    ./ts_t03b_6zEEMP
1.0K    ./ts_t03b_7Onqc5
1.0K    ./ts_t03b_B3WdJf
1.0K    ./ts_t03b_beY4Pf
1.0K    ./ts_t03b_bU4tdR
1.0K    ./ts_t03b_cyMoMV
1.0K    ./ts_t03b_dIaB6Z
1.0K    ./ts_t03b_iqoimb
1.0K    ./ts_t03b_jokfrg
1.0K    ./ts_t03b_LhcBq8
1.0K    ./ts_t03b_lkc1vz
1.0K    ./ts_t03b_moNAoF
1.0K    ./ts_t03b_nUGRCH
1.0K    ./ts_t03b_osl5Mj
1.0K    ./ts_t03b_pDQ1ZJ
1.0K    ./ts_t03b_Q2unn3
1.0K    ./ts_t03b_rDL7p5
1.0K    ./ts_t03b_RYmeEj
1.0K    ./ts_t03b_U0tf3o
1.0K    ./ts_t03b_ubBcwN
1.0K    ./ts_t03b_VrTMsj
1.0K    ./ts_t03b_VtryRb
1.0K    ./ts_t03b_xBtK1c
1.0K    ./ts_t03b_YrjkeX
1.0K    ./ts_t03b_zG3qh5
1.0K    ./ts_t03d_6nq8pZ
1.0K    ./ts_t03d_9vSwKc
1.0K    ./ts_t03d_dBG3uQ
1.0K    ./ts_t03d_i9eDA5
1.0K    ./ts_t03d_J0MC7J
1.0K    ./ts_t03d_mlcfhB
1.0K    ./ts_t03d_mYjrf4
1.0K    ./ts_t03d_stDcTa
1.0K    ./ts_t03d_tFsgtr
1.0K    ./ts_t03d_Tt5X9q
1.0K    ./ts_t03d_WnoHB0
1.0K    ./ts_t03d_ZF2Kyl
1.0K    ./ts_t03w_A5Yeek
1.0K    ./ts_t03w_EowQQB
1.0K    ./ts_t03w_FSAcYl
1.0K    ./ts_t03w_IGujAP
1.0K    ./ts_t03w_K21vm2
1.0K    ./ts_t03w_NmMUfh
1.0K    ./ts_t03w_o0ojw9
1.0K    ./ts_t03w_rlZrzb
1.0K    ./ts_t03w_wRSS5n
1.0K    ./ts_t03w_xT4GjF

plesager commented 5 years ago

If there is a crash, you want to be able to look at the logs. Corollary: if there is a crash, the user must clean up manually after examining the logs. Why not delete $SCRATCH/tmp_ecearth3 all at once? Nothing there is supposed to be kept.

It is possible that cleanup upon success is not perfect. I haven't checked, but it should be fixed if that's the case.

Also note #39, which is (remotely) related.
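
The policy described above (remove the work directory after a clean run, keep it after a crash so the logs can be examined) could be sketched with an EXIT trap. This is only an illustration: `cleanup_workdir` and `run_step` are hypothetical names, not functions that exist in ece3-postproc, and the `step_XXXXXX` template is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: cleanup_workdir and run_step are hypothetical names,
# not functions that exist in ece3-postproc.

# Remove the work directory after a clean exit; keep it after a failure.
cleanup_workdir() {
    local status=$1 workdir=$2
    if [ "$status" -eq 0 ]; then
        rm -rf -- "$workdir"
    else
        echo "exit status $status: keeping $workdir for debugging" >&2
    fi
}

# Usage: run_step ROOT COMMAND [ARGS...]
# Runs COMMAND in a subshell, with a fresh directory under ROOT appended
# as its last argument. The EXIT trap fires when the subshell exits,
# whether the command succeeded or failed.
run_step() (
    root=$1
    shift
    mkdir -p "$root" || exit 1
    workdir=$(mktemp -d "$root/step_XXXXXX") || exit 1
    trap 'cleanup_workdir "$?" "$workdir"' EXIT
    "$@" "$workdir"
    exit "$?"
)
```

A call such as `run_step "$SCRATCH/tmp_ecearth3" my_step` (with `my_step` standing in for a real processing function) would then leave a directory behind only when `my_step` fails. A process killed with SIGKILL would still skip the trap.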

etiennesky commented 5 years ago

Hi Philippe

I think the cleanup upon success is only missing one part, the removal of the directory.

About your corollary... users are lazy, so the files will usually be left behind.

Your suggestion to delete $SCRATCH/tmp_ecearth3 all at once is tempting, but you might end up deleting files while an ece3-postproc process is active.

Not sure what the best approach is.

etiennesky commented 5 years ago

I think the best approach is to have a tmpdir that is unique to the experiment, so you can delete it when you are finished.
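
A minimal sketch of that layout, assuming a configurable top-level folder. `ECE3_TMPROOT` is a hypothetical setting (it could live in the conf/ files) and the three helper names are made up for illustration. The `tmp_ecearth3/$expid` path and the `ts_`/`ecmean_` prefixes follow the listing above.

```shell
#!/usr/bin/env bash
# Sketch only: ECE3_TMPROOT and these helpers are hypothetical,
# not existing ece3-postproc settings or functions.

# Print the per-experiment temporary root for EXPID.
# Falls back to $SCRATCH/tmp_ecearth3, then $TMPDIR, then /tmp.
tmproot_for() {
    echo "${ECE3_TMPROOT:-${SCRATCH:-${TMPDIR:-/tmp}}/tmp_ecearth3}/$1"
}

# Usage: make_workdir EXPID TOOL
# Creates and prints a unique work directory, e.g. <root>/t034/ts_a1B2c3.
make_workdir() {
    local exp_root
    [ -n "$1" ] && [ -n "$2" ] || return 1
    exp_root=$(tmproot_for "$1")
    mkdir -p "$exp_root" && mktemp -d "$exp_root/${2}_XXXXXX"
}

# Usage: purge_experiment EXPID
# Removes everything left behind for one experiment, and nothing else.
purge_experiment() {
    # Refuse an empty EXPID, which would otherwise wipe the whole root.
    [ -n "$1" ] || return 1
    rm -rf -- "$(tmproot_for "$1")"
}
```

With this layout, `purge_experiment t034` removes only the leftovers of experiment t034, so work directories of other experiments that are still running are not touched.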

plesager commented 5 years ago

Yes, probably the easiest way to go.