jlin816 opened this issue 5 months ago
@jlin816 Oh, thanks for that!! Hmm, I might leave the folder as is then! I'll add a check so the folder doesn't get randomly deleted :)
Thanks! Could this cause any issues when running two jobs on the same machine (e.g., mixing up checkpoint data somehow)?
I'm getting the following error. I think it's probably because I'm running two training runs on the same machine, and both may try to create/delete the temporary file at around the same time, so the run that lags slightly behind can no longer find it. I haven't verified that this is what's happening, but I hope that detail is helpful!
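If the race described here is the cause, one common way to avoid it is to give each run its own temporary directory instead of a fixed shared path, and to make cleanup tolerate the directory already being gone. The sketch below is hypothetical: `make_run_tmpdir` and `cleanup_run_tmpdir` are illustrative names, not part of the library discussed in this thread, and it assumes the temporary folder is created and removed from Python.

```python
import os
import shutil
import tempfile


def make_run_tmpdir(base_dir=None):
    """Create a temporary directory unique to this run (hypothetical helper).

    tempfile.mkdtemp picks a fresh name and creates the directory atomically,
    so two runs started at the same moment never end up sharing one folder.
    """
    # The PID in the prefix only makes the folder easier to identify when debugging.
    return tempfile.mkdtemp(prefix=f"run_{os.getpid()}_", dir=base_dir)


def cleanup_run_tmpdir(path):
    """Remove this run's directory (hypothetical helper).

    ignore_errors=True means a directory that is already gone does not raise,
    so calling this twice is harmless.
    """
    shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    # Simulate two concurrent runs on the same machine.
    run_a = make_run_tmpdir()
    run_b = make_run_tmpdir()
    cleanup_run_tmpdir(run_a)          # run A finishes first
    print(os.path.isdir(run_b))        # run B's folder is untouched
    cleanup_run_tmpdir(run_b)
```

Because each run deletes only the directory it created, one job finishing early can no longer remove files another job still expects to find.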