Closed siboehm closed 2 years ago
On LINCS we have some problems with evaluations resulting in strange [0.0, 0.0, 0.0] results. There is this line:
https://github.com/theislab/chemical_CPA/blob/main/compert/train.py#L68
So the 0.0s probably come from NaNs in the evaluation. We should remove this line. Now that the evaluation is much faster (17min without disentanglement on LINCS), we can just run it more often and save intermediate checkpoints.
We should probably calculate the test loss during eval, and if it's NaN, stop the training run and retain the last checkpoint.
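A minimal sketch of what that check could look like. Note that `evaluate`, `train_one_epoch`, and `save_checkpoint` are hypothetical placeholders, not actual `compert` APIs:

```python
import math

def should_stop_on_nan(test_loss: float) -> bool:
    # Hypothetical helper: flag a NaN eval loss so the run can be
    # stopped early and the last good checkpoint retained.
    return math.isnan(test_loss)

# Sketch of how this could slot into a training loop:
#
# for epoch in range(num_epochs):
#     train_one_epoch(model)
#     test_loss = evaluate(model)
#     if should_stop_on_nan(test_loss):
#         break  # keep the checkpoint saved at the previous epoch
#     save_checkpoint(model, epoch)
```

Checkpointing before the NaN check would leave the last pre-NaN checkpoint on disk, which matches the "retain the last checkpoint" idea above.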
Good idea! Is this already implemented somewhere?
No