fsalfonzo opened this issue 11 months ago
After spending some time diagnosing the problem, I believe I found the issue.
In Omnipose's core.py, there are the following lines:
```python
# percentile clipping augmentation
if aug_choices[1]:
    dp = .1 # changed this from 10 to .1, as usual pipeline uses 0.01, 10 was way too high for some images
    dpct = np.random.triangular(left=0, mode=0, right=dp, size=2) # weighted toward 0
    imgi[k] = utils.normalize99(imgi[k],upper=100-dpct[0],lower=dpct[1])
```
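This routine runs on an image that has already been normalized. Normalizing it a second time can create NaN values. The "invalid value encountered in divide" warning in the log below is consistent with a 0/0 division, i.e. the upper and lower percentile values being equal. Those NaNs then propagate through the rest of the code until training exits with an error. I hope this helps the community if someone runs into the same issue.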
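The NaN failure mode can be reproduced outside Omnipose with a simplified stand-in for percentile normalization. `normalize_percentile` below is a hypothetical re-implementation modeled on the expression shown in the RuntimeWarning (`(Y-lower_val)/(upper_val-lower_val)` clipped to [0, 1]). It is not Omnipose's `utils.normalize99`. When the upper and lower percentiles coincide, the division is 0/0 and every pixel becomes NaN. A perfectly uniform crop is the extreme case. The `eps` guard is one possible workaround under these assumptions, not a fix taken from the library.

```python
import numpy as np

def normalize_percentile(img, lower=0.01, upper=99.99, eps=None):
    """Hypothetical stand-in for percentile normalization (not Omnipose's API).

    Maps the `lower` and `upper` percentiles of img to 0 and 1, then clips
    to [0, 1]. If eps is given, it is added to the denominator so that a
    collapsed percentile range cannot produce 0/0.
    """
    img = np.asarray(img, dtype=np.float64)
    lower_val = np.percentile(img, lower)
    upper_val = np.percentile(img, upper)
    denom = upper_val - lower_val
    if eps is not None:
        denom = denom + eps
    # Silence the "invalid value encountered in divide" warning for the demo
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.clip((img - lower_val) / denom, 0, 1)

# A perfectly flat crop: every percentile has the same value.
flat = np.full((4, 4), 0.5)

unguarded = normalize_percentile(flat)
guarded = normalize_percentile(flat, eps=1e-8)

print(np.isnan(unguarded).all())  # True: 0/0 gives NaN everywhere
print(np.isnan(guarded).any())    # False: eps keeps the result finite
```

`np.clip` leaves NaN unchanged, so clipping to [0, 1] does not hide the problem. The NaNs survive into the training batch.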
@fsalfonzo thanks for reporting this. I haven't seen any issues with normalization, but I will check into it. Looks like you got this on the 5I_crop subset, so that is super helpful for debugging.
Hi, I need some advice on fine-tuning a model. I can train a model from scratch using the following CLI command:
If I explicitly pass --nclass 2, it crashes.
If I try the following command, it also crashes.
I have no problem using the pretrained model that is already provided, so it must be something I am missing.
Error description shown below:

```
(omnipose) C:\Users\fsa>python -m omnipose --train --dir C:\Users\fsa\Desktop\bact_phase\train_sorted\5I_crop --mask_filter _masks --n_epochs 10 --pretrained_model bact_phase_omni --learning_rate 0.05 --diameter 0 --batch_size 16 --save_every 50 --RAdam
!NEW LOGGING SETUP! To see cellpose progress, set --verbose
No --verbose => no progress or info printed
2023-11-02 23:41:24,034 [INFO] >>>> using CPU
2023-11-02 23:41:24,034 [INFO] This model uses boundary field, setting nclasses=3.
2023-11-02 23:41:24,034 [INFO] Training omni model. Setting nclasses=3, RAdam=True
2023-11-02 23:41:24,038 [INFO] not all flows are present, will run flow generation for all images
2023-11-02 23:41:24,042 [INFO] pretrained model C:\Users\fsa\.cellpose\models\bact_phase_omnitorch_0 is being used
2023-11-02 23:41:24,042 [INFO] median diameter set to 0 => no rescaling during training
2023-11-02 23:41:24,186 [INFO] No precomuting flows with Omnipose. Computed during training.
2023-11-02 23:41:24,205 [INFO] >>> Using RAdam optimizer
2023-11-02 23:41:24,206 [INFO] >>>> training network with 2 channel input <<<<
2023-11-02 23:41:24,206 [INFO] >>>> LR: 0.05000, batch_size: 16, weight_decay: 0.00001
2023-11-02 23:41:24,206 [INFO] >>>> ntrain = 5
2023-11-02 23:41:24,206 [INFO] >>>> nimg_per_epoch = 5
2023-11-02 23:41:24,206 [INFO] >>>> Start time: 23:41:24
C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\utils.py:220: RuntimeWarning: invalid value encountered in divide
  return module.clip((Y-lower_val)/(upper_val-lower_val),0,1)
C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\utils.py:53: RuntimeWarning: invalid value encountered in cast
  return np.uint16(rescale(im)*(2**16-1))
2023-11-02 23:41:27,116 [INFO] Train epoch: 0 | Time: 0.05min | last epoch: 0.00s | <sec/epoch>: 0.00s | <sec/batch>: 0.84s |: 1.140086 | : 1.140086
2023-11-02 23:41:27,117 [INFO] saving network parameters to C:\Users\fsa\Desktop\bact_phase\train_sorted\5I_crop\models/cellpose_residual_on_style_on_concatenation_off_omni_abstract_nclasses_3_nchan_2_dim_2_5I_crop_2023_11_02_23_41_24.194542
Traceback (most recent call last):
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\__main__.py", line 12, in <module>
    main()
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\__main__.py", line 9, in main
    cellpose_omni_main(args)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\cellpose_omni\__main__.py", line 439, in main
    cpmodel_path = model.train(images, labels, links, train_files=image_names,
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\cellpose_omni\models.py", line 1572, in train
    model_path = self._train_net(train_data, train_labels, train_links,
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\cellpose_omni\core.py", line 1187, in _train_net
    train_loss = self._train_step(self._to_device(np.stack(imgi)),lbl)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\cellpose_omni\core.py", line 834, in _train_step
    loss = self.loss_fn(lbl,y)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\cellpose_omni\models.py", line 1396, in loss_fn
    loss = omnipose.core.loss(self, lbl, y)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\omnipose\core.py", line 2672, in loss
    return 2*(5*loss1+loss2+loss4+loss5+loss6)+self.criterion0(flow,veci) # golden?
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torchvf\losses\ivp_loss.py", line 105, in forward
    pred_trajectories = self._compute_batched_trajectories(vf_pred)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torchvf\losses\ivp_loss.py", line 84, in _compute_batched_trajectories
    trajectories = ivp_solver(
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torchvf\numerics\integration\ivp_int.py", line 61, in ivp_solver
    points, _ = f_solver.step(points, dx)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torchvf\numerics\integration\solvers.py", line 32, in step
    k1 = self.f(points)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torchvf\numerics\interpolation\interp_vf.py", line 32, in _vf
    out = nearest_interpolation_batched(vector_field, p)
  File "C:\Users\fsa\anaconda3\envs\omnipose\lib\site-packages\torchvf\numerics\interpolation\functional.py", line 229, in nearest_interpolation_batched
    return vf.gather(-1, points)
RuntimeError: index -9223372036854775808 is out of bounds for dimension 3 with size 224
```
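The final error fits the NaN explanation. -9223372036854775808 is the smallest signed 64-bit integer. On typical x86 machines, a NaN float coordinate becomes that value when it is cast to an integer index. The conversion is not formally defined, so other platforms may behave differently.

One way to confirm the cause is to check each training batch for non-finite values before the training step. `find_nonfinite` below is a hypothetical debugging helper, not an Omnipose function. Where to call it is also an assumption, for example on `np.stack(imgi)` just before `_train_step`, as seen in the traceback.

```python
import numpy as np

# The index in the RuntimeError is exactly the smallest signed 64-bit integer.
BAD_INDEX = -9223372036854775808
INT64_MIN = np.iinfo(np.int64).min
print(BAD_INDEX == INT64_MIN)  # True

def find_nonfinite(batch):
    """Hypothetical debugging helper (not part of Omnipose).

    Returns the indices (along the first axis) of images in a stacked
    batch that contain NaN or infinite values.
    """
    batch = np.asarray(batch, dtype=np.float64)
    flat = batch.reshape(batch.shape[0], -1)
    return np.flatnonzero(~np.isfinite(flat).all(axis=1)).tolist()

# Example: the second of three 2-channel images picked up a NaN.
batch = np.zeros((3, 2, 8, 8))
batch[1, 0, 4, 4] = np.nan
print(find_nonfinite(batch))  # [1]
```

If this reports offending images, the problem lies upstream in augmentation or normalization rather than in the loss function that raised the error.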