zamling / PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
Apache License 2.0

ValueError: matrix contains invalid numeric entries #6

Open CauchyFanUpdate opened 5 months ago

CauchyFanUpdate commented 5 months ago

When training the model, I encountered a ValueError: matrix contains invalid numeric entries and I'm not sure what the reason is. I wanted to ask the author if they have encountered similar situations and how to avoid them.

zamling commented 5 months ago

Hi @CauchyFanUpdate, I haven't encountered this error. Could you provide more details?

nimeidi commented 2 months ago

> When training the model, I encountered a ValueError: matrix contains invalid numeric entries and I'm not sure what the reason is. I wanted to ask the author if they have encountered similar situations and how to avoid them.

I had the same problem. Did you solve it?

ys-zong commented 6 days ago

I hit the same problem after reaching around half of the total training steps. Is it caused by some training instability? The training log looks fine, though, and the losses were decreasing. It would be great if you could help, @zamling.

Here's the log.

[rank2]:   File "/mypath/PSALM/psalm/train/train.py", line 462, in train                             
[rank2]:     trainer.train()                                                                                                       
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train                  
[rank2]:     return inner_training_loop(                                                                                           
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop   
[rank2]:     tr_loss_step = self.training_step(model, inputs)                                                                      
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step          
[rank2]:     loss = self.compute_loss(model, inputs)
[rank2]:   File "/mypath/PSALM/psalm/train/llava_trainer.py", line 280, in compute_loss
[rank2]:     outputs = model(**inputs)                                                                                             
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl  
[rank2]:     return self._call_impl(*args, **kwargs)                                                                               
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl          
[rank2]:     return forward_call(*args, **kwargs)                                                                                  
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn               
[rank2]:     ret_val = func(*args, **kwargs)                                                                                       
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1735, in forward            
[rank2]:     loss = self.module(*inputs, **kwargs)                                                                                 
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl  
[rank2]:     return self._call_impl(*args, **kwargs)                                                                               
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl          
[rank2]:     return forward_call(*args, **kwargs)                                                                                  
[rank2]:   File "/mypath/PSALM/psalm/model/language_model/llava_phi.py", line 1102, in forward       
[rank2]:     mask_losses = self.criterion(mask_outputs, targets)                                                                   
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl  
[rank2]:     return self._call_impl(*args, **kwargs)                                                                               
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl          
[rank2]:     return forward_call(*args, **kwargs)                                                                                  
[rank2]:   File "/mypath/PSALM/psalm/model/mask_decoder/mask_criterion/pretrain_criterion.py", line 316, in forward
[rank2]:     indices = self.matcher(outputs_without_aux, targets)                                                                  
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl  
[rank2]:     return self._call_impl(*args, **kwargs)                                                                               
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl          
[rank2]:     return forward_call(*args, **kwargs)                                                                                  
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context     
[rank2]:     return func(*args, **kwargs)                                                                                          
[rank2]:   File "/mypath/PSALM/psalm/model/mask_decoder/Mask2Former_Simplify/utils/matcher.py", line 208, in forward
[rank2]:     return self.memory_efficient_forward(outputs, targets)                                                                
[rank2]:   File "/opt/conda/envs/psalm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context     
[rank2]:     return func(*args, **kwargs)                                                                                          
[rank2]:   File "/mypath/PSALM/psalm/model/mask_decoder/mask_criterion/pretrain_criterion.py", line 453, in memory_efficient_forward
[rank2]:     indices.append(linear_sum_assignment(C))                                                                              
[rank2]: ValueError: matrix contains invalid numeric entries
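
The traceback ends in SciPy's linear_sum_assignment(C), which, as far as I know, raises this ValueError when the cost matrix C contains non-finite entries such as NaN. One way to narrow this down is to check C before the call and, as a stopgap, replace the bad entries with a large finite cost. The following is only a minimal sketch under stated assumptions: safe_linear_sum_assignment and large_cost are hypothetical names, not part of PSALM, and the sketch assumes C has already been moved to a NumPy array on the CPU.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def safe_linear_sum_assignment(cost, large_cost=1e6):
    """Replace non-finite entries with a large finite cost, then solve.

    Hypothetical helper, not part of PSALM. `large_cost` is an arbitrary
    penalty and should exceed any legitimate matching cost.
    """
    cost = np.asarray(cost, dtype=np.float64)
    bad = ~np.isfinite(cost)  # True where the entry is NaN, +inf or -inf
    n_bad = int(bad.sum())
    if n_bad:
        # Report the count so the root cause can still be investigated.
        print(f"warning: {n_bad} non-finite entries in cost matrix")
        cost = np.where(bad, large_cost, cost)
    rows, cols = linear_sum_assignment(cost)
    return rows, cols, n_bad


# Toy cost matrix with one NaN and one inf entry (illustrative values only).
C = np.array([[1.0, np.nan, 3.0],
              [2.0, 0.5, np.inf],
              [4.0, 2.0, 0.1]])
rows, cols, n_bad = safe_linear_sum_assignment(C)
```

This only hides the symptom. If the count of non-finite entries is non-zero, the model outputs or individual cost terms that feed C are probably worth inspecting at the failing step.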