Closed by patrick-tssn 2 months ago
I've implemented a temporary solution: in the files modeling_visual_encoder.py, modeling_visual_tokenizer.py, and modeling_motion_tokenizer.py, I modified the LayerNorm function to use torch.bfloat16 instead of torch.float32. This adjustment is effective for inference; however, I am uncertain about its compatibility with the training pipeline. I look forward to your feedback on this matter.
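For anyone hitting the same problem, here is a minimal sketch of the kind of change described above. It is not the repository's actual code: the class name `DtypeMatchedLayerNorm` is hypothetical, and it assumes the original wrapper cast its input to a hard-coded torch.float32 before normalizing. Rather than hard-coding torch.bfloat16, this variant normalizes in whatever dtype the layer's own parameters use and then casts back, so it should behave the same for bf16 and fp32 checkpoints.

```python
import torch
from torch import nn


class DtypeMatchedLayerNorm(nn.LayerNorm):
    """Hypothetical LayerNorm wrapper that computes in the dtype of its parameters."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        orig_dtype = x.dtype
        # Follow the parameter dtype (e.g. bfloat16 for a bf16 checkpoint)
        # instead of a hard-coded torch.float32 or torch.bfloat16.
        if self.weight is not None:
            compute_dtype = self.weight.dtype
        else:
            # No affine parameters: keep the input dtype as-is.
            compute_dtype = orig_dtype
        out = super().forward(x.to(compute_dtype))
        # Return the result in the caller's original dtype.
        return out.to(orig_dtype)
```

With this pattern the input and the weights always share a dtype inside the normalization call, which is the mismatch the hard-coded cast can introduce. Whether it is numerically adequate for training is a separate question that this sketch does not answer.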
Thank you for identifying this potential issue, which occurs when apex is not installed. We use the FusedLayerNorm from apex during training. Your temporary workaround is correct.
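A common way to make the apex dependency optional is an import fallback. The snippet below is only a sketch under assumptions: the names `LayerNormImpl` and `HAS_APEX` are hypothetical and not taken from the repository, and the fallback target here is plain `torch.nn.LayerNorm`, which may differ from what the project actually falls back to.

```python
from torch import nn

try:
    # Fused CUDA kernel from NVIDIA apex, used when the package is available.
    from apex.normalization import FusedLayerNorm as LayerNormImpl
    HAS_APEX = True
except ImportError:
    # apex is not installed: fall back to the standard PyTorch layer.
    LayerNormImpl = nn.LayerNorm
    HAS_APEX = False

# Both classes accept the normalized shape as their first argument.
norm = LayerNormImpl(16)
```

Checking `HAS_APEX` at startup makes it easy to log which implementation is in use, which helps when inference and training environments differ.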
I set up the environment following these requirements: https://github.com/jy0205/LaVIT/tree/main/VideoLaVIT#requirements. When I run this notebook: https://github.com/jy0205/LaVIT/blob/main/VideoLaVIT/understanding.ipynb, I encounter this error.
Could you please double-check the environment requirements?