rmaphoh / RETFound_MAE

RETFound - A foundation model for retinal images

Floating point exception when running finetuning #33

Open maximilianmordig opened 2 months ago

maximilianmordig commented 2 months ago

I am getting `[1] 1774566 floating point exception  python main_finetune.py --batch_size 16 --model vit_large_patch16 --epochs 50` when trying to run your finetuning script. I also slightly changed the code to try non-distributed mode, but to no avail. Your dependencies are very outdated: Python 3.7 is no longer supported and the pinned timm version is four years old. I tried running without the version requirements (so torch>=2 and the latest timm), but then I get:

```
# use a separate environment because it requires timm version 0.3.2 for loading the model, otherwise:
#   x = global_pool_nlc(x, pool_type=pool_type, num_prefix_tokens=self.num_prefix_tokens)
#   File "/lustre/home/mmordig/micromamba/envs/retfound/lib/python3.10/site-packages/timm/models/vision_transformer.py", line 409, in global_pool_nlc
#     assert not pool_type, f'Unknown pool type {pool_type}'
# AssertionError: Unknown pool type True
```
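A possible workaround for this assertion, if one wants to stay on a recent timm rather than pinning 0.3.2: newer timm releases expect `global_pool` to be a string pool type ('token', 'avg', ...) rather than a boolean, so the old `global_pool=True` flag has to be translated before the model is built. A minimal sketch, with the model name, class count, and flag value as assumptions rather than values taken from this repo:

```python
# Sketch: map the boolean global_pool flag (old timm 0.3.2 / MAE convention)
# onto the string pool types that recent timm versions expect.
import timm

global_pool = True  # assumed value from the fine-tuning config
pool_type = 'avg' if global_pool else 'token'

model = timm.create_model(
    'vit_large_patch16_224',  # assumed architecture name
    num_classes=5,            # assumed number of target classes
    global_pool=pool_type,    # 'avg' stands in for the old boolean True (mean over patch tokens)
)
```

Whether the released RETFound checkpoint then loads cleanly into this model is a separate question; the sketch only addresses the pool-type assertion.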

timm 0.3.2 is not compatible with recent versions of torch (it imports torch._six).
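In the other direction, staying on timm 0.3.2 with a recent torch, a commonly suggested workaround (the same kind of patch used for the original MAE code base) is to edit `timm/models/layers/helpers.py` so it no longer imports from `torch._six`. A sketch of that one-line patch, assuming this is the only `torch._six` usage that breaks:

```python
# timm/models/layers/helpers.py (timm 0.3.2)
# The original import fails on recent torch, which no longer provides torch._six.container_abcs:
#   from torch._six import container_abcs
# Commonly suggested replacement:
import collections.abc as container_abcs
```

This only fixes the import; it does not guarantee that timm 0.3.2 works end to end with torch>=2.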

Do you have a pretrained model that is compatible with more recent torch/timm versions?

big97kai commented 1 month ago

Same issue. After not using the global pool, the results are not very good...

codevisioner commented 3 weeks ago

I had the same issue and fixed it by installing the exact package versions listed in requirements.txt.