Training fails without CUDA

zyzzyxdonta commented 2 years ago

I know the README mentions that running without CUDA wasn't tested, but it would still be nice to get CPU-only training to work. Currently, the code tries to use CUDA and fails:

snakemake -j8 imagenette2_train

returns (among many other errors 😄):

Error in rule imagenette2_vit_small_default:
    jobid: 301
    output: outputs/vit_small/seed1328/fold-08, outputs/vit_small/seed1328/fold-08/after80/last.pth.tar, outputs/vit_small/seed1328/fold-08/after80/model_best.pth.tar
    log: outputs/vit_small/seed1328/fold-08.log (check log file(s) for error message)
    shell:
        time python timm-0.5.4-train.py data/imagenette2-320-splits/fold-08 --seed 1328 --model vit_small_patch32_224 --num-classes=10 --output outputs/vit_small/seed1328/fold-08 --checkpoint-hist 2 --epochs 80 --experiment after80 --mean 0.485 0.456 0.406 --std 0.229 0.224 0.225 -b 96 > outputs/vit_small/seed1328/fold-08.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The mentioned log file:

Training with a single process on 1 GPUs.
Model vit_small_patch32_224 created, param count:22497802
Data processing configuration for current model + dataset:
        input_size: (3, 224, 224)
        interpolation: bicubic
        mean: (0.485, 0.456, 0.406)
        std: (0.229, 0.224, 0.225)
        crop_pct: 0.9
Traceback (most recent call last):
  File "/home/pape58/Code/sota_on_uncertainties/timm-0.5.4-train.py", line 829, in <module>
    main()
  File "/home/pape58/Code/sota_on_uncertainties/timm-0.5.4-train.py", line 402, in main
    model.cuda()
  File "/home/pape58/Code/sota_on_uncertainties/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 688, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/pape58/Code/sota_on_uncertainties/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/pape58/Code/sota_on_uncertainties/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/pape58/Code/sota_on_uncertainties/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 601, in _apply
    param_applied = fn(param)
  File "/home/pape58/Code/sota_on_uncertainties/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 688, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/pape58/Code/sota_on_uncertainties/venv/lib64/python3.10/site-packages/torch/cuda/__init__.py", line 210, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
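
For context: the traceback ends at the unconditional model.cuda() call in main(), which raises on a CPU-only PyTorch build. Below is a minimal sketch of a guarded device choice. pick_device is a hypothetical helper, not part of timm or this repository, and other CUDA-specific calls elsewhere in the training script may well need similar treatment, so this alone is not necessarily a complete fix.

```python
import importlib.util


def pick_device(prefer_cuda: bool = True) -> str:
    """Return "cuda" only if torch is installed and reports a usable GPU."""
    if prefer_cuda and importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the helper also loads without torch

        # torch.cuda.is_available() returns False on CPU-only builds
        # instead of raising the way model.cuda() does.
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"selected device: {device}")
    # Hypothetical use in a training script, replacing model.cuda():
    #     model.to(device)
```

On a build without CUDA support the helper falls back to "cpu", so a call such as model.to(device) would keep the model on the CPU rather than fail.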

psteinb commented 2 years ago

Yes, this is something in timm. This is beyond my reach, I am afraid. I'll include a statement in the README.md and close this issue once done.

psteinb commented 2 years ago

The hint landed in the README with ca30615.

psteinb / sota_on_uncertainties

Training fails without CUDA #6