zhangzjn / ADer

ADer (https://arxiv.org/abs/2406.03262) is an open-source visual anomaly detection toolbox based on PyTorch that supports multiple popular AD datasets and approaches.

Training just hangs #25

Closed Jeriousman closed 2 months ago

Jeriousman commented 2 months ago

I formatted my data exactly the same as the MVTec format, then modified some of the code to train on my new data.
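For context, a quick sanity check of the layout can catch structural mistakes before training. The sketch below is a minimal, hypothetical check assuming the standard MVTec AD convention (`<class>/train/good`, `<class>/test/good`, `<class>/ground_truth`); the root `data/cj` and class name `pizza` are taken from the config printed in the log, not from ADer's actual validation code.

```python
# Hypothetical sanity check for an MVTec-style dataset layout.
# Assumes the standard MVTec AD directory convention; "data/cj" and
# "pizza" are taken from the training log below, not from ADer itself.
import os

REQUIRED = ["train/good", "test/good", "ground_truth"]

def check_mvtec_layout(root, cls_name):
    """Return the expected sub-directories that are missing under root/cls_name."""
    cls_dir = os.path.join(root, cls_name)
    return [sub for sub in REQUIRED
            if not os.path.isdir(os.path.join(cls_dir, sub))]

if __name__ == "__main__":
    missing = check_mvtec_layout("data/cj", "pizza")
    print("missing:", missing)  # an empty list means the layout looks OK
```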

Screenshot 2024-08-30 10:51:07 AM

But after a bit of training, it gets stuck and never moves forward, staying at 2/3 as shown below. The GPU is no longer doing any work, but GPU memory is still occupied. Can I get some help with this? Screenshot 2024-08-30 10:50:37 AM

The full output after running the training command is shown below:

` CUDA_VISIBLE_DEVICES=0 python run.py -c configs/vitad/vitad_cj.py -m train
08/30 10:46:56 AM - ==> Logging on master GPU: 0
08/30 10:46:56 AM - ==> Running Trainer: ViTADTrainer
08/30 10:46:56 AM - ==> Using GPU: [0] for Training
08/30 10:46:56 AM - ==> Building model
08/30 10:46:56 AM - Loading pretrained weights from Hugging Face hub (timm/vit_small_patch16_224.dino)
08/30 10:46:57 AM - [timm/vit_small_patch16_224.dino] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
08/30 10:46:57 AM - Resized position embedding: (14, 14) to (16, 16).
08/30 10:46:57 AM - ------------------------------------ ViTAD ------------------------------------
module #parameters or shape #flops
model 38.586M 9.668G
net_t 21.689M 5.544G
net_t.cls_token (1, 1, 384)
net_t.pos_embed (1, 257, 384)
net_t.patch_embed.proj 0.295M 75.497M
net_t.patch_embed.proj.weight (384, 3, 16, 16)
net_t.patch_embed.proj.bias (384,)
net_t.blocks 21.294M 5.469G
net_t.blocks.0 1.774M 0.456G
net_t.blocks.0.norm1 0.768K 0.493M
net_t.blocks.0.norm1.weight (384,)
net_t.blocks.0.norm1.bias (384,)
net_t.blocks.0.attn 0.591M 0.152G
net_t.blocks.0.attn.qkv 0.444M 0.114G
net_t.blocks.0.attn.proj 0.148M 37.896M
net_t.blocks.0.norm2 0.768K 0.493M
net_t.blocks.0.norm2.weight (384,)
net_t.blocks.0.norm2.bias (384,)
net_t.blocks.0.mlp 1.182M 0.303G
net_t.blocks.0.mlp.fc1 0.591M 0.152G
net_t.blocks.0.mlp.fc2 0.59M 0.152G
net_t.blocks.1 1.774M 0.456G
net_t.blocks.1.norm1 0.768K 0.493M
net_t.blocks.1.norm1.weight (384,)
net_t.blocks.1.norm1.bias (384,)
net_t.blocks.1.attn 0.591M 0.152G
net_t.blocks.1.attn.qkv 0.444M 0.114G
net_t.blocks.1.attn.proj 0.148M 37.896M
net_t.blocks.1.norm2 0.768K 0.493M
net_t.blocks.1.norm2.weight (384,)
net_t.blocks.1.norm2.bias (384,)
net_t.blocks.1.mlp 1.182M 0.303G
net_t.blocks.1.mlp.fc1 0.591M 0.152G
net_t.blocks.1.mlp.fc2 0.59M 0.152G
net_t.blocks.2 1.774M 0.456G
net_t.blocks.2.norm1 0.768K 0.493M
net_t.blocks.2.norm1.weight (384,)
net_t.blocks.2.norm1.bias (384,)
net_t.blocks.2.attn 0.591M 0.152G
net_t.blocks.2.attn.qkv 0.444M 0.114G
net_t.blocks.2.attn.proj 0.148M 37.896M
net_t.blocks.2.norm2 0.768K 0.493M
net_t.blocks.2.norm2.weight (384,)
net_t.blocks.2.norm2.bias (384,)
net_t.blocks.2.mlp 1.182M 0.303G
net_t.blocks.2.mlp.fc1 0.591M 0.152G
net_t.blocks.2.mlp.fc2 0.59M 0.152G
net_t.blocks.3 1.774M 0.456G
net_t.blocks.3.norm1 0.768K 0.493M
net_t.blocks.3.norm1.weight (384,)
net_t.blocks.3.norm1.bias (384,)
net_t.blocks.3.attn 0.591M 0.152G
net_t.blocks.3.attn.qkv 0.444M 0.114G
net_t.blocks.3.attn.proj 0.148M 37.896M
net_t.blocks.3.norm2 0.768K 0.493M
net_t.blocks.3.norm2.weight (384,)
net_t.blocks.3.norm2.bias (384,)
net_t.blocks.3.mlp 1.182M 0.303G
net_t.blocks.3.mlp.fc1 0.591M 0.152G
net_t.blocks.3.mlp.fc2 0.59M 0.152G
net_t.blocks.4 1.774M 0.456G
net_t.blocks.4.norm1 0.768K 0.493M
net_t.blocks.4.norm1.weight (384,)
net_t.blocks.4.norm1.bias (384,)
net_t.blocks.4.attn 0.591M 0.152G
net_t.blocks.4.attn.qkv 0.444M 0.114G
net_t.blocks.4.attn.proj 0.148M 37.896M
net_t.blocks.4.norm2 0.768K 0.493M
net_t.blocks.4.norm2.weight (384,)
net_t.blocks.4.norm2.bias (384,)
net_t.blocks.4.mlp 1.182M 0.303G
net_t.blocks.4.mlp.fc1 0.591M 0.152G
net_t.blocks.4.mlp.fc2 0.59M 0.152G
net_t.blocks.5 1.774M 0.456G
net_t.blocks.5.norm1 0.768K 0.493M
net_t.blocks.5.norm1.weight (384,)
net_t.blocks.5.norm1.bias (384,)
net_t.blocks.5.attn 0.591M 0.152G
net_t.blocks.5.attn.qkv 0.444M 0.114G
net_t.blocks.5.attn.proj 0.148M 37.896M
net_t.blocks.5.norm2 0.768K 0.493M
net_t.blocks.5.norm2.weight (384,)
net_t.blocks.5.norm2.bias (384,)
net_t.blocks.5.mlp 1.182M 0.303G
net_t.blocks.5.mlp.fc1 0.591M 0.152G
net_t.blocks.5.mlp.fc2 0.59M 0.152G
net_t.blocks.6 1.774M 0.456G
net_t.blocks.6.norm1 0.768K 0.493M
net_t.blocks.6.norm1.weight (384,)
net_t.blocks.6.norm1.bias (384,)
net_t.blocks.6.attn 0.591M 0.152G
net_t.blocks.6.attn.qkv 0.444M 0.114G
net_t.blocks.6.attn.proj 0.148M 37.896M
net_t.blocks.6.norm2 0.768K 0.493M
net_t.blocks.6.norm2.weight (384,)
net_t.blocks.6.norm2.bias (384,)
net_t.blocks.6.mlp 1.182M 0.303G
net_t.blocks.6.mlp.fc1 0.591M 0.152G
net_t.blocks.6.mlp.fc2 0.59M 0.152G
net_t.blocks.7 1.774M 0.456G
net_t.blocks.7.norm1 0.768K 0.493M
net_t.blocks.7.norm1.weight (384,)
net_t.blocks.7.norm1.bias (384,)
net_t.blocks.7.attn 0.591M 0.152G
net_t.blocks.7.attn.qkv 0.444M 0.114G
net_t.blocks.7.attn.proj 0.148M 37.896M
net_t.blocks.7.norm2 0.768K 0.493M
net_t.blocks.7.norm2.weight (384,)
net_t.blocks.7.norm2.bias (384,)
net_t.blocks.7.mlp 1.182M 0.303G
net_t.blocks.7.mlp.fc1 0.591M 0.152G
net_t.blocks.7.mlp.fc2 0.59M 0.152G
net_t.blocks.8 1.774M 0.456G
net_t.blocks.8.norm1 0.768K 0.493M
net_t.blocks.8.norm1.weight (384,)
net_t.blocks.8.norm1.bias (384,)
net_t.blocks.8.attn 0.591M 0.152G
net_t.blocks.8.attn.qkv 0.444M 0.114G
net_t.blocks.8.attn.proj 0.148M 37.896M
net_t.blocks.8.norm2 0.768K 0.493M
net_t.blocks.8.norm2.weight (384,)
net_t.blocks.8.norm2.bias (384,)
net_t.blocks.8.mlp 1.182M 0.303G
net_t.blocks.8.mlp.fc1 0.591M 0.152G
net_t.blocks.8.mlp.fc2 0.59M 0.152G
net_t.blocks.9 1.774M 0.456G
net_t.blocks.9.norm1 0.768K 0.493M
net_t.blocks.9.norm1.weight (384,)
net_t.blocks.9.norm1.bias (384,)
net_t.blocks.9.attn 0.591M 0.152G
net_t.blocks.9.attn.qkv 0.444M 0.114G
net_t.blocks.9.attn.proj 0.148M 37.896M
net_t.blocks.9.norm2 0.768K 0.493M
net_t.blocks.9.norm2.weight (384,)
net_t.blocks.9.norm2.bias (384,)
net_t.blocks.9.mlp 1.182M 0.303G
net_t.blocks.9.mlp.fc1 0.591M 0.152G
net_t.blocks.9.mlp.fc2 0.59M 0.152G
net_t.blocks.10 1.774M 0.456G
net_t.blocks.10.norm1 0.768K 0.493M
net_t.blocks.10.norm1.weight (384,)
net_t.blocks.10.norm1.bias (384,)
net_t.blocks.10.attn 0.591M 0.152G
net_t.blocks.10.attn.qkv 0.444M 0.114G
net_t.blocks.10.attn.proj 0.148M 37.896M
net_t.blocks.10.norm2 0.768K 0.493M
net_t.blocks.10.norm2.weight (384,)
net_t.blocks.10.norm2.bias (384,)
net_t.blocks.10.mlp 1.182M 0.303G
net_t.blocks.10.mlp.fc1 0.591M 0.152G
net_t.blocks.10.mlp.fc2 0.59M 0.152G
net_t.blocks.11 1.774M 0.456G
net_t.blocks.11.norm1 0.768K 0.493M
net_t.blocks.11.norm1.weight (384,)
net_t.blocks.11.norm1.bias (384,)
net_t.blocks.11.attn 0.591M 0.152G
net_t.blocks.11.attn.qkv 0.444M 0.114G
net_t.blocks.11.attn.proj 0.148M 37.896M
net_t.blocks.11.norm2 0.768K 0.493M
net_t.blocks.11.norm2.weight (384,)
net_t.blocks.11.norm2.bias (384,)
net_t.blocks.11.mlp 1.182M 0.303G
net_t.blocks.11.mlp.fc1 0.591M 0.152G
net_t.blocks.11.mlp.fc2 0.59M 0.152G
net_t.norm 0.768K
net_t.norm.weight (384,)
net_t.norm.bias (384,)
net_fusion.fc 0.148M 37.749M
net_fusion.fc.weight (384, 384)
net_fusion.fc.bias (384,)
net_s 16.75M 4.086G
net_s.cls_token (1, 1, 384)
net_s.pos_embed (1, 256, 384)
net_s.patch_embed.proj 0.295M
net_s.patch_embed.proj.weight (384, 3, 16, 16)
net_s.patch_embed.proj.bias (384,)
net_s.blocks 15.97M 4.086G
net_s.blocks.0 1.774M 0.454G
net_s.blocks.0.norm1 0.768K 0.492M
net_s.blocks.0.norm1.weight (384,)
net_s.blocks.0.norm1.bias (384,)
net_s.blocks.0.attn 0.591M 0.151G
net_s.blocks.0.attn.qkv 0.444M 0.113G
net_s.blocks.0.attn.proj 0.148M 37.749M
net_s.blocks.0.norm2 0.768K 0.492M
net_s.blocks.0.norm2.weight (384,)
net_s.blocks.0.norm2.bias (384,)
net_s.blocks.0.mlp 1.182M 0.302G
net_s.blocks.0.mlp.fc1 0.591M 0.151G
net_s.blocks.0.mlp.fc2 0.59M 0.151G
net_s.blocks.1 1.774M 0.454G
net_s.blocks.1.norm1 0.768K 0.492M
net_s.blocks.1.norm1.weight (384,)
net_s.blocks.1.norm1.bias (384,)
net_s.blocks.1.attn 0.591M 0.151G
net_s.blocks.1.attn.qkv 0.444M 0.113G
net_s.blocks.1.attn.proj 0.148M 37.749M
net_s.blocks.1.norm2 0.768K 0.492M
net_s.blocks.1.norm2.weight (384,)
net_s.blocks.1.norm2.bias (384,)
net_s.blocks.1.mlp 1.182M 0.302G
net_s.blocks.1.mlp.fc1 0.591M 0.151G
net_s.blocks.1.mlp.fc2 0.59M 0.151G
net_s.blocks.2 1.774M 0.454G
net_s.blocks.2.norm1 0.768K 0.492M
net_s.blocks.2.norm1.weight (384,)
net_s.blocks.2.norm1.bias (384,)
net_s.blocks.2.attn 0.591M 0.151G
net_s.blocks.2.attn.qkv 0.444M 0.113G
net_s.blocks.2.attn.proj 0.148M 37.749M
net_s.blocks.2.norm2 0.768K 0.492M
net_s.blocks.2.norm2.weight (384,)
net_s.blocks.2.norm2.bias (384,)
net_s.blocks.2.mlp 1.182M 0.302G
net_s.blocks.2.mlp.fc1 0.591M 0.151G
net_s.blocks.2.mlp.fc2 0.59M 0.151G
net_s.blocks.3 1.774M 0.454G
net_s.blocks.3.norm1 0.768K 0.492M
net_s.blocks.3.norm1.weight (384,)
net_s.blocks.3.norm1.bias (384,)
net_s.blocks.3.attn 0.591M 0.151G
net_s.blocks.3.attn.qkv 0.444M 0.113G
net_s.blocks.3.attn.proj 0.148M 37.749M
net_s.blocks.3.norm2 0.768K 0.492M
net_s.blocks.3.norm2.weight (384,)
net_s.blocks.3.norm2.bias (384,)
net_s.blocks.3.mlp 1.182M 0.302G
net_s.blocks.3.mlp.fc1 0.591M 0.151G
net_s.blocks.3.mlp.fc2 0.59M 0.151G
net_s.blocks.4 1.774M 0.454G
net_s.blocks.4.norm1 0.768K 0.492M
net_s.blocks.4.norm1.weight (384,)
net_s.blocks.4.norm1.bias (384,)
net_s.blocks.4.attn 0.591M 0.151G
net_s.blocks.4.attn.qkv 0.444M 0.113G
net_s.blocks.4.attn.proj 0.148M 37.749M
net_s.blocks.4.norm2 0.768K 0.492M
net_s.blocks.4.norm2.weight (384,)
net_s.blocks.4.norm2.bias (384,)
net_s.blocks.4.mlp 1.182M 0.302G
net_s.blocks.4.mlp.fc1 0.591M 0.151G
net_s.blocks.4.mlp.fc2 0.59M 0.151G
net_s.blocks.5 1.774M 0.454G
net_s.blocks.5.norm1 0.768K 0.492M
net_s.blocks.5.norm1.weight (384,)
net_s.blocks.5.norm1.bias (384,)
net_s.blocks.5.attn 0.591M 0.151G
net_s.blocks.5.attn.qkv 0.444M 0.113G
net_s.blocks.5.attn.proj 0.148M 37.749M
net_s.blocks.5.norm2 0.768K 0.492M
net_s.blocks.5.norm2.weight (384,)
net_s.blocks.5.norm2.bias (384,)
net_s.blocks.5.mlp 1.182M 0.302G
net_s.blocks.5.mlp.fc1 0.591M 0.151G
net_s.blocks.5.mlp.fc2 0.59M 0.151G
net_s.blocks.6 1.774M 0.454G
net_s.blocks.6.norm1 0.768K 0.492M
net_s.blocks.6.norm1.weight (384,)
net_s.blocks.6.norm1.bias (384,)
net_s.blocks.6.attn 0.591M 0.151G
net_s.blocks.6.attn.qkv 0.444M 0.113G
net_s.blocks.6.attn.proj 0.148M 37.749M
net_s.blocks.6.norm2 0.768K 0.492M
net_s.blocks.6.norm2.weight (384,)
net_s.blocks.6.norm2.bias (384,)
net_s.blocks.6.mlp 1.182M 0.302G
net_s.blocks.6.mlp.fc1 0.591M 0.151G
net_s.blocks.6.mlp.fc2 0.59M 0.151G
net_s.blocks.7 1.774M 0.454G
net_s.blocks.7.norm1 0.768K 0.492M
net_s.blocks.7.norm1.weight (384,)
net_s.blocks.7.norm1.bias (384,)
net_s.blocks.7.attn 0.591M 0.151G
net_s.blocks.7.attn.qkv 0.444M 0.113G
net_s.blocks.7.attn.proj 0.148M 37.749M
net_s.blocks.7.norm2 0.768K 0.492M
net_s.blocks.7.norm2.weight (384,)
net_s.blocks.7.norm2.bias (384,)
net_s.blocks.7.mlp 1.182M 0.302G
net_s.blocks.7.mlp.fc1 0.591M 0.151G
net_s.blocks.7.mlp.fc2 0.59M 0.151G
net_s.blocks.8 1.774M 0.454G
net_s.blocks.8.norm1 0.768K 0.492M
net_s.blocks.8.norm1.weight (384,)
net_s.blocks.8.norm1.bias (384,)
net_s.blocks.8.attn 0.591M 0.151G
net_s.blocks.8.attn.qkv 0.444M 0.113G
net_s.blocks.8.attn.proj 0.148M 37.749M
net_s.blocks.8.norm2 0.768K 0.492M
net_s.blocks.8.norm2.weight (384,)
net_s.blocks.8.norm2.bias (384,)
net_s.blocks.8.mlp 1.182M 0.302G
net_s.blocks.8.mlp.fc1 0.591M 0.151G
net_s.blocks.8.mlp.fc2 0.59M 0.151G
net_s.norm 0.768K
net_s.norm.weight (384,)
net_s.norm.bias (384,)
net_s.head 0.385M
net_s.head.weight (1000, 384)
net_s.head.bias (1000,)

08/30 10:46:57 AM - ==> Creating optimizer
08/30 10:46:57 AM - ==> Loading dataset: DefaultAD
08/30 10:46:57 AM - ==> ** cfg **
fvcore_is : True
fvcore_b : 1
fvcore_c : 3
epoch_full : 100
metrics : ['mAUROC_sp_max', 'mAP_sp_max', 'mF1_max_sp_max', 'mAUPRO_px', 'mAUROC_px', 'mAP_px', 'mF1_max_px', 'mF1_px_0.2_0.8_0.1', 'mAcc_px_0.2_0.8_0.1', 'mIoU_px_0.2_0.8_0.1', 'mIoU_max_px']
use_adeval : True
evaluator.kwargs : {'metrics': ['mAUROC_sp_max', 'mAP_sp_max', 'mF1_max_sp_max', 'mAUPRO_px', 'mAUROC_px', 'mAP_px', 'mF1_max_px', 'mF1_px_0.2_0.8_0.1', 'mAcc_px_0.2_0.8_0.1', 'mIoU_px_0.2_0.8_0.1', 'mIoU_max_px'], 'pooling_ks': [16, 16], 'max_step_aupro': 100}
vis : False
vis_dir : None
optim.lr : 0.0004
optim.kwargs : {'name': 'adamw', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0.0001, 'amsgrad': False}
trainer.name : ViTADTrainer
trainer.checkpoint : runs
trainer.logdir_sub :
trainer.resume_dir :
trainer.cuda_deterministic : False
trainer.epoch_full : 100
trainer.scheduler_kwargs : {'name': 'step', 'lr_noise': None, 'noise_pct': 0.67, 'noise_std': 1.0, 'noise_seed': 42, 'lr_min': 4e-06, 'warmup_lr': 4.0000000000000003e-07, 'warmup_iters': -1, 'cooldown_iters': 0, 'warmup_epochs': 0, 'cooldown_epochs': 0, 'use_iters': True, 'patience_iters': 0, 'patience_epochs': 0, 'decay_iters': 0, 'decay_epochs': 80, 'cycle_decay': 0.1, 'decay_rate': 0.1}
trainer.mixup_kwargs : {'mixup_alpha': 0.8, 'cutmix_alpha': 1.0, 'cutmix_minmax': None, 'prob': 0.0, 'switch_prob': 0.5, 'mode': 'batch', 'correct_lam': True, 'label_smoothing': 0.1}
trainer.test_start_epoch : 100
trainer.test_per_epoch : 10
trainer.find_unused_parameters : False
trainer.sync_BN : apex
trainer.dist_BN :
trainer.scaler : none
trainer.data.batch_size : 32
trainer.data.batch_size_per_gpu : 32
trainer.data.batch_size_test : 32
trainer.data.batch_size_per_gpu_test : 32
trainer.data.num_workers_per_gpu : 4
trainer.data.drop_last : True
trainer.data.pin_memory : True
trainer.data.persistent_workers : False
trainer.data.num_workers : 4
trainer.iter : 0
trainer.epoch : 0
trainer.iter_full : 1400
trainer.metric_recorder : {'mAUROC_sp_max_pizza': [], 'mAP_sp_max_pizza': [], 'mF1_max_sp_max_pizza': [], 'mAUPRO_px_pizza': [], 'mAUROC_px_pizza': [], 'mAP_px_pizza': [], 'mF1_max_px_pizza': [], 'mF1_px_0.2_0.8_0.1_pizza': [], 'mAcc_px_0.2_0.8_0.1_pizza': [], 'mIoU_px_0.2_0.8_0.1_pizza': [], 'mIoU_max_px_pizza': []}
loss.loss_terms : [{'type': 'CosLoss', 'name': 'cos', 'avg': False, 'lam': 1.0}]
loss.clip_grad : 5.0
loss.create_graph : False
loss.retain_graph : False
adv : False
logging.log_terms_train : [{'name': 'batch_t', 'fmt': ':>5.3f', 'add_name': 'avg'}, {'name': 'data_t', 'fmt': ':>5.3f'}, {'name': 'optim_t', 'fmt': ':>5.3f'}, {'name': 'lr', 'fmt': ':>7.6f'}, {'name': 'cos', 'suffixes': [''], 'fmt': ':>5.3f', 'add_name': 'avg'}]
logging.log_terms_test : [{'name': 'batch_t', 'fmt': ':>5.3f', 'add_name': 'avg'}, {'name': 'cos', 'suffixes': [''], 'fmt': ':>5.3f', 'add_name': 'avg'}]
logging.train_reset_log_per : 50
logging.train_log_per : 50
logging.test_log_per : 50
data.sampler : naive
data.loader_type : pil
data.loader_type_target : pil_L
data.type : DefaultAD
data.root : data/cj
data.meta : meta.json
data.cls_names : []
data.train_transforms : [{'type': 'Resize', 'size': (256, 256), 'interpolation': <InterpolationMode.BILINEAR: 'bilinear'>}, {'type': 'CenterCrop', 'size': (256, 256)}, {'type': 'ToTensor'}, {'type': 'Normalize', 'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225), 'inplace': True}]
data.test_transforms : [{'type': 'Resize', 'size': (256, 256), 'interpolation': <InterpolationMode.BILINEAR: 'bilinear'>}, {'type': 'CenterCrop', 'size': (256, 256)}, {'type': 'ToTensor'}, {'type': 'Normalize', 'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225), 'inplace': True}]
data.target_transforms : [{'type': 'Resize', 'size': (256, 256), 'interpolation': <InterpolationMode.BILINEAR: 'bilinear'>}, {'type': 'CenterCrop', 'size': (256, 256)}, {'type': 'ToTensor'}]
data.train_size : 14
data.test_size : 3
data.train_length : 477
data.test_length : 93
model_t.name : vit_small_patch16_224_dino
model_t.kwargs : {'pretrained': True, 'checkpoint_path': '', 'pretrained_strict': False, 'strict': True, 'img_size': 256, 'teachers': [3, 6, 9], 'neck': [12]}
model_f.name : fusion
model_f.kwargs : {'pretrained': False, 'checkpoint_path': '', 'strict': False, 'dim': 384, 'mul': 1}
model_s.name : de_vit_small_patch16_224_dino
model_s.kwargs : {'pretrained': False, 'checkpoint_path': '', 'strict': False, 'img_size': 256, 'students': [3, 6, 9], 'depth': 9}
model.name : vitad
model.kwargs : {'pretrained': False, 'checkpoint_path': '', 'strict': True, 'model_t': Namespace(name='vit_small_patch16_224_dino', kwargs={'pretrained': True, 'checkpoint_path': '', 'pretrained_strict': False, 'strict': True, 'img_size': 256, 'teachers': [3, 6, 9], 'neck': [12]}), 'model_f': Namespace(name='fusion', kwargs={'pretrained': False, 'checkpoint_path': '', 'strict': False, 'dim': 384, 'mul': 1}), 'model_s': Namespace(name='de_vit_small_patch16_224_dino', kwargs={'pretrained': False, 'checkpoint_path': '', 'strict': False, 'img_size': 256, 'students': [3, 6, 9], 'depth': 9})}
seed : 42
size : 256
warmup_epochs : 0
test_start_epoch : 100
test_per_epoch : 10
batch_train : 32
batch_test_per : 32
lr : 0.0004
weight_decay : 0.0001
cfg_path : configs.vitad.vitad_cj
mode : train
sleep : 0
memory : -1
dist_url : env://
logger_rank : 0
opts : []
command : python3 -m torch.distributed.launch --nproc_per_node=$nproc_per_node --nnodes=$nnodes --node_rank=$node_rank --master_addr=$master_addr --master_port=$master_port --use_env run.py -c configs.vitad.vitad_cj -m train --sleep 0 --memory -1 --dist_url env:// --logger_rank 0
task_start_time : 5769699.826605973
dist : False
world_size : 1
rank : 0
local_rank : 0
ngpus_per_node : 1
nnodes : 1
master : True
logdir : runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656
logger.filters : []
logger.name : root
logger.level : 20
logger.parent : None
logger.propagate : True
logger.disabled : False
logdir_train : runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656/show_train
logdir_test : runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656/show_test
08/30 10:46:57 AM - ==> Starting training with 1 nodes x 1 GPUs
08/30 10:47:01 AM - ==> Total time: 0:00:04 Eta: 0:07:55 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:05 AM - ==> Total time: 0:00:09 Eta: 0:07:26 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:10 AM - ==> Total time: 0:00:13 Eta: 0:07:10 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:12 AM - Train: 3.57% [50/1400] [3.6/100.0] [batch_t 0.053 (0.265)] [data_t 0.002] [optim_t 0.051] [lr 0.000400] [cos 0.592 (0.631)]
08/30 10:47:14 AM - ==> Total time: 0:00:17 Eta: 0:07:01 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:18 AM - ==> Total time: 0:00:21 Eta: 0:06:57 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:22 AM - ==> Total time: 0:00:26 Eta: 0:06:48 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:27 AM - ==> Total time: 0:00:30 Eta: 0:06:44 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:28 AM - Train: 7.14% [100/1400] [7.1/100.0] [batch_t 0.053 (0.530)] [data_t 0.002] [optim_t 0.050] [lr 0.000400] [cos 0.348 (0.351)]
08/30 10:47:31 AM - ==> Total time: 0:00:34 Eta: 0:06:41 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
08/30 10:47:35 AM - ==> Total time: 0:00:39 Eta: 0:06:36 Logged in 'runs/ViTADTrainer_configs_vitad_vitad_cj_20240830-104656'
2/3 `

Thank you very much.
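A hang like this, with the progress bar stuck mid-epoch while GPU memory stays allocated, is often a DataLoader worker subprocess dying or deadlocking on a bad sample (the config above runs with `num_workers : 4`). A common debugging step is to iterate the data in the main process with `num_workers=0`, so any exception surfaces with a real traceback instead of a silent stall. This is a generic sketch with a placeholder `Dataset`, not ADer's actual data pipeline:

```python
# Generic debugging sketch (placeholder Dataset, not ADer's own classes):
# iterate with num_workers=0 so a failing sample raises in this process
# instead of silently killing a worker subprocess.
import torch
from torch.utils.data import DataLoader, Dataset

class RangeDataset(Dataset):
    """Stand-in dataset; in practice this would be the real image dataset."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        return torch.tensor(i)

loader = DataLoader(RangeDataset(8), batch_size=4, num_workers=0)
for step, batch in enumerate(loader):
    # Any decode/transform error now raises here with a full traceback.
    print(step, batch.shape)
```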

Jeriousman commented 2 months ago

I have only one class, 'pizza', not multiple classes. Thank you.

Jeriousman commented 2 months ago

I had one corrupted image. After removing it, things work fine.
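For anyone hitting the same symptom: a single unreadable image can make a DataLoader worker fail mid-epoch and leave training stuck. A minimal sketch for scanning a dataset directory with Pillow before training (the root path `data/cj` is just the one from the log above; adjust as needed):

```python
# Minimal sketch: scan a dataset tree for images Pillow cannot decode.
# One bad file is enough to stall a multi-worker DataLoader mid-epoch.
import os
from PIL import Image

def find_corrupt_images(root):
    """Return (path, error) pairs for images that fail to decode."""
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".png", ".jpg", ".jpeg", ".bmp")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with Image.open(path) as img:
                    img.verify()   # cheap integrity check of the header
                with Image.open(path) as img:
                    img.load()     # full decode, catches truncated files
            except Exception as exc:
                bad.append((path, repr(exc)))
    return bad

if __name__ == "__main__":
    for path, err in find_corrupt_images("data/cj"):
        print(f"corrupt: {path} ({err})")
```

`verify()` alone misses some truncated files, which is why the sketch reopens and fully decodes each image as a second pass.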