Kyle-fang opened this issue 1 year ago
Here is my configuration file:
```yaml
batch_size: 64              # batch size for training
epochs: 100                 # number of epochs to train for
warmup_epochs: 10           # number of warmup epochs (without learning rate decay)
log_every: 100              # frequency of logging (steps)
eval_every: 1               # frequency of evaluating on val set (epochs)
n_workers: 8                # number of workers for dataloader
fp16: True                  # whether to use fp16 precision
accumulate_grad_batches: 1  # number of accumulation steps
normalize_z: False          # whether to normalize z in encoder
fine_tune_from:             # path to pre-trained model to fine-tune from
std_margin: 1               # margin for the std values in the loss

optimizer: adam             # optimizer to use. Options: adam, adamw
wd: 1e-6                    # weight decay
lr: 1e-4                    # learning rate
scheduler: warmup_cosine    # type of scheduler to use. Options: cosine, multistep, warmup_cosine

dataset:                    # dataset parameters
  name: stl10               # dataset name
  size: 96                  # image size
  n_classes: 10             # number of classes
  path:                     # path to dataset (leave empty)
  aug_policy: custom        # augmentation policy. Choices: autoaugment, randaugment, custom

scatnet:                    # scatnet parameters
  J: 2                      # number of scales
  shape:                    # shape of the input image

encoder_type: resnet        # choices: resnet, deit
encoder:                    # encoder parameters
  out_dim: 8192-8192-8192   # number of neurons in the projection layer
  small_kernel: True        # whether to use small kernels. Small kernels are used for STL10 dataset

hog:                        # histogram of oriented gradients parameters
  nbins: 24                 # number of bins
  pool: 8                   # pooling size

comment: stl10_self_supervised_scale_z
```
The dataset has also been set up!
You are using 16-bit precision while training on the CPU. I do not think that is possible. Try changing the precision to 32.
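Assuming the `fp16` flag in the YAML is what ends up as Lightning's `precision` argument (the `Trainer(accelerator='cpu', precision=16)` warning in the log below suggests it is), that would be a one-line config change:

```yaml
fp16: False   # train in full float32 on CPU; fp16/bf16 AMP is only useful on GPU
```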
```
python main_self_supervised.py --config configs\stl10_self_supervised.yaml

E:\Anaconda\envs\mv_mr\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
E:\Anaconda\envs\mv_mr\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=None`.
  warnings.warn(msg)
E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py:898: UserWarning: You are running on single node with no parallelization, so distributed has no effect.
  rank_zero_warn("You are running on single node with no parallelization, so distributed has no effect.")
E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py:658: UserWarning: You passed `Trainer(accelerator='cpu', precision=16)` but native AMP is not supported on CPU. Using `precision='bf16'` instead.
  rank_zero_warn(
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:291: LightningDeprecationWarning: The `BaseCallback.on_train_batch_end` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7.
  rank_zero_deprecation(
Files already downloaded and verified
Files already downloaded and verified
Missing logger folder: lightning_logs\2023-09-19-15-52-00_stl10_self_supervised_scale_z
Files already downloaded and verified
Files already downloaded and verified

  | Name             | Type                | Params
---------------------------------------------------
0 | _encoder         | ResnetMultiProj     | 174 M
1 | _loss_dc         | DistanceCorrelation | 0
2 | _scatnet         | Scattering2D        | 0
3 | _hog             | HOGLayer            | 0
4 | _identity        | Identity            | 0
5 | online_finetuner | Linear              | 20.5 K
---------------------------------------------------
174 M     Trainable params
0         Non-trainable params
174 M     Total params
698.194   Total estimated model params size (MB)

Validation sanity check: 0it [00:00, ?it/s]Files already downloaded and verified
Validation sanity check:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "F:\Fangweijie\mv-mr-main\main_self_supervised.py", line 92, in <module>
    main(args)
  File "F:\Fangweijie\mv-mr-main\main_self_supervised.py", line 74, in main
    trainer.fit(module, ckpt_path=path_checkpoint)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
    self._dispatch()
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
    return self._run_train()
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 239, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "F:\Fangweijie\mv-mr-main\src\model\self_supervised_module.py", line 277, in validation_step
    return self.step(batch, batch_idx, stage='val')
  File "F:\Fangweijie\mv-mr-main\src\model\self_supervised_module.py", line 260, in step
    loss_dc = self._step_dc(im_orig, z_scaled, representation)
  File "F:\Fangweijie\mv-mr-main\src\model\self_supervised_module.py", line 194, in _step_dc
    im_orig_hog = self._hog(im_orig_r)
  File "E:\Anaconda\envs\mv_mr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\Fangweijie\mv-mr-main\src\model\hog.py", line 39, in forward
    out.scatter_(1, phase_int.floor().long() % self.nbins, norm)
RuntimeError: scatter(): Expected self.dtype to be equal to src.dtype
```
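The final `RuntimeError` is `scatter_` complaining that the destination (`out`) and source (`norm`) tensors have different dtypes. That is consistent with the bf16 fallback above: under CPU autocast, `norm` can come out as bfloat16 while `out` is presumably still allocated as float32. A minimal sketch of both the failure and a possible workaround (the tensor shapes here are made up for illustration; only the names `out` and `norm` and the value `nbins: 24` come from the traceback and config):

```python
import torch

# Destination is float32 (e.g. allocated with torch.zeros); source is bfloat16,
# which is what CPU autocast (the "bf16" AMP fallback) can produce.
out = torch.zeros(2, 24, 8, 8)                        # float32 destination
norm = torch.rand(2, 1, 8, 8, dtype=torch.bfloat16)   # bf16 source under AMP
idx = torch.randint(0, 24, (2, 1, 8, 8))              # bin indices, like phase_int % nbins

try:
    out.scatter_(1, idx, norm)  # raises: scatter(): Expected self.dtype to be equal to src.dtype
except RuntimeError as e:
    print(e)

# Casting the source to the destination's dtype satisfies scatter_'s requirement:
out.scatter_(1, idx, norm.to(out.dtype))
```

So either disabling AMP in the config (`fp16: False`, as suggested above) or casting `norm` to `out.dtype` at the `scatter_` call in `hog.py` should get past this error. The cast is a guess at the repo's internals, not a verified patch.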