An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
[Bug]: RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` #762
python tools/train.py --model patchcore
2022-12-05 20:33:20,501 - anomalib.data - INFO - Loading the datamodule
2022-12-05 20:33:20,502 - anomalib.pre_processing.pre_process - WARNING - Transform configs has not been provided. Images will be normalized using ImageNet statistics.
2022-12-05 20:33:20,502 - anomalib.pre_processing.pre_process - WARNING - Transform configs has not been provided. Images will be normalized using ImageNet statistics.
2022-12-05 20:33:20,502 - anomalib.models - INFO - Loading the model.
2022-12-05 20:33:20,507 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmp5ojcalbj
2022-12-05 20:33:20,507 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmp5ojcalbj/_remote_module_non_scriptable.py
2022-12-05 20:33:20,514 - anomalib.models.components.base.anomaly_module - INFO - Initializing PatchcoreLightning model.
/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `PrecisionRecallCurve` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
2022-12-05 20:33:21,594 - timm.models.helpers - INFO - Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/wide_resnet50_racm-8234f177.pth)
2022-12-05 20:33:24,078 - anomalib.utils.loggers - INFO - Loading the experiment logger(s)
2022-12-05 20:33:24,078 - anomalib.utils.callbacks - INFO - Loading the callbacks
/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/anomalib/utils/callbacks/__init__.py:143: UserWarning: Export option: None not found. Defaulting to no model export
warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True, used: True
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
2022-12-05 20:33:24,082 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
2022-12-05 20:33:24,082 - anomalib - INFO - Training the model.
2022-12-05 20:33:24,086 - anomalib.data.mvtec - INFO - Found the dataset.
2022-12-05 20:33:24,087 - anomalib.data.mvtec - INFO - Setting up train, validation, test and prediction datasets.
/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `ROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
2022-12-05 20:33:24,661 - pytorch_lightning.accelerators.gpu - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py:184: UserWarning: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
"`LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer",
2022-12-05 20:33:24,665 - pytorch_lightning.callbacks.model_summary - INFO -
| Name | Type | Params
-------------------------------------------------------------------
0 | image_threshold | AnomalyScoreThreshold | 0
1 | pixel_threshold | AnomalyScoreThreshold | 0
2 | model | PatchcoreModel | 24.9 M
3 | image_metrics | AnomalibMetricCollection | 0
4 | pixel_metrics | AnomalibMetricCollection | 0
5 | normalization_metrics | MinMax | 0
-------------------------------------------------------------------
24.9 M Trainable params
0 Non-trainable params
24.9 M Total params
99.450 Total estimated model params size (MB)
Epoch 0: 0%| | 0/10 [00:00<?, ?it/s]/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:137: UserWarning: `training_step` returned `None`. If this was on purpose, ignore this warning...
self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")
Epoch 0: 70%|████████████████2022-12-05 20:33:27,516 - anomalib.models.patchcore.lightning_model - INFO - Aggregating the embedding extracted from the training set..46it/s, loss=nan]
2022-12-05 20:33:27,523 - anomalib.models.patchcore.lightning_model - INFO - Applying core-set subsampling to get the embedding.
Traceback (most recent call last):
File "tools/train.py", line 76, in <module>
train()
File "tools/train.py", line 65, in train
trainer.fit(model=model, datamodule=datamodule)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 205, in run
self.on_advance_end()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 255, in on_advance_end
self._run_validation()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 311, in _run_validation
self.val_loop.run()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 199, in run
self.on_run_start(*args, **kwargs)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 136, in on_run_start
self._on_evaluation_start()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 253, in _on_evaluation_start
self.trainer._call_lightning_module_hook("on_validation_start", *args, **kwargs)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1595, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/anomalib/models/patchcore/lightning_model.py", line 94, in on_validation_start
self.model.subsample_embedding(embeddings, self.coreset_sampling_ratio)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/anomalib/models/patchcore/torch_model.py", line 142, in subsample_embedding
coreset = sampler.sample_coreset()
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/anomalib/models/components/sampling/k_center_greedy.py", line 131, in sample_coreset
idxs = self.select_coreset_idxs(selected_idxs)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/anomalib/models/components/sampling/k_center_greedy.py", line 95, in select_coreset_idxs
self.features = self.model.transform(self.embedding)
File "/home/ai/anaconda3/envs/PFM/lib/python3.7/site-packages/anomalib/models/components/dimensionality_reduction/random_projection.py", line 132, in transform
projected_embedding = embedding @ self.sparse_random_matrix.T.float()
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Epoch 0: 70%|███████ | 7/10 [00:03<00:01, 2.17it/s, loss=nan]
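For triage, it may help to check whether a plain float32 matrix multiply works on this GPU outside of anomalib. The sketch below is not anomalib code: `check_matmul` is a hypothetical helper, the tensor sizes are made-up placeholders, and the dense random matrix only stands in for the projection matrix used in `random_projection.py`. It mirrors the shape of the failing expression `embedding @ self.sparse_random_matrix.T.float()` and reports whether the call succeeds.

```python
def check_matmul(n_rows=1024, n_features=1536, n_components=128):
    """Run one float32 matmul shaped like `embedding @ matrix.T`.

    All sizes are illustrative assumptions, not values taken from the issue.
    """
    try:
        import torch
    except ImportError:
        return {"status": "torch-missing"}

    # Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    embedding = torch.rand(n_rows, n_features, device=device)
    # Dense stand-in for the projection matrix (assumption, not the real one).
    matrix = torch.rand(n_components, n_features, device=device)

    try:
        projected = embedding @ matrix.T.float()
        if device.type == "cuda":
            # CUDA calls are asynchronous; synchronize so errors surface here.
            torch.cuda.synchronize()
    except RuntimeError as err:
        return {"status": "failed", "device": str(device), "error": str(err)}

    return {"status": "ok", "device": str(device), "shape": tuple(projected.shape)}


if __name__ == "__main__":
    print(check_matmul())
```

If this also fails with `CUBLAS_STATUS_INVALID_VALUE`, the problem likely sits in the PyTorch/CUDA setup rather than in PatchCore itself. If it passes, the actual embedding size from the training run is the next thing to look at.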
Dataset
MVTec
Model
PatchCore
Steps to reproduce the behavior
python tools/train.py --model patchcore
OS information
OS information:
OS: [e.g. Ubuntu 20.04]
Python version: [e.g. 3.7]
Anomalib version: [e.g. 0.3.7]
PyTorch version: [e.g. 1.120]
CUDA/cuDNN version: [e.g. 11.6] cudnn 8.4.0
GPU models and configuration: [e.g. 2x GeForce RTX 3060]
Any other relevant information: [e.g. mvtec dataset]
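The fields above still contain the template placeholders. One way to fill them in is a small script like the following sketch. `collect_env` and `_version_of` are hypothetical helper names, and the script assumes that `anomalib` and `torch` expose a `__version__` attribute when they are installed.

```python
import platform
import sys


def _version_of(module_name):
    """Return a module's __version__, or a note if it cannot be imported."""
    try:
        module = __import__(module_name)
    except Exception:  # broad on purpose: a broken install should not abort the report
        return "not importable"
    return getattr(module, "__version__", "unknown")


def collect_env():
    """Gather the values requested by the 'OS information' section."""
    info = {
        "OS": platform.platform(),
        "Python version": sys.version.split()[0],
        "Anomalib version": _version_of("anomalib"),
        "PyTorch version": _version_of("torch"),
    }
    try:
        import torch
    except Exception:
        return info

    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["CUDA version (PyTorch build)"] = torch.version.cuda
    info["cuDNN version"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["GPU models"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    else:
        info["GPU models"] = []
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Run it inside the same conda environment used for `python tools/train.py --model patchcore` so the reported versions match the failing run.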
Expected behavior
...
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
...
Logs
...
Code of Conduct
[X] I agree to follow this project's Code of Conduct