openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. #532

Closed alevangel closed 2 years ago

alevangel commented 2 years ago

Describe the bug

I am trying to run inference on a CPU with a model that was trained on a GPU (Colab), but I get this error:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Where can I safely set map_location?
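For reference, the remedy named in the error message can be sketched in isolation as below. This is only an illustration with plain PyTorch, not anomalib code: the in-memory buffer and the one-entry "state_dict" are stand-ins for a real checkpoint file, and the tensor is created on the CPU because no GPU is assumed here.

```python
import io

import torch

# Stand-in for a checkpoint file: a dict with a "state_dict" entry, saved to memory.
buffer = io.BytesIO()
torch.save({"state_dict": {"weight": torch.ones(2, 2)}}, buffer)
buffer.seek(0)

# map_location remaps every stored tensor to the given device while loading,
# so a checkpoint written on a GPU machine can be read on a CPU-only one.
checkpoint = torch.load(buffer, map_location=torch.device("cpu"))
loaded_device = checkpoint["state_dict"]["weight"].device
```

The same keyword works when the first argument is a file path instead of a buffer.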

To Reproduce

    model = get_model(config)
    callbacks = get_callbacks(config)

    trainer = Trainer(callbacks=callbacks, **config.trainer)

    transform_config = config.dataset.transform_config.val if "transform_config" in config.dataset.keys() else None
    dataset = InferenceDataset(
        my_args['input'], image_size=tuple(config.dataset.image_size), transform_config=transform_config
    )
    dataloader = DataLoader(dataset)
    trainer.predict(model=model, dataloaders=[dataloader])

samet-akcay commented 2 years ago

@alevangel, do you set the device to cpu in your config file?

alevangel commented 2 years ago

@samet-akcay, it is set to 'auto'.

This is my models/patchcore/config.yaml:

dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: mvtec
  path: ./datasets/PRtg
  task: segmentation
  category: rocks
  image_size: 448
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 0
    pixel_default: 0
    adaptive: true

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

alevangel commented 2 years ago

The error arises from this line: https://github.com/openvinotoolkit/anomalib/blob/bd369190cdb49f22a22ebd058ea4af46f50aee26/anomalib/utils/callbacks/model_loader.py#L38

That is where it attempts to load the model from its weights, but I guess the weights were saved while the model was on the GPU.

I solved this loading error by modifying the function like this:

    def on_predict_start(self, _trainer, pl_module: AnomalyModule) -> None:
        """Call when inference begins.

        Loads the model weights from ``weights_path`` into the PyTorch module.
        """
        device = torch.device('cpu') if not torch.cuda.is_available() else torch.device('cuda')
        logger.info("Loading the model from %s", self.weights_path)
        pl_module.load_state_dict(torch.load(self.weights_path, map_location=device)["state_dict"])

samet-akcay commented 2 years ago

@alevangel, thanks for your suggestion. We could maybe add a fix using map_location=pl_module.device, which would ensure that the model and the loaded weights are always on the same device.
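A minimal sketch of that idea, written against plain PyTorch rather than as the actual anomalib patch. The helper name load_weights_on_module_device is hypothetical, and the nn.Linear models are stand-ins for a real AnomalyModule. A LightningModule exposes a device attribute; for a plain nn.Module the sketch falls back to the device of its first parameter, which assumes the module has at least one parameter.

```python
import os
import tempfile

import torch
from torch import nn


def load_weights_on_module_device(module: nn.Module, weights_path: str) -> None:
    """Load a checkpoint's ``state_dict`` onto the device ``module`` already lives on.

    Hypothetical helper for illustration; not part of anomalib.
    """
    # LightningModule has ``.device``; a plain nn.Module does not, so fall back
    # to the device of its first parameter (assumes at least one parameter).
    device = getattr(module, "device", None)
    if device is None:
        device = next(module.parameters()).device
    checkpoint = torch.load(weights_path, map_location=device)
    module.load_state_dict(checkpoint["state_dict"])


# Usage with tiny stand-in models: save from one, load into the other.
source, target = nn.Linear(3, 2), nn.Linear(3, 2)
with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "weights.ckpt")
    torch.save({"state_dict": source.state_dict()}, weights_path)
    load_weights_on_module_device(target, weights_path)
```

Compared with checking torch.cuda.is_available() inside the callback, taking the device from the module leaves the device decision with whoever placed the model (here, the Trainer), so the weights follow the model rather than the other way round.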