openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

ONNX converted to TRT problem #653

Closed. montensorrt closed this issue 1 year ago.

montensorrt commented 1 year ago

When I converted the ONNX model to TRT, I always got a reshape error. The error output is as follows:

10_27_1638[TRT: ERR][graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeNodeRemover::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_170: reshape changes volume. Reshaping [14763] to [1].)
10_27_1638[TRT: ERR]ModelImporter.cpp:773: While parsing node number 170 [Reshape -> "489"]:
10_27_1638[TRT: ERR]ModelImporter.cpp:774: --- Begin node ---
10_27_1638[TRT: ERR]ModelImporter.cpp:775: input: "487" input: "488" output: "489" name: "Reshape_170" op_type: "Reshape"
10_27_1638[TRT: ERR]ModelImporter.cpp:776: --- End node ---
10_27_1638[TRT: ERR]ModelImporter.cpp:779: ERROR: ModelImporter.cpp:179 In function parseGraph: [6] Invalid Node - Reshape_170 [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeNodeRemover::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_170: reshape changes volume. Reshaping [14763] to [1].)

When I remove the anomaly_score item from the output, e.g.:

anomaly_map = self.anomaly_map_generator(patch_scores)
output = (anomaly_map)

the error changes to:

10_27_1511[TRT: ERR][shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::387] Error Code 4: Internal Error (Reshape_183: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
10_27_1511[TRT: ERR]ModelImporter.cpp:773: While parsing node number 193 [Pad -> "521"]:
10_27_1511[TRT: ERR]ModelImporter.cpp:774: --- Begin node ---
10_27_1511[TRT: ERR]ModelImporter.cpp:775: input: "486" input: "520" output: "521" name: "Pad_193" op_type: "Pad" attribute { name: "mode" s: "reflect" type: STRING }
10_27_1511[TRT: ERR]ModelImporter.cpp:776: --- End node ---
10_27_1511[TRT: ERR]ModelImporter.cpp:779: ERROR: ModelImporter.cpp:179 In function parseGraph: [6] Invalid Node - Pad_193 [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::387] Error Code 4: Internal Error (Reshape_183: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])

Finally, I found that the problem always appears below the ConstantOfShape node.

What should I do about this problem?
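A quick way to see what the failing Reshape is doing is to inspect the exported graph with the onnx Python package. A minimal sketch, assuming the exported file is named model.onnx (the node name Reshape_170 and tensor name "488" are taken from the log above):

```python
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # placeholder path to the exported model

# Map initializer names to values so we can see the shape tensor that
# feeds the failing Reshape node.
initializers = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "Reshape" and node.name == "Reshape_170":
        print("inputs:", list(node.input), "outputs:", list(node.output))
        shape_input = node.input[1]  # "488" in the log above
        if shape_input in initializers:
            print("constant target shape:", initializers[shape_input])
        else:
            print(f"target shape '{shape_input}' is computed at runtime (dynamic)")
```

If the target shape is computed at runtime rather than stored as a constant, that is consistent with the dynamic-shape issues discussed later in this thread.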

samet-akcay commented 1 year ago

@montensorrt, thanks for reporting this. In order for us to reproduce the issue, can you provide your config file, the CLI command you ran, etc.? That would make it easier to identify the problem. Thanks!

montensorrt commented 1 year ago

@samet-akcay, yes, I think the ONNX model alone should be enough to reproduce this issue. I used the train.py script to produce the ONNX model, then used the official TensorRT conversion tool to convert it (see the conversion sketch after the config below). The config is as follows:

dataset:
  name: folder #options: [mvtec, btech, folder]
  format: folder
  path: ./dataset/
  normal_dir: normal # name of the folder containing normal images.
  abnormal_dir: abnormal # name of the folder containing abnormal images.
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation
  mask: null #optional
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: [883,1061]  # options: [256, 256, 448, 384] - for each supported backbone
  train_batch_size: 16
  test_batch_size: 1
  num_workers: 1
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2 #2
    - layer3 #3
  coreset_sampling_ratio: 0.05
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 0
    pixel_default: 0
    adaptive: true

visualization:
  show_images: False # show images on the screen
  save_images: False # save images to the file system
  log_images: False # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./all

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx # options: null, onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.
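For reference, a minimal sketch of the conversion step that produces the parser errors above, using the TensorRT Python API (the model path is a placeholder; the command-line converter uses the same ONNX parser and reports the same errors):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the exported model; on failure, print the ModelImporter errors
# (e.g. "Invalid Node - Reshape_170") shown in the logs above.
with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
```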
montensorrt commented 1 year ago

@samet-akcay I suspect there are some dynamic shapes in the middle of the model; I noticed that the final output shape is not fixed: (screenshot of the exported ONNX graph showing a non-fixed output shape)
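One way to check whether the exported outputs have fixed shapes is to run ONNX shape inference and print the dimensions; symbolic names or "?" indicate dynamic dimensions. A sketch, assuming the exported file is model.onnx:

```python
import onnx

# Run shape inference so output (and intermediate) shapes are annotated.
model = onnx.shape_inference.infer_shapes(onnx.load("model.onnx"))  # placeholder path


def dims(value_info):
    """Return each dimension as an int, a symbolic name, or '?' if unknown."""
    return [
        d.dim_value if d.HasField("dim_value") else (d.dim_param or "?")
        for d in value_info.type.tensor_type.shape.dim
    ]


for out in model.graph.output:
    print("output", out.name, dims(out))
```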

montensorrt commented 1 year ago

@samet-akcay Can this be set via a parameter, or how can I make it a fixed shape?

montensorrt commented 1 year ago

@samet-akcay I guess it is because of inheriting from the DynamicBufferModule module, e.g. in torch_model.py, line 49: class PatchcoreModel(DynamicBufferModule, nn.Module):

montensorrt commented 1 year ago

@samet-akcay, I found and solved this problem. It is caused by a padding operation; replacing that operation fixes it.
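The comment above does not include the code (the follow-up questions below ask for it), but one plausible shape of such a change is to make the reflect padding use sizes that are plain Python ints at export time instead of values computed from a runtime shape tensor, so the exported Pad node gets constant pads. A sketch only, not the commenter's actual fix; the module name and where it would be plugged into the model are assumptions:

```python
import torch
import torch.nn.functional as F


class StaticReflectPad(torch.nn.Module):
    """Reflect padding whose sizes are fixed when the module is built."""

    def __init__(self, pad_h: int, pad_w: int) -> None:
        super().__init__()
        # Plain ints become constant attributes of the exported Pad node.
        self.pad = (pad_w, pad_w, pad_h, pad_h)  # (left, right, top, bottom)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.pad(x, self.pad, mode="reflect")


# Hypothetical usage: pad by half the blur kernel size on each side.
# pad = StaticReflectPad(pad_h=kernel_size // 2, pad_w=kernel_size // 2)
```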

samet-akcay commented 1 year ago

@montensorrt, for future reference, can you elaborate on how exactly you fixed the problem? It could be useful for other users, and we might also create a PR to fix it.

Leonardo0325 commented 1 year ago

> I found and solved this problem. It is caused by a padding operation; replacing that operation fixes it.

Hi, can you explain how to solve this problem? Please show the code, thank you!

MhdKAT commented 1 year ago

> @samet-akcay, I found and solved this problem. It is caused by a padding operation; replacing that operation fixes it.

I came across the same problem. It would be helpful if you could explain in more detail how this can be solved. Thank you!

shandongchong commented 1 year ago

Hi, can you explain how to solve this problem? Please show the code, thank you!

Eliza-and-black commented 1 year ago

I solved the same bug in another project by switching the version of timm from 0.9.6 to 0.9.2.