xuebinqin / DIS

This is the repo for our new project Highly Accurate Dichotomous Image Segmentation
Apache License 2.0

Ideal Image size for inference #54

Open ArielReplicate opened 2 years ago

ArielReplicate commented 2 years ago

Hi,

Thanks for this great work and for uploading the trained weights.

I'm using your pretrained model for inference and I don't really get the purpose of the cache_size parameter.

I figured it means resizing the image before inference, which could be useful for running on big images with low GPU memory.

The thing is, for some smaller images the results look better when upscaling them. In this example of size 450x450, the first (bad) results are from leaving the cache_size parameter blank (no resize?) and the second (good) results are from using cache_size=[1024,1024].


Can you explain the purpose of this parameter?

chrbruckmann commented 1 year ago

I think we need to distinguish between two parameters.

  1. "input_size=[1024,1024]", used in "Inference.py". This is the size the image is resized to for inference. For best results it is generally recommended to use the same size as the images the weights were trained on. In this case that is 1024.

  2. The "cache_size=[1024,1024]" parameter is used in the "train_valid_inference_main.py" file. The comment there describes cache_size as follows: "2.3. cache data spatial size -- To handle large size input images, which take a lot of time for loading in training, we introduce the cache mechanism for pre-converting and resizing the jpg and png images into .pt file hypar["cache_size"] = [1024, 1024] ## cached input spatial resolution, can be configured into different size"
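To make point 1 concrete, here is a rough sketch of what resizing to input_size before inference can look like. It is not the code from "Inference.py": the function name `infer_at_fixed_size` and the single conv layer standing in for the network are hypothetical, and only standard PyTorch calls are used.

```python
import torch
import torch.nn.functional as F


def infer_at_fixed_size(model, image, input_size=(1024, 1024)):
    """Resize a (3, H, W) image to input_size, run the model, and
    resize the predicted mask back to the original (H, W)."""
    orig_hw = tuple(image.shape[-2:])
    batch = image.unsqueeze(0)  # add batch dimension -> (1, 3, H, W)
    resized = F.interpolate(batch, size=input_size, mode="bilinear",
                            align_corners=False)
    with torch.no_grad():
        pred = model(resized)  # assumed to return (1, 1, h, w)
    restored = F.interpolate(pred, size=orig_hw, mode="bilinear",
                             align_corners=False)
    return restored[0, 0]  # (H, W) mask at the original resolution


# Hypothetical stand-in for the segmentation network, only for illustration.
stand_in_model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

image = torch.rand(3, 450, 450)  # e.g. a 450x450 image as in the question
mask = infer_at_fixed_size(stand_in_model, image)
print(mask.shape)
```

With this pattern the 450x450 image from the question is upscaled to 1024x1024 for the network and the mask is scaled back afterwards, which would match the observation that the upscaled run looks better.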

The cache speeds up the training process because the images are preprocessed into a form that can be loaded into the GPU faster during training. For example, with this mechanism I can train at about 0.2 seconds per image. Without it, each image takes 1-2 seconds.
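As an illustration of the idea (not the repository's actual implementation), a cache step could look roughly like the sketch below. The helper names `cache_image` and `load_cached` are hypothetical, and a random tensor stands in for a decoded jpg or png.

```python
import os
import tempfile

import torch
import torch.nn.functional as F


def cache_image(image, cache_path, cache_size=(1024, 1024)):
    """Resize a (3, H, W) image tensor once and store it as a .pt file."""
    resized = F.interpolate(image.unsqueeze(0), size=cache_size,
                            mode="bilinear", align_corners=False)[0]
    torch.save(resized, cache_path)


def load_cached(cache_path):
    """Load the already resized tensor; no decoding or resizing needed."""
    return torch.load(cache_path)


with tempfile.TemporaryDirectory() as cache_dir:
    # Random tensor standing in for a decoded training image.
    image = torch.rand(3, 300, 500)
    cache_path = os.path.join(cache_dir, "sample.pt")

    cache_image(image, cache_path)    # done once, before training
    cached = load_cached(cache_path)  # done every epoch during training
    cached_shape = tuple(cached.shape)

print(cached_shape)
```

The decode and resize cost is paid once per image, and every later epoch only reads the ready-made tensor.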

For inference you don't need the cache mechanism: each image is fed into the net only once, and preprocessing only pays off when the same image is fed multiple times.