joaopauloschuler / two-branch-plant-disease

Source code for the paper "Color-aware two-branch DCNN for efficient plant disease classification".
GNU General Public License v3.0

Fixed 1 value in inference tensor #3

Closed: steve-stnhbr closed this issue 3 months ago

steve-stnhbr commented 3 months ago

Hi there,

I have a problem with the output of inference on the network. When I load the pretrained model from this repo (raw/two-paths) with the load_model function provided in cai and try to predict the class of a given image with model.predict(...), I always get nearly the same output tensor, with the element at index 19 equal to 1 regardless of the input.

Is there something I am overlooking when running inference?

Example output:

[7.69824893e-30 1.08461551e-09 1.99305201e-37 1.57645165e-32
  1.98332970e-21 1.83693305e-22 2.46658694e-16 9.50285888e-34
  5.02652572e-11 0.00000000e+00 6.84679358e-34 0.00000000e+00
  4.29332381e-24 0.00000000e+00 0.00000000e+00 0.00000000e+00
  1.30981194e-35 6.60092120e-26 1.05387604e-23 1.00000000e+00
  2.94641964e-35 0.00000000e+00 1.36357227e-31 1.06264589e-35
  4.19193849e-29 0.00000000e+00 2.49844988e-34 1.04456460e-34
  0.00000000e+00 2.85831717e-28 2.88274711e-28 1.40890558e-27
  3.22510951e-23 1.35650107e-23 0.00000000e+00 0.00000000e+00
  4.55988952e-33 1.80344949e-33]
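A minimal sketch of the prediction step being described, assuming the model has already been loaded (the image path and the 224x224 input size are illustrative, not taken from the actual script):

import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# 'model' is assumed to be the pretrained two-path network, already loaded.
img = load_img('some_leaf_image.jpg', target_size=(224, 224))   # placeholder path
x = np.expand_dims(img_to_array(img, dtype='float32'), axis=0)  # shape (1, 224, 224, 3)
pred = model.predict(x)                                         # one row of class scores
print(pred.argmax(axis=1))                                      # reportedly always index 19
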
joaopauloschuler commented 3 months ago

@steve-stnhbr, great to see you testing!

This kind of problem is usually caused by the data preparation. This is how I loaded the images for the raw baseline:

  # load_img / img_to_array come from keras.preprocessing.image; np is numpy.
  def read_from_paths(paths):
    x = []
    for path in paths:
      img = load_img(path, target_size=(224, 224))  # resize every image to 224x224
      img = img_to_array(img, dtype='float16')      # HxWx3 float array
      x.append(img)
    return x

  print("loading training images")
  train_x = read_from_paths(train_path)
  train_x = np.array(train_x, dtype='float16')/255.  # scale RGB to [0..1]
  train_y = np.array(train_y)
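At inference time the raw baseline expects the same preparation, in particular the division by 255. A minimal sketch for a single image (the path is a placeholder and the model is assumed to be loaded already):

import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Same preprocessing as training: resize to 224x224, scale RGB to [0..1].
img = load_img('some_leaf_image.jpg', target_size=(224, 224))
x = np.array([img_to_array(img, dtype='float32')]) / 255.  # shape (1, 224, 224, 3)
pred = model.predict(x)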

For the two-paths approach (https://github.com/joaopauloschuler/two-branch-plant-disease/blob/main/raw/two-paths-plant-village/two_path_inception.ipynb), the data preparation involves other steps:

lab = True

train_x = np.array(read_from_paths(train_path), dtype='float16')

train_x /= 255
val_x /= 255
test_x /= 255
if (verbose):
    print("Converting training.")
# converts the RGB images (already in [0..1]) to CIE-LAB in place
cai.datasets.skimage_rgb2lab_a(train_x, verbose)
if (verbose):
    print("Converting validation.")
cai.datasets.skimage_rgb2lab_a(val_x, verbose)
if (verbose):
    print("Converting test.")
cai.datasets.skimage_rgb2lab_a(test_x, verbose)
gc.collect()
if (bipolar):
    # JP prefers bipolar input [-2,+2]
    train_x[:,:,:,0:3] /= [25, 50, 50]
    train_x[:,:,:,0] -= 2
    val_x[:,:,:,0:3] /= [25, 50, 50]
    val_x[:,:,:,0] -= 2
    test_x[:,:,:,0:3] /= [25, 50, 50]
    test_x[:,:,:,0] -= 2
else:
    # rescales the LAB channels to roughly [0..1]
    train_x[:,:,:,0:3] /= [100, 200, 200]
    train_x[:,:,:,1:3] += 0.5
    val_x[:,:,:,0:3] /= [100, 200, 200]
    val_x[:,:,:,1:3] += 0.5
    test_x[:,:,:,0:3] /= [100, 200, 200]
    test_x[:,:,:,1:3] += 0.5
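The same transformation has to be applied to any image used for prediction. A minimal sketch for a single image, assuming the non-bipolar setting shown above (the path is a placeholder and the model is assumed to be loaded already):

import numpy as np
import skimage.color as skimage_color
from tensorflow.keras.preprocessing.image import load_img, img_to_array

img = load_img('some_leaf_image.jpg', target_size=(224, 224))  # placeholder path
x = img_to_array(img, dtype='float32') / 255                   # RGB in [0..1]
x = skimage_color.rgb2lab(x)                                    # convert to CIE-LAB
x[:, :, 0:3] /= [100, 200, 200]                                 # L to [0..1], shrink a/b
x[:, :, 1:3] += 0.5                                             # shift a/b to roughly [0..1]
pred = model.predict(np.expand_dims(x, axis=0))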

If you like, you can share your code here and I can try to locate the problem.

steve-stnhbr commented 3 months ago

Thank you for your response! I will try out your shared code shortly.

The code I was using was sourced from the cai.datasets module. This is the code for transforming a single image:

def transform_image(img, target_size=(224,224), smart_resize=False, lab=False, rescale=False, bipolar=False):
    def local_rescale(img,  lab):
        if (lab):
            # JP prefers bipolar input [-2,+2]
            if (bipolar):
                img[:,:,0:3] /= [25, 50, 50]
                img[:,:,0] -= 2
            else:
                img[:,:,0:3] /= [100, 200, 200]
                img[:,:,1:3] += 0.5
        else:
            if (bipolar):
                img /= 64
                img -= 2
            else:
                img /= 255

    if (smart_resize):
        img = img_to_array(img, dtype='float32')
        if (lab):
            img /= 255
            img = skimage_color.rgb2lab(img)
        if(rescale):
            local_rescale(img,  lab)
        img = add_padding_to_make_img_array_squared(img)
        if ((img.shape[0] != target_size[0]) or (img.shape[1] != target_size[1])):
            img = cv2.resize(img, dsize=target_size, interpolation=cv2.INTER_NEAREST)
    else:
        img = img_to_array(img, dtype='float32')
        if (lab):
            img /= 255
            img = skimage_color.rgb2lab(img)
        if(rescale):
            local_rescale(img,  lab)
    return img

This function is then called by my training script like this:

img = cv2.imread(os.path.join(TEST_DATA_PATH, class_name, file))
imm_array = transform_image(img, smart_resize=True, lab=True)

Thank you for your support!! :)

joaopauloschuler commented 3 months ago

Glad to help.

This neural network expects small input values in the range [0..1]. Without rescaling, the values produced by rgb2lab stay in their native LAB ranges (L up to 100, a and b around ±100), far from what the network saw during training. You'll need to add the parameter "rescale=True" when calling transform_image. This will apply:

                img[:,:,0:3] /= [100, 200, 200]
                img[:,:,1:3] += 0.5
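A minimal sketch of the corrected call, using keras' load_img here (which yields an RGB image) in place of the cv2.imread from the earlier snippet; the path is a placeholder and the model is assumed to be loaded already:

import numpy as np
from tensorflow.keras.preprocessing.image import load_img

img = load_img('some_leaf_image.jpg')                            # placeholder path
img_array = transform_image(img, smart_resize=True, lab=True, rescale=True)
pred = model.predict(np.expand_dims(img_array, axis=0))          # add the batch dimension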

Does it solve the problem?

steve-stnhbr commented 3 months ago

Yes, this seems to do the trick, thank you very much! :)

joaopauloschuler commented 3 months ago

Glad to help!