hnjzbss / EKAGen

[CVPR 2024] Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation
Apache License 2.0

Dimension mismatch problem in "mask_arr_ass += mask_arr" and "img = mask * np.asarray(img)" #4

Open dyy1201 opened 1 month ago

dyy1201 commented 1 month ago

"mask_arr_ass" parameter shape is (300, 300), "mask_arr_ass" parameter shape is (300, 300, 3) which results in the "mask_arr_ass += mask_arr" not broadcast here. So I added this code: "if mask_arr.shape! = mask_arr_ass.shape: mask_arr = cv2.resize(mask_arr, (mask_arr_ass.shape[1], mask_arr_ass.shape[0]))“ Make both of them (300, 300, 3) so that they can be added together. I also have the same problem in the "show_cam_onimage" method of ", heatmap = show_cam_on_image" when I perform "cam = heatmap + img". I do the same way. And you did a grayscale processing in mask = Image.fromarray((mask_arr_ass 255).astype(np.uint8)).convert('L'), And then we do img = mask np.asarray(img). But img still seems to be a three-channel, so I added a bit of code (img = img.convert('L')) to make img a grayscale as well. Do these two actions have any impact?

hnjzbss commented 1 month ago

"mask_arr_ass" parameter shape is (300, 300), "mask_arr_ass" parameter shape is (300, 300, 3) which results in the "mask_arr_ass += mask_arr" not broadcast here. So I added this code: "if mask_arr.shape! = mask_arr_ass.shape: mask_arr = cv2.resize(mask_arr, (mask_arr_ass.shape[1], mask_arr_ass.shape[0]))“ Make both of them (300, 300, 3) so that they can be added together. I also have the same problem in the "show_cam_onimage" method of ", heatmap = show_cam_on_image" when I perform "cam = heatmap + img". I do the same way. And you did a grayscale processing in mask = Image.fromarray((mask_arr_ass 255).astype(np.uint8)).convert('L'), And then we do img = mask np.asarray(img). But img still seems to be a three-channel, so I added a bit of code (img = img.convert('L')) to make img a grayscale as well. Do these two actions have any impact?

I didn't quite understand your question. Generally, CXR image data is single-channel. To accommodate the CNN backbone's pre-trained weights, which expect three-channel images, we convert the CXR images to three channels before feeding them into the model, following many existing medical imaging methods (both classification and segmentation). Additionally, in my previous experience with image classification and segmentation tasks, if the original image is single-channel, converting it to RGB or removing two channels from the pre-trained weights of the first convolutional layer of the CNN makes little difference in the performance metrics.
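A hedged sketch of the two interchangeable options described here, using a torchvision ResNet-50 as a stand-in backbone. This is not the paper's code: the function names are made up for illustration, and Option 2 collapses the pretrained RGB filters by summing them, which is one common variant of adapting the first layer rather than the exact adjustment mentioned above:

```python
import torch
import torch.nn as nn
from torchvision import models

def cxr_to_three_channels(x: torch.Tensor) -> torch.Tensor:
    """Option 1: replicate a grayscale CXR batch (B, 1, H, W) to (B, 3, H, W)
    so the pretrained 3-channel stem can be used unchanged."""
    return x.repeat(1, 3, 1, 1)

def adapt_first_conv(resnet: nn.Module) -> nn.Module:
    """Option 2: keep the input single-channel and collapse the pretrained
    first convolution's RGB filters into one input channel."""
    old = resnet.conv1                               # Conv2d(3, 64, 7, stride=2, padding=3)
    new = nn.Conv2d(1, old.out_channels, kernel_size=old.kernel_size,
                    stride=old.stride, padding=old.padding, bias=False)
    with torch.no_grad():
        new.weight.copy_(old.weight.sum(dim=1, keepdim=True))  # (64, 1, 7, 7)
    resnet.conv1 = new
    return resnet

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
x_gray = torch.randn(2, 1, 224, 224)                 # dummy grayscale batch

out_rgb = backbone(cxr_to_three_channels(x_gray))    # Option 1: 3-channel input
out_gray = adapt_first_conv(backbone)(x_gray)        # Option 2: 1-channel input
print(out_rgb.shape, out_gray.shape)                 # both torch.Size([2, 1000])
```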