scu-zjz / IMDLBenCo

[NeurIPS'24 Spotlight] A comprehensive benchmark & codebase for Image manipulation detection/localization.
https://scu-zjz.github.io/IMDLBenCo-doc

script to test on single image [Trufor Model] Not working #47

Open kshitij005 opened 3 days ago

kshitij005 commented 3 days ago

I have written the script below to test on a single image:

```python

import torch
from PIL import Image
import json
import os
import cv2
import numpy as np
from IMDLBenCo.registry import MODELS
from IMDLBenCo.transforms import get_albu_transforms

# Load the model
def load_trufor_model(model_path, pretrain_weights, mit_b2_pretrain_weights, device, config_path):
    model_class = MODELS.get("Trufor")
    model = model_class(np_pretrain_weights=pretrain_weights, mit_b2_pretrain_weights=mit_b2_pretrain_weights, config_path=config_path)
    model.to(device)

    # Load the checkpoint if provided
    if model_path:
        checkpoint = torch.load(model_path, map_location=device)
        model.load_state_dict(checkpoint['model'])

    model.eval()
    return model

def preprocess_image(image_path, image_size=512):
    transform = get_albu_transforms('test')  # Get the same test transformations
    image = Image.open(image_path).convert('RGB')

    # Resize the image to the desired size
    image = image.resize((image_size, image_size))

    # Apply the albumentation transforms (outputs NumPy array)
    image = transform(image=np.array(image))['image']

    # Convert the image to a float32 tensor and normalize to range [0, 1]
    image = torch.tensor(image, dtype=torch.float32).div(255.0)

    # Permute the image from (H, W, C) to (C, H, W)
    image = image.permute(2, 0, 1).unsqueeze(0)  # Add batch dimension

    return image

def test_single_image(model, image_tensor, device):
    model.eval()

    # Create dummy mask and label, adjust dimensions if necessary
    mask = torch.zeros((1, *image_tensor.shape[2:]))  # Assuming mask matches image size
    label = torch.tensor([0])  # Assuming label is a single integer for classification

    with torch.no_grad():
        image_tensor = image_tensor.to(device)
        mask = mask.to(device)
        label = label.to(device)

        # Pass image_tensor, mask, and label to the model
        output = model(image_tensor, mask, label)

    return output

if __name__ == "__main__":
    # Set paths and device
    image_path = "155109225_CustPhoto.png"  # Replace with your image path
    mit_b2_pretrain_weights = "mit_b2.pth"
    checkpoint_path = "checkpoint-50.pth"
    pretrain_weights = "noiseprint.pth"
    config_path = "trufor.yaml"
    device = "cpu" # torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load model and preprocess the image
    model = load_trufor_model(checkpoint_path, pretrain_weights, mit_b2_pretrain_weights, device, config_path)
    image_tensor = preprocess_image(image_path)

    # Get the model output
    model_output = test_single_image(model, image_tensor, device)

    # Save the predicted mask (values in [0, 1]) as an 8-bit image
    out_img = ((model_output['pred_mask'][0][:, :512, :512].permute(1, 2, 0).cpu().numpy()) * 255).astype(np.uint8)
    cv2.imwrite("test_image.jpg", out_img)
```

With the above code I am not getting a proper masked output image, even though the training completed properly. I have also checked the results without the checkpoint, and that does not give proper results either. (Attached: test_image)

The output image from the test is not satisfactory. Could you please help me identify if there is an issue with the code? Additionally, we receive a pred_label in the output. How can I use it to determine whether the image has been tampered with?
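For reference, a minimal sketch of how pred_label could be turned into a tampered/authentic decision, assuming it is a per-image score already in [0, 1]; the key name, shape, and the 0.5 threshold are assumptions to verify against the Trufor wrapper in IMDLBenCo:

```python
import torch

# Hypothetical helper: interpret the image-level detection output.
# Assumes model_output['pred_label'] holds one score per image in the
# batch, already in [0, 1]; if the model returns raw logits instead,
# apply torch.sigmoid() first. Key name and threshold are assumptions.
def is_tampered(model_output, threshold=0.5):
    score = model_output['pred_label']
    if isinstance(score, torch.Tensor):
        score = score.flatten()[0].item()  # first (only) image in the batch
    return score, score > threshold

# Example usage:
# score, tampered = is_tampered(model_output)
# print(f"tamper score: {score:.3f} -> {'tampered' if tampered else 'authentic'}")
```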

I am new to this field, so please bear with me if I am missing something.

SunnyHaze commented 3 days ago

Thanks for your attention to our work. First, a question: is the visualization in TensorBoard correct or not? This is key to verifying whether the bug comes from the training stage or the saving/loading stage.

kshitij005 commented 2 days ago

@SunnyHaze,

I trained the model on an Azure GPU server, but I don't have access to TensorBoard since all the ports are blocked. (Attached: log.txt)

I've attached the log file; please let me know if it works. Also, could you explain the functionality of the pred_label (which we receive in the model output along with pred_mask)?

Lastly, could you please review my test script to check whether it's coded correctly, so I can use it to test other pre-trained models like MVSS-Net, CAT-Net, etc.?

dddb11 commented 2 days ago

> @SunnyHaze,
>
> I trained the model on an Azure GPU server, but I don't have access to TensorBoard since all the ports are blocked. (Attached: log.txt)
>
> I've attached the log file; please let me know if it works. Also, could you explain the functionality of the pred_label (which we receive in the model output along with pred_mask)?
>
> Lastly, could you please review my test script to check whether it's coded correctly, so I can use it to test other pre-trained models like MVSS-Net, CAT-Net, etc.?

Hi, kshitij005. You can download the TensorBoard event files and visualize them locally on your own computer (e.g., by running `tensorboard --logdir <your_log_dir>`). In your log, the F1 score doesn't seem to improve with training; you could train a few more times and investigate why it isn't improving. In your test script, it also looks like you haven't applied ImageNet normalization during image preprocessing. I'm not sure whether this is the cause of the issue, but you can try adding it and then debug further.
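For reference, a minimal sketch of test-time preprocessing with ImageNet statistics done directly in albumentations, as an alternative to normalizing by hand; this standalone transform is an assumption and may differ from what get_albu_transforms('test') builds internally, and example.png is a placeholder path:

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np
from PIL import Image

# Standalone test-time transform with ImageNet statistics. A.Normalize
# divides by max_pixel_value (255 by default) before applying mean/std,
# and ToTensorV2 converts the HWC array to a CHW tensor.
imagenet_transform = A.Compose([
    A.Resize(512, 512),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

image = np.array(Image.open("example.png").convert("RGB"))
tensor = imagenet_transform(image=image)["image"].unsqueeze(0)  # (1, 3, 512, 512)
```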

kshitij005 commented 2 days ago

I am currently testing this script using pre-trained weights. When I used the older script from the IML-ViT repo with the TruFor weights [iml-vit_checkpoint_trufor_20231104.pth] you provided, I achieved very good results with those pre-trained weights.

However, when I run the updated script (the test script above) with the added normalization, as you suggested, I get the same distorted results I shared earlier. Updated code block:

```python
# Preprocess the image with ImageNet normalization
def preprocess_image(image_path, image_size=512):
    transform = get_albu_transforms('test')  # Get the same test transformations
    image = Image.open(image_path).convert('RGB')

    # Resize the image to the desired size
    image = image.resize((image_size, image_size))

    # Apply the albumentation transforms (outputs NumPy array)
    image = transform(image=np.array(image))['image']

    # Convert the image to a float32 tensor and normalize to range [0, 1]
    image = torch.tensor(image, dtype=torch.float32).div(255.0)

    # Permute the image from (H, W, C) to (C, H, W)
    image = image.permute(2, 0, 1).unsqueeze(0)  # Add batch dimension

    # Define ImageNet's mean and std
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

    # Normalize the image using ImageNet's mean and std
    image = (image - mean) / std

    return image
```

Could you explain why there is such a discrepancy? Is there something wrong with my test script?

SunnyHaze commented 2 days ago

Do you mean you ran the IML-ViT checkpoint with the image loading process above? IML-ViT requires an input image with a resolution of 1024x1024, so this may not fit the desired input.
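For context, a minimal sketch of preparing a 1024x1024 input by zero-padding, which to my understanding is how the IML-ViT demo handles fixed-resolution inputs rather than resizing; verify against demo.ipynb, and note that pad_to_1024 is a hypothetical helper:

```python
import numpy as np
import torch
from PIL import Image

# Zero-pad an RGB image into the top-left corner of a 1024x1024 canvas.
# Padding (rather than resizing) preserves the low-level manipulation
# traces that interpolation would distort.
def pad_to_1024(image_path, target=1024):
    img = np.array(Image.open(image_path).convert("RGB"))
    h, w = img.shape[:2]
    if h > target or w > target:
        raise ValueError("image larger than the target canvas; crop or resize first")
    canvas = np.zeros((target, target, 3), dtype=img.dtype)
    canvas[:h, :w] = img
    tensor = torch.from_numpy(canvas).float().div(255.0)
    return tensor.permute(2, 0, 1).unsqueeze(0)  # (1, 3, 1024, 1024)
```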

kshitij005 commented 1 day ago

> Do you mean you ran the IML-ViT checkpoint with the image loading process above? IML-ViT requires an input image with a resolution of 1024x1024, so this may not fit the desired input.

I used your demo.ipynb to run those weights, utilizing the test_image function with a 1024x1024 input image. Running the pretrained weights on a single image works as expected in the older IML-ViT repo, whereas the newer IMDLBenCo repo does not work correctly and gives an improper masked output. Could this issue stem from a problem in my script or from another factor?

SunnyHaze commented 1 day ago

Do you have social media like WhatsApp or WeChat so we can discuss this properly?