Problem with model classification of Mimic CXR

danielemolino commented 2 months ago

Hi! Thank you so much for your amazing work.

I need to use your trained model for a project. To understand how to use the code, I tried to evaluate the densenet121-res224-all weights on the official test split of MIMIC-CXR-JPEG.

But I'm getting terrible performance, so I'm pretty sure I'm doing something wrong, and I can't work out what it is.

Here is an example of how I'm making the predictions:

import torch
import torchvision
import torchxrayvision as xrv

# Check if CUDA is available and set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define transformations
transforms = torchvision.transforms.Compose([
    xrv.datasets.XRayCenterCrop(),
    xrv.datasets.XRayResizer(224)
])

# No data augmentation for evaluation
data_aug = None

# Load the dataset
dataset = xrv.datasets.MIMIC_Dataset(
    imgpath="/mimer/NOBACKUP/groups/naiss2023-6-336/dmolino/TestSet_Mimic",
    csvpath="labels_test.csv",  # The original file with only test split rows
    metacsvpath="metadata_test.csv",
    transform=transforms,
    data_aug=data_aug,
    unique_patients=False,
    views=["PA", "AP"]
)

# Create a DataLoader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=False)

# Load the model with the specified weights
model = xrv.models.DenseNet(weights="densenet121-res224-all")
model.to(device)
model.eval()

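# Make predictions, collecting outputs and labels for later evaluation
preds, labels = [], []
with torch.no_grad():
    for batch in dataloader:
        # Each batch is a dict: images are under "img", labels under "lab"
        im = batch["img"].to(device)
        outputs = model(im)
        preds.append(outputs.cpu().numpy())
        labels.append(batch["lab"].numpy())
        # Process predictions as needed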

For every class that is part of the MIMIC split, I get an AUC of almost 0.5. Any idea where the mistake is?

ieee8023 commented 2 months ago

My guesses so far:

1. Are you indexing into the correct pathologies in the model output? The outputs are ordered according to model.targets.
2. There may be some misalignment in the dataset labels. Maybe take a specific image and check whether the outputs match the labels? Then process the same image with this script to see if the predictions are the same: https://github.com/mlmed/torchxrayvision/blob/master/scripts/process_image.py
3. It's really weird that the AUC is 0.5. That could mean the predicted values are all the same, so maybe something is wrong in the post-processing?
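To make the first and third guesses concrete, here is a minimal sketch in plain Python (no torchxrayvision needed) that matches prediction columns to label columns by pathology name instead of by position, then computes a per-class AUC. The helper names `auc` and `per_class_auc` are hypothetical, and it assumes missing labels are stored as NaN and that the label columns follow the dataset's own pathology list.

```python
import math


def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a win.
    Returns NaN when one of the two classes is absent.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_class_auc(preds, labels, model_targets, dataset_pathologies):
    """Compute AUC per pathology, aligning columns by name.

    preds:  rows of scores, columns ordered like model_targets
    labels: rows of 0/1/NaN, columns ordered like dataset_pathologies
    """
    results = {}
    for j, name in enumerate(dataset_pathologies):
        if name not in model_targets:
            continue  # the model has no output for this pathology
        i = model_targets.index(name)
        # Drop samples whose label for this pathology is missing (NaN)
        pairs = [
            (row_l[j], row_p[i])
            for row_l, row_p in zip(labels, preds)
            if not math.isnan(row_l[j])
        ]
        results[name] = auc([y for y, _ in pairs], [s for _, s in pairs])
    return results
```

With this definition, a model that outputs the same score for every image gets an AUC of exactly 0.5 on every class, which is one way the symptom described above can arise. Comparing column `j` of the labels against column `j` of the predictions is only valid if both lists happen to be in the same order.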

Another thing you can do is process the entire dataset and then select the test samples from the results. That way you can use the default MIMIC metadata files to build the dataset.
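A rough sketch of that selection step, in plain Python: given the image IDs in dataset order and the contents of a split file, return the positions of the test samples. The function name `select_test_indices` is hypothetical, and the column names `dicom_id` and `split` (with the value `test`) are assumptions about the split CSV; check them against your copy of the file.

```python
import csv
import io


def select_test_indices(dataset_dicom_ids, split_csv_text):
    """Return positions in dataset_dicom_ids whose image belongs to the test split.

    Assumes the split CSV has 'dicom_id' and 'split' columns (an assumption).
    """
    reader = csv.DictReader(io.StringIO(split_csv_text))
    test_ids = {row["dicom_id"] for row in reader if row["split"] == "test"}
    return [i for i, d in enumerate(dataset_dicom_ids) if d in test_ids]
```

The returned indices could then be used to slice the stacked prediction and label arrays, or passed to `torch.utils.data.Subset` to evaluate only the test images.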

danielemolino commented 2 months ago

Yes, in the end I was able to solve the problem, but thank you for the answer anyway!

One other question: were the models trained only on frontal views?

ieee8023 commented 2 months ago

Yes, the models were trained only on frontal views (PA, AP).

mlmed / torchxrayvision

Problem with model classification of Mimic CXR #156