johko / computer-vision-course

This repo is the homebase of a community driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face discord: hf.co/join/discord

Multi-label Image Classification Colab Notebook Error #299

Open TonyAssi opened 1 month ago

TonyAssi commented 1 month ago

This notebook doesn't work: https://colab.research.google.com/github/johko/computer-vision-course/blob/main/notebooks/Unit%203%20-%20Vision%20Transformers/fine-tuning-multilabel-image-classification.ipynb

When running this line of code:

notebook_launcher(train, (model_name, 8, 5, 5e-5), num_processes=2)

The following error is raised:

File "", line 10, in train_transforms
    labels = torch.tensor(batch['classes'])
ValueError: expected sequence of length 1 at dim 1 (got 2)
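
The failure reproduces outside the notebook: batch['classes'] holds one list of class indices per image, different samples can carry different numbers of classes, and torch.tensor cannot build a tensor from ragged nested lists. A minimal standalone reproduction:

import torch

# each sample may carry a different number of class labels,
# e.g. one class for the first image and two for the second
ragged = [[1], [2, 3]]

try:
    torch.tensor(ragged)  # rows of unequal length cannot form a tensor
except ValueError as e:
    print(e)  # expected sequence of length 1 at dim 1 (got 2)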

TonyAssi commented 1 month ago

I think I figured it out...

Error (original code)

def train_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [train_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs

    # one-hot encoding the labels
    labels = torch.tensor(batch['classes'])
    batch['labels'] = nn.functional.one_hot(labels,num_classes=20).sum(dim=1)

    return batch

def valid_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [valid_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs

    # one-hot encoding the labels
    labels = torch.tensor(batch['classes'])
    batch['labels'] = nn.functional.one_hot(labels,num_classes=20).sum(dim=1)

    return batch

Working code

import torch.nn.functional as F

def train_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [train_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs

    # Find the maximum sequence length
    max_length = max(len(seq) for seq in batch['classes'])

    # Pad sequences to have the same length
    padded_sequences = [seq + [0] * (max_length - len(seq)) for seq in batch['classes']]

    # Convert to PyTorch tensor
    padded_tensor = torch.tensor(padded_sequences)

    # Mask padded positions
    mask = (padded_tensor != 0).float()

    # Apply one-hot encoding with masking
    one_hot_encoded = F.one_hot(padded_tensor, num_classes=20)  
    masked_one_hot_encoded = one_hot_encoded * mask.unsqueeze(-1)

    # Sum along the sequence dimension
    batch['labels'] = masked_one_hot_encoded.sum(dim=1)

    return batch

def valid_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [valid_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs

    # Find the maximum sequence length
    max_length = max(len(seq) for seq in batch['classes'])

    # Pad sequences to have the same length
    padded_sequences = [seq + [0] * (max_length - len(seq)) for seq in batch['classes']]

    # Convert to PyTorch tensor
    padded_tensor = torch.tensor(padded_sequences)

    # Mask padded positions
    mask = (padded_tensor != 0).float()

    # Apply one-hot encoding with masking
    one_hot_encoded = F.one_hot(padded_tensor, num_classes=20) 
    masked_one_hot_encoded = one_hot_encoded * mask.unsqueeze(-1)

    # Sum along the sequence dimension
    batch['labels'] = masked_one_hot_encoded.sum(dim=1)

    return batch
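
One caveat with the padding approach above: the mask (padded_tensor != 0) treats class index 0 as padding, so a genuine class-0 label would be zeroed out as well. A variant that sidesteps this pads with a sentinel index equal to num_classes and drops that column after one-hot encoding; a minimal sketch (the helper name encode_multihot is made up for illustration, not part of the notebook):

import torch
import torch.nn.functional as F

def encode_multihot(classes, num_classes=20):
    # pad ragged label lists with a sentinel index == num_classes
    max_length = max(len(seq) for seq in classes)
    padded = [seq + [num_classes] * (max_length - len(seq)) for seq in classes]
    # one-hot with one extra column for the sentinel, then drop that column
    one_hot = F.one_hot(torch.tensor(padded), num_classes=num_classes + 1)
    return one_hot[..., :num_classes].sum(dim=1)

print(encode_multihot([[0], [2, 3]]))
# class 0 survives: tensor([[1, 0, 0, 0, ...], [0, 0, 1, 1, 0, ...]])
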
shreydan commented 4 weeks ago

@TonyAssi hello, sorry for the delay in replying. I was the author of this notebook, and I think I made a silly error. The ValueError: expected sequence of length 1 at dim 1 (got 2) on the line labels = torch.tensor(batch['classes']) happens because we cannot create torch.tensor([[1],[2,3]]) from inner lists of unequal lengths. Your idea of padding is nice!

I have updated the code as follows instead:

def one_hot(labels, num_classes):
    # labels: list of classes per sample, e.g. [[6], [10, 14]]
    one_hot_labels = []
    for l in labels:
        encoded = nn.functional.one_hot(torch.tensor(l), num_classes=num_classes).sum(dim=0)
        one_hot_labels.append(encoded)
    return one_hot_labels

def train_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [train_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs

    # one-hot encoding the labels
    batch['labels'] = one_hot(batch['classes'],num_classes=20)

    return batch

def valid_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [valid_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs

    # one-hot encoding the labels
    batch['labels'] = one_hot(batch['classes'],num_classes=20)

    return batch

This should fix the issue: instead of turning batch['classes'] into a tensor directly, I loop through the batch, apply nn.functional.one_hot per sample, and return a list of those one-hot encodings for the batch.
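
As a quick standalone check of the helper (with the one_hot definition above and its torch/nn imports in scope):

labels = one_hot([[6], [10, 14]], num_classes=20)
print(labels[0].nonzero().flatten())  # tensor([6])
print(labels[1].nonzero().flatten())  # tensor([10, 14])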

It works for individual and batched sampling as well:

print(train_dataset[:3]['labels'])
>>> [tensor([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]), tensor([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])]
print(train_dataset[0]['labels'])
>>> tensor([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
print(train_dataset[[4,5]]['labels'])
>>> [tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]), tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
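
One detail worth noting: these label tensors are int64, and if the training loop uses BCEWithLogitsLoss (an assumption on my part; the loss isn't shown in this thread, though it is the usual choice for multi-label classification), the targets must be cast to float:

import torch
from torch import nn

loss_fn = nn.BCEWithLogitsLoss()         # common choice for multi-label heads
logits = torch.randn(2, 20)              # stand-in for model outputs
targets = torch.stack([
    torch.tensor([0]*6 + [1] + [0]*13),        # multi-hot labels for classes {6}
    torch.tensor([0]*12 + [1, 0, 1] + [0]*5),  # and {12, 14}, as printed above
])
loss = loss_fn(logits, targets.float())  # BCE targets must be floating point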

Again, sorry for the delay; glad you came up with a solution as well!

P.S. I created this notebook on Kaggle (multi-GPU, 2x Nvidia T4); for a single Colab T4, please set the dataloader num_workers=2 and the accelerator num_processes=2.
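
For reference, those two settings live in different places; a rough sketch (the DataLoader construction actually happens inside the notebook's train function, and train, train_dataset, and model_name come from the notebook):

from accelerate import notebook_launcher
from torch.utils.data import DataLoader

# data-loading worker processes, set where the DataLoader is built
train_loader = DataLoader(train_dataset, batch_size=8, num_workers=2)

# training processes spawned by accelerate
# (2 on the Kaggle 2x T4 setup this notebook was written on)
notebook_launcher(train, (model_name, 8, 5, 5e-5), num_processes=2)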

I'll create a PR to fix these issues in the notebook ASAP! (cc: @johko)

Thank you!