Open · TonyAssi opened 1 month ago

This notebook doesn't work: https://colab.research.google.com/github/johko/computer-vision-course/blob/main/notebooks/Unit%203%20-%20Vision%20Transformers/fine-tuning-multilabel-image-classification.ipynb

When running this line of code:

```python
notebook_launcher(train, (model_name, 8, 5, 5e-5), num_processes=2)
```

I get the following error:

```
File "", line 10, in train_transforms
    labels = torch.tensor(batch['classes'])
ValueError: expected sequence of length 1 at dim 1 (got 2)
```
I think I figured it out...
Error (original code)
```python
def train_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [train_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs
    # one-hot encoding the labels
    labels = torch.tensor(batch['classes'])
    batch['labels'] = nn.functional.one_hot(labels, num_classes=20).sum(dim=1)
    return batch

def valid_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [valid_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs
    # one-hot encoding the labels
    labels = torch.tensor(batch['classes'])
    batch['labels'] = nn.functional.one_hot(labels, num_classes=20).sum(dim=1)
    return batch
```
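The failure can be reproduced in isolation: `torch.tensor` cannot build a tensor from nested lists of unequal length, which is exactly what `batch['classes']` contains when samples have different numbers of labels. A minimal sketch (standalone, outside the notebook):

```python
import torch

# each sample can have a different number of class ids,
# e.g. one sample with class [6] and another with classes [10, 14]
classes = [[6], [10, 14]]

# raises: ValueError: expected sequence of length 1 at dim 1 (got 2)
labels = torch.tensor(classes)
```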
Working code
```python
import torch
import torch.nn.functional as F

def train_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [train_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs
    # find the maximum sequence length
    max_length = max(len(seq) for seq in batch['classes'])
    # pad sequences to the same length
    padded_sequences = [seq + [0] * (max_length - len(seq)) for seq in batch['classes']]
    # convert to a PyTorch tensor
    padded_tensor = torch.tensor(padded_sequences)
    # mask padded positions (caveat: 0 is used as the padding value,
    # so a genuine class id 0 would be masked out as well)
    mask = (padded_tensor != 0).float()
    # apply one-hot encoding with masking
    one_hot_encoded = F.one_hot(padded_tensor, num_classes=20)
    masked_one_hot_encoded = one_hot_encoded * mask.unsqueeze(-1)
    # sum along the sequence dimension to get a multi-hot vector
    batch['labels'] = masked_one_hot_encoded.sum(dim=1)
    return batch

def valid_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [valid_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs
    # find the maximum sequence length
    max_length = max(len(seq) for seq in batch['classes'])
    # pad sequences to the same length
    padded_sequences = [seq + [0] * (max_length - len(seq)) for seq in batch['classes']]
    # convert to a PyTorch tensor
    padded_tensor = torch.tensor(padded_sequences)
    # mask padded positions (same class-0 caveat as above)
    mask = (padded_tensor != 0).float()
    # apply one-hot encoding with masking
    one_hot_encoded = F.one_hot(padded_tensor, num_classes=20)
    masked_one_hot_encoded = one_hot_encoded * mask.unsqueeze(-1)
    # sum along the sequence dimension to get a multi-hot vector
    batch['labels'] = masked_one_hot_encoded.sum(dim=1)
    return batch
```
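As a sanity check, here is the pad-mask-sum trick on a toy batch (a minimal sketch, independent of the notebook, using a made-up `classes` list):

```python
import torch
import torch.nn.functional as F

classes = [[6], [10, 14]]  # ragged per-sample class ids

max_length = max(len(seq) for seq in classes)
padded = torch.tensor([seq + [0] * (max_length - len(seq)) for seq in classes])
mask = (padded != 0).float()                      # zero out padded positions
one_hot = F.one_hot(padded, num_classes=20)       # shape (2, max_length, 20)
multi_hot = (one_hot * mask.unsqueeze(-1)).sum(dim=1)

print(multi_hot[0])  # 1.0 at index 6
print(multi_hot[1])  # 1.0 at indices 10 and 14
```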
@TonyAssi hello, sorry for the delay in the reply. I'm the author of this notebook, and I think I made a silly error causing `ValueError: expected sequence of length 1 at dim 1 (got 2)` on the line `labels = torch.tensor(batch['classes'])`: we cannot create `torch.tensor([[1], [2, 3]])` because the inner lists have unequal lengths. Your idea of padding is nice!

I have updated the code as follows, though:
```python
import torch
import torch.nn as nn

def one_hot(labels, num_classes):
    # labels: list of class ids per sample, e.g. [[6], [10, 14]]
    one_hot_labels = []
    for l in labels:
        encoded = nn.functional.one_hot(torch.tensor(l), num_classes=num_classes).sum(dim=0)
        one_hot_labels.append(encoded)
    return one_hot_labels

def train_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [train_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs
    # one-hot encoding the labels, one sample at a time
    batch['labels'] = one_hot(batch['classes'], num_classes=20)
    return batch

def valid_transforms(batch):
    # convert all images in batch to RGB to avoid grayscale or transparent images
    batch['image'] = [x.convert('RGB') for x in batch['image']]
    # apply torchvision.transforms per sample in the batch
    inputs = [valid_tfms(x) for x in batch['image']]
    batch['pixel_values'] = inputs
    # one-hot encoding the labels, one sample at a time
    batch['labels'] = one_hot(batch['classes'], num_classes=20)
    return batch
```
This should fix the issue: instead of turning `batch['classes']` into a tensor directly, I loop through the batch, apply `nn.functional.one_hot` per sample, and return a list of those one-hot encodings for the batch. It works with individual indexing and batch sampling as well:
```python
print(train_dataset[:3]['labels'])
>>> [tensor([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]), tensor([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])]

print(train_dataset[0]['labels'])
>>> tensor([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

print(train_dataset[[4, 5]]['labels'])
>>> [tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]), tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
```
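Note that `labels` is now a list of per-sample tensors, so whatever collator feeds the model needs to stack them into a single batch tensor. A minimal sketch (this `collate_fn` is illustrative, not the notebook's exact collator):

```python
import torch

def collate_fn(examples):
    # stack per-sample tensors into batch tensors
    pixel_values = torch.stack([ex['pixel_values'] for ex in examples])
    # cast to float so the multi-hot labels work with BCEWithLogitsLoss
    labels = torch.stack([ex['labels'] for ex in examples]).float()
    return {'pixel_values': pixel_values, 'labels': labels}
```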
Again sorry for the delay, glad you came up with a solution as well!
P.S. I had created this notebook on Kaggle (multi-GPU: 2x Nvidia T4), which is why it uses dataloader num_workers=2 and accelerator num_processes=2; for a single Colab T4, please adjust these settings accordingly.
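For example, on a single-GPU Colab T4 the launch call would look something like this (assuming the same `train` signature used in the notebook):

```python
from accelerate import notebook_launcher

# one process for one GPU
notebook_launcher(train, (model_name, 8, 5, 5e-5), num_processes=1)
```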
I'll create a PR to fix these issues in the notebook ASAP! (cc: @johko)
Thank you!