ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Training v9 with transformer from v5 #13090

Closed: gchinta1 closed this issue 4 months ago

gchinta1 commented 4 months ago


Question

Hi Glenn, I hope you are well. I am trying to train YOLO with a transformer just to see the difference, but I am getting NaN values during the epochs. It starts calculating the loss in the first epoch, but I get 0 for the final val values, and in the other epochs all the numbers are NaN. What is the cause of this? Thank you


glenn-jocher commented 4 months ago

@gchinta1 hello,

Thank you for reaching out and for your interest in experimenting with YOLOv5 and transformers! To assist you effectively, we need a bit more information.

  1. Minimum Reproducible Example: Could you please provide a minimum reproducible code example? This will help us understand your setup and reproduce the issue on our end. You can refer to our guide on creating a minimum reproducible example here: Minimum Reproducible Example.

  2. Environment and Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository. You can update your packages using the following commands:

    pip install --upgrade torch
    git pull https://github.com/ultralytics/yolov5

    After updating, please try running your training again to see if the issue persists.

  3. Additional Details: If the problem continues, please provide additional details such as:

    • The specific transformer model you are integrating.
    • Any modifications you have made to the YOLOv5 codebase.
    • The command you are using to start the training.

These details will help us diagnose the issue more accurately.
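
For reference on that last point, a typical YOLOv5 training command looks something like the following; the dataset YAML here is a placeholder for your own file:

    python train.py --data your_dataset.yaml --cfg models/yolov5s.yaml --weights yolov5s.pt --epochs 100 --img 640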

Looking forward to your response so we can help you resolve this!

gchinta1 commented 4 months ago

I am trying to use the transformer layers and block in another YOLO algorithm, just to find the difference in that YOLO. That's why I am trying to understand the architecture and how I can build it without the C3 module. So I am trying to make the transformer do what the C3 module and C3TR already do, so it will be good at the calculations. Thank you

glenn-jocher commented 4 months ago

Hello @gchinta1,

Thank you for providing more context on your experiment with integrating transformer layers into YOLOv5. It sounds like an exciting project! To help you further, let's address a few key points:

  1. Minimum Reproducible Example: To effectively diagnose the issue, we still need a minimum reproducible code example. This will allow us to understand your modifications and reproduce the issue on our end. Please refer to our guide on creating a minimum reproducible example here: Minimum Reproducible Example. This step is crucial for us to investigate and provide a solution.

  2. Environment and Versions: Ensure that you are using the latest versions of torch and the YOLOv5 repository. You can update your packages using the following commands:

    pip install --upgrade torch
    git pull https://github.com/ultralytics/yolov5

    After updating, please try running your training again to see if the issue persists.

  3. Transformer Integration: It sounds like you are replacing the C3 module with a transformer-based module. This is a complex modification, and there are a few things to consider:

    • Initialization: Ensure that your transformer layers are properly initialized. Improper initialization can lead to NaN values during training.
    • Learning Rate: Transformers often require different learning rates compared to convolutional layers. You might need to adjust the learning rate or use a learning rate scheduler.
    • Loss Function: Verify that the loss function is compatible with the output of your transformer layers.
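
On the initialization point, here is a minimal sketch of one common scheme, Xavier initialization applied to the linear layers (the helper name init_transformer_weights and the dummy model are purely illustrative):

import torch.nn as nn

def init_transformer_weights(m):
    """Xavier-initialize linear layers so early attention outputs stay well-scaled."""
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# example: apply() walks every submodule of whatever model you built
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
model.apply(init_transformer_weights)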

Here is a basic example of how you might integrate a transformer block into the YOLOv5 architecture:

import torch
import torch.nn as nn
from models.common import TransformerBlock  # YOLOv5's built-in transformer block

class CustomYOLOv5(nn.Module):
    def __init__(self):
        super().__init__()
        # models.common.TransformerBlock takes (c1, c2, num_heads, num_layers);
        # num_heads must evenly divide the embedding dimension c2
        self.transformer = TransformerBlock(c1=256, c2=256, num_heads=8, num_layers=1)
        # Other layers...

    def forward(self, x):
        x = self.transformer(x)  # expects a (batch, channels, height, width) feature map
        # Forward pass through other layers...
        return x

# Example usage on a dummy 256-channel feature map
model = CustomYOLOv5()
out = model(torch.randn(1, 256, 32, 32))

Please provide the specific transformer model you are integrating and any modifications you have made to the YOLOv5 codebase. This will help us give more targeted advice.

Looking forward to your response so we can assist you further!

gchinta1 commented 4 months ago

hi again, this is my work:

class TransformerLayer(nn.Module):
    def __init__(self, c, num_heads):
        super().__init__()
        self.q = nn.Linear(c, c, bias=False)
        self.k = nn.Linear(c, c, bias=False)
        self.v = nn.Linear(c, c, bias=False)
        self.ma = nn.MultiheadAttention(embed_dim=c, num_heads=num_heads, batch_first=True)
        self.fc1 = nn.Linear(c, c, bias=False)
        self.fc2 = nn.Linear(c, c, bias=False)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn_output, _ = self.ma(q, k, v)
        x = x + attn_output
        x = x + self.fc2(self.fc1(x))
        return x

class TransformerBlock(nn.Module):
    def __init__(self, c1, c2, num_heads, num_layers):
        super().__init__()
        self.conv = Conv(c1, c2) if c1 != c2 else nn.Identity()
        self.linear = nn.Linear(c2, c2)  # learnable position embedding
        self.tr = nn.Sequential(*(TransformerLayer(c2, num_heads) for _ in range(num_layers)))
        self.c2 = c2

    def forward(self, x):
        x = self.conv(x)
        b, c, w, h = x.shape
        x = x.flatten(2).permute(2, 0, 1)  # shape (wh, b, c)
        x = self.tr(x + self.linear(x))
        x = x.permute(1, 2, 0).reshape(b, self.c2, w, h)
        return x

instead of C3:

class RepNCSPELAN4(nn.Module):
    def __init__(self, c1, c2, c3, c4, num_heads=4, num_layers=1):
        """
        Initializes the RepNCSPELAN4 module with TransformerBlock for enhanced feature extraction.

        Args:
            c1: Number of input channels.
            c2: Number of output channels.
            c3: Number of intermediate channels.
            c4: Number of channels in Transformer block.
            num_heads: Number of heads in MultiheadAttention.
            num_layers: Number of Transformer layers.
        """
        super().__init__()
        self.c = c3 // 2
        self.cv1 = Conv(c1, c3, 1, 1)
        self.transformer1 = TransformerBlock(c3 // 2, c4, num_heads, num_layers)
        self.conv1 = Conv(c4, c4, 3, 1)
        self.transformer2 = TransformerBlock(c4, c4, num_heads, num_layers)
        self.conv2 = Conv(c4, c4, 3, 1)
        self.cv4 = Conv(c3 + 2 * c4, c2, 1, 1)

    def forward(self, x):
        """Performs forward propagation."""
        y = list(self.cv1(x).chunk(2, 1))
        y.append(self.conv1(self.transformer1(y[-1])))
        y.append(self.conv2(self.transformer2(y[-1])))
        return self.cv4(torch.cat(y, 1))

    def forward_split(self, x):
        """Performs forward propagation with splitting."""
        y = list(self.cv1(x).split(self.c, 1))
        y.append(self.conv1(self.transformer1(y[-1])))
        y.append(self.conv2(self.transformer2(y[-1])))
        return self.cv4(torch.cat(y, 1))

and my yaml file:

# YOLOv9

# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
# activation: nn.LeakyReLU(0.1)
activation: nn.ReLU()
learning_rate: 0.001

# anchors
anchors: 3

# gelan backbone
backbone: [
    # conv down
    [-1, 1, Conv, [64, 3, 2]],  # 0-P1/2

    # conv down
    [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4

    # elan-1 block
    [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],  # 2

    # avg-conv down
    [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],  # 4

    # avg-conv down
    [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 6

    # avg-conv down
    [-1, 1, Conv, [512, 3, 2]],  # 7-P5/32

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 8
]

# gelan head
head: [
    # elan-spp block
    [-1, 1, SPPELAN, [512, 256]],  # 9

    # up-concat merge
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 6], 1, Concat, [1]],  # cat backbone P4

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 12

    # up-concat merge
    [-1, 1, nn.Upsample, [None, 2, 'nearest']],
    [[-1, 4], 1, Concat, [1]],  # cat backbone P3

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]],  # 15 (P3/8-small)

    # avg-conv-down merge
    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 12], 1, Concat, [1]],  # cat head P4

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 18 (P4/16-medium)

    # avg-conv-down merge
    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 9], 1, Concat, [1]],  # cat head P5

    # elan-2 block
    [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 21 (P5/32-large)

    # detect
    [[15, 18, 21], 1, DDetect, [nc]],  # Detect(P3, P4, P5)
]

when I start training, the epoch and loss numbers start normally, and then towards the end it turns them into NaN and there are no val values

glenn-jocher commented 4 months ago

Hello @gchinta1,

Thank you for sharing your detailed implementation and YAML configuration. It looks like you've put a lot of effort into integrating transformer layers into the YOLOv5 architecture. Let's try to diagnose the issue with the NaN values during training.

Steps to Diagnose and Resolve the Issue

  1. Check for Initialization Issues: Ensure that all layers, especially the transformer layers, are properly initialized. Improper initialization can lead to NaN values during training.

  2. Gradient Clipping: Sometimes gradients can explode, leading to NaN values. You can try gradient clipping to mitigate this issue; add the following line to your training script (see the training-loop sketch after this list for where it belongs):

    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
  3. Learning Rate: Transformers often require different learning rates compared to convolutional layers. You might need to adjust the learning rate or use a learning rate scheduler. Start with a lower learning rate and see if the issue persists.

  4. Loss Function: Verify that the loss function is compatible with the output of your transformer layers. Ensure that the loss values are not becoming NaN due to invalid operations.

  5. Debugging NaN Values: Add debugging statements to check for NaN values in the intermediate outputs. For example:

    def forward(self, x):
       x = self.conv(x)
       if torch.isnan(x).any():
           print("NaN detected after conv")
       b, c, w, h = x.shape
       x = x.flatten(2).permute(2, 0, 1)  # shape (wh, b, c)
       x = self.tr(x + self.linear(x))
       if torch.isnan(x).any():
           print("NaN detected after transformer")
       x = x.permute(1, 2, 0).reshape(b, self.c2, w, h)
       return x
  6. Verify Environment and Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository. Update your packages using the following commands:

    pip install --upgrade torch
    git pull https://github.com/ultralytics/yolov5
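
On the gradient clipping in step 2, the call belongs between the backward pass and the optimizer step. Here is a minimal runnable sketch of that placement; the tiny linear model, SGD optimizer, and MSE loss are stand-ins for your actual YOLO training objects:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # stand-in for your YOLO model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stand-in for your optimizer
criterion = nn.MSELoss()                                   # stand-in for the detection loss

for _ in range(3):                                         # stand-in for iterating your dataloader
    imgs, targets = torch.randn(4, 10), torch.randn(4, 1)
    loss = criterion(model(imgs), targets)
    optimizer.zero_grad()
    loss.backward()
    # clip the global gradient norm after backward() and before step(),
    # so a single bad batch cannot blow up the weights
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()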

Example Code with Debugging Statements

Here's an example of how you might integrate debugging statements into your TransformerLayer and TransformerBlock:

import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, c, num_heads):
        super().__init__()
        self.q = nn.Linear(c, c, bias=False)
        self.k = nn.Linear(c, c, bias=False)
        self.v = nn.Linear(c, c, bias=False)
        self.ma = nn.MultiheadAttention(embed_dim=c, num_heads=num_heads, batch_first=True)
        self.fc1 = nn.Linear(c, c, bias=False)
        self.fc2 = nn.Linear(c, c, bias=False)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn_output, _ = self.ma(q, k, v)
        x = x + attn_output
        x = x + self.fc2(self.fc1(x))
        if torch.isnan(x).any():
            print("NaN detected in TransformerLayer")
        return x

class TransformerBlock(nn.Module):
    def __init__(self, c1, c2, num_heads, num_layers):
        super().__init__()
        self.conv = Conv(c1, c2) if c1 != c2 else nn.Identity()
        self.linear = nn.Linear(c2, c2)  # learnable position embedding
        self.tr = nn.Sequential(*(TransformerLayer(c2, num_heads) for _ in range(num_layers)))
        self.c2 = c2

    def forward(self, x):
        x = self.conv(x)
        if torch.isnan(x).any():
            print("NaN detected after conv")
        b, c, w, h = x.shape
        x = x.flatten(2).permute(2, 0, 1)  # shape (wh, b, c)
        x = self.tr(x + self.linear(x))
        if torch.isnan(x).any():
            print("NaN detected after transformer")
        x = x.permute(1, 2, 0).reshape(b, self.c2, w, h)
        return x
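
As one more debugging aid beyond the print statements above, PyTorch's anomaly detection will raise an error with a traceback at the exact operation that first produces a NaN in the backward pass:

import torch

# enable once near the top of your training script; it slows training
# noticeably, so use it only while debugging
torch.autograd.set_detect_anomaly(True)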

Next Steps

  1. Run the Training: With the debugging statements added, run your training script again and monitor the output for any NaN detection messages.
  2. Adjust Hyperparameters: If NaN values are detected, try adjusting the learning rate, adding gradient clipping, or modifying the initialization of your layers.

If the issue persists, please provide any additional error messages or observations from the debugging statements. This will help us further diagnose and resolve the issue.

Thank you for your patience and collaboration. Let's work together to get your model training successfully! 🚀

gchinta1 commented 4 months ago

Thank you for the help Glenn, the line in the training script fixed the issue 😃. Talk to you next time I need something 😅

glenn-jocher commented 4 months ago

Hello @gchinta1,

I'm thrilled to hear that the solution worked for you! 😃 Your persistence and detailed information made it easier for us to diagnose and resolve the issue. If you have any more questions or need further assistance in the future, don't hesitate to reach out. The YOLO community and the Ultralytics team are always here to help.

Happy training and best of luck with your project! 🚀

Talk to you next time! 😊