Train on .npy file dataset

bach05 commented 2 years ago

Search before asking

[X] I have searched the YOLOv5 issues and found no similar feature requests.

Description

I would like to be able to train on .npy files. Each file contains a multi-channel image (4 or more channels). Currently the train.py script does not support this format.

Use case

Multichannel image processing

Additional

Just a suggestion on how to modify the source code is enough.

Are you willing to submit a PR?

[X] Yes I'd like to help by submitting a PR!

github-actions[bot] commented 2 years ago

👋 Hello @bach05, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@bach05 python train.py --cache disk will cache any image format to *.npy files for faster reads during training.

bach05 commented 2 years ago

@glenn-jocher Yes, that's ok. But I have a dataset of multichannel images saved in .npy format. Is it possible to train on this dataset?

If I try to launch python train.py, it gives me the error 'No images or videos found in {p}. Supported formats are:\nimages: {IMG_FORMATS}\nvideos: {VID_FORMATS}'.

I tried adding the .npy extension to IMG_FORMATS, but it gives me another error '{prefix}No labels in {cache_path}. Can not train without labels. See {HELP_URL}'.
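
The two errors above appear to come from two separate stages: files are first filtered by extension, and labels are looked up only afterwards. The following is a minimal sketch of the first stage only. `ALLOWED_FORMATS` and `filter_images` are hypothetical names chosen for illustration, not YOLOv5's actual code.

```python
from pathlib import Path

# Hypothetical allow-list for illustration; YOLOv5 keeps its own list in IMG_FORMATS.
ALLOWED_FORMATS = ("jpg", "jpeg", "png", "tif", "tiff")


def filter_images(paths, allowed=ALLOWED_FORMATS):
    """Keep only paths whose extension (case-insensitive) is in the allow-list."""
    return [p for p in paths if Path(p).suffix.lower().lstrip(".") in allowed]


files = ["a/im0.jpg", "a/im1.npy", "a/im2.PNG"]
print(filter_images(files))                              # the .npy file is filtered out
print(filter_images(files, ALLOWED_FORMATS + ("npy",)))  # the .npy file now passes
```

Passing this filter only means the file is discovered. It still has to be decoded by the image reader and matched to a label file, which is presumably where the second error comes from.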

glenn-jocher commented 2 years ago

@bach05 👋 Hello! Thanks for asking about YOLOv5 🚀 dataset formatting. Yes, you'll need labels to train. Full guide below.

To train correctly your data must be in YOLOv5 format. Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. A few excerpts from the tutorial:

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths), 2) the number of classes nc and 3) a list of class names:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush' ]  # class names

1.2 Create Labels

After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if no objects in image, no *.txt file is required). The *.txt file specifications are:

One row per object
Each row is in class x_center y_center width height format.
Box coordinates must be in normalized xywh format (from 0 - 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
Class numbers are zero-indexed (start from 0).
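
As an illustration of the normalization rule above, here is a minimal sketch that converts a pixel-space corner box into one label row. The function name `pixels_to_yolo` and the corner-box input convention are assumptions made for this example.

```python
def pixels_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a normalized 'class x_center y_center width height' row."""
    x_center = (x_min + x_max) / 2 / img_w   # divide x values by image width
    y_center = (y_min + y_max) / 2 / img_h   # divide y values by image height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box from (100, 50) to (300, 250) in a 640x480 image, class 0.
row = pixels_to_yolo(0, 100, 50, 300, 250, img_w=640, img_h=480)
print(row)  # 0 0.312500 0.312500 0.312500 0.416667
```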

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
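
The path rule described above can be sketched as follows. This is an illustrative reimplementation that assumes forward-slash paths; `image_to_label_path` is a hypothetical name, not the function YOLOv5 itself uses.

```python
def image_to_label_path(image_path):
    """Swap the last '/images/' for '/labels/' and replace the extension with '.txt'."""
    head, sep, tail = image_path.rpartition("/images/")
    if not sep:
        raise ValueError(f"no '/images/' component in {image_path!r}")
    stem = tail.rsplit(".", 1)[0]  # drop the image extension
    return f"{head}/labels/{stem}.txt"


print(image_to_label_path("../datasets/coco128/images/im0.jpg"))
print(image_to_label_path("../datasets/coco128/images/im0.npy"))
```

Under this rule the image extension does not matter: both calls map to ../datasets/coco128/labels/im0.txt.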

Good luck 🍀 and let us know if you have any other questions!

bach05 commented 2 years ago

@glenn-jocher Thanks, actually I have followed your guidelines and I have my labels. In fact, the training works with .jpg files. But I would like to train on .npy files.

The problem arises when I save the images as .npy files. The train.py script does not seem able to handle them. Here is a minimal example of a dataset to reproduce the problem.

I would like to launch training directly on .npy files (each file contains a WxHxC matrix, which is a multichannel image). Is it possible? Where do I need to modify the code?
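
For readers unfamiliar with the format, a minimal sketch of writing and reading such a file with NumPy is shown below. The 4-channel shape, the uint8 dtype, the height x width x channels layout, and the file name are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

# A small random 4-channel image laid out as height x width x channels (assumed layout).
image = np.random.default_rng(0).integers(0, 256, size=(64, 48, 4), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "im0.npy")
    np.save(path, image)    # writes the array with its shape and dtype
    loaded = np.load(path)  # reads it back unchanged

print(loaded.shape, loaded.dtype)  # (64, 48, 4) uint8
```

Unlike JPG or PNG, the file keeps any number of channels and any dtype, which is why standard image readers cannot decode it.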

cyber-sys commented 2 years ago

Hey! I would also like to train on ".npy" files containing an array of numbers instead of images. My dataset is a single-channel float64 array (WxHx1). The data structure is close to @bach05's example.

MattiaMolon commented 2 years ago

Hey! I am also trying to do the same.

My dataset images have 6 channels and I can not save them in any image format. Therefore, I have saved them in .npy format. However, no method is provided to train on such files. If someone can give a hint on how or where to modify the code to make it work it would be awesome.

bach05 commented 2 years ago

Hi @MattiaMolon, the only workaround I found is saving the multichannel files in TIFF format, which supports "multi-page" images. In the same way, @cyber-pizdec can save his mono-channel images in JPG or PNG format.

cyber-sys commented 2 years ago

I can't save my uint32 array to PNG; PIL saves it as uint16. When reading a uint32 TIFF, I get this error:

opencv-python\opencv-python\opencv\modules\imgcodecs\src\grfmt_tiff.cpp (462) cv::TiffDecoder::readData OpenCV TIFF: TIFFRGBAImageOK: Sorry, can not handle images with 32-bit samples

Please advise how to get around this limitation.
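
One possible way around the 32-bit limitation, offered only as a sketch and not something confirmed in this thread, is to rescale the data into a bit depth that common image formats accept. The helper name `rescale_to_uint16` is hypothetical, and the conversion is lossy: values that differ by less than one uint16 step collapse together.

```python
import numpy as np


def rescale_to_uint16(arr):
    """Min-max rescale an integer or float array into the full uint16 range (lossy)."""
    arr = arr.astype(np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(arr.shape, dtype=np.uint16)
    scaled = (arr - lo) / (hi - lo) * 65535.0
    return np.round(scaled).astype(np.uint16)


data = np.array([[0, 1_000_000], [2_000_000, 4_000_000]], dtype=np.uint32)
out = rescale_to_uint16(data)
print(out.dtype, out.min(), out.max())  # uint16 0 65535
```

If the absolute values matter, the minimum and maximum used for scaling would need to be stored alongside the image, or fixed across the whole dataset.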

MattiaMolon commented 2 years ago

Hi @bach05, thank you very much for the suggestion. I am now able to start the training loop. However, I now have another problem: the number of channels seems to be hardcoded to 3 in the train.py file and in the Dataloader. Even if I load a custom model with:

model = torch.hub.load(
    "ultralytics/yolov5", model_name, autoshape=False, classes=n_classes, channels=n_channels,
)

The Dataloader seems to drop the 3 additional channels automatically. Therefore I get the following error while forwarding an image into the model:

File "my_train.py", line 670, in <module>
    main(opt)
  File "my_train.py", line 565, in main
    train(opt.hyp, opt, device, callbacks)
  File "my_train.py", line 351, in train
    pred = model(imgs)  # forward
  File "yolov5/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "yolov5/models/yolo.py", line 135, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "yolov5/models/yolo.py", line 158, in _forward_once
    x = m(x)  # run
  File "yolov5/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "yolov5/models/common.py", line 50, in forward_fuse
    return self.act(self.conv(x))
  File "yolov5/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "yolov5/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "yolov5/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 6, 6, 6], expected input[16, 3, 128, 128] to have 6 channels, but got 3 channels instead

Did you run into the same issue? I have found #2177 suggesting modifying the Dataloader. Was this the solution for you?
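
To read the error above: a 2-D convolution weight has shape [out_channels, in_channels / groups, kernel_h, kernel_w], so [32, 6, 6, 6] is a first layer built for 6 input channels, while the batch [16, 3, 128, 128] carries only 3. The sketch below reproduces that check in plain Python. `check_conv_input` is a hypothetical helper, not part of PyTorch or YOLOv5.

```python
def check_conv_input(weight_shape, input_shape, groups=1):
    """Return None if the input's channel count fits the conv weight, else a message."""
    out_ch, in_ch_per_group, k_h, k_w = weight_shape
    batch, channels, height, width = input_shape
    expected = in_ch_per_group * groups
    if channels != expected:
        return f"expected {expected} channels, but got {channels}"
    return None


print(check_conv_input((32, 6, 6, 6), (16, 3, 128, 128)))  # mismatch, as in the traceback
print(check_conv_input((32, 6, 6, 6), (16, 6, 128, 128)))  # None: shapes agree
```

The model side therefore looks correct here. The missing channels are lost before the forward pass, which points at the image loading step.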

bach05 commented 2 years ago

@MattiaMolon I was planning to work on that problem, but I am currently held up by other duties. I hope I can get back to it soon!

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Wiki – https://github.com/ultralytics/yolov5/wiki
Tutorials – https://docs.ultralytics.com/yolov5
Docs – https://docs.ultralytics.com

Access additional Ultralytics ⚡ resources:

Ultralytics HUB – https://ultralytics.com/hub
Vision API – https://ultralytics.com/yolov5
About Us – https://ultralytics.com/about
Join Our Team – https://ultralytics.com/work
Contact Us – https://ultralytics.com/contact

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

Monibsediqi commented 1 year ago

Following this as well. I would like to load the data from .npy files. What do I need to change in the code, and where? Any help, please?

glenn-jocher commented 1 year ago

@Monibsediqi to load data from .npy files in YOLOv5, you can make the following modifications:

In the data.yaml file, update the train and val paths to point to your .npy files instead of image directories.
In the datasets.py file, update the LoadImagesAndLabels class. Replace the image loading logic with code to load .npy files using np.load().
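Modify the load_image() function to handle .npy files appropriately. It is defined in the same dataloader file as LoadImagesAndLabels (utils/datasets.py, renamed utils/dataloaders.py in later releases), not in utils\general.py. You can use np.load() to load the .npy file and convert it to the required tensor format.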

Note that you might also need to make changes to the input dimensions and the number of channels in the model architecture to match your .npy file format.
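
A rough sketch of the loading step described above, using only NumPy, might look like this. The names `load_npy_image` and `to_chw_float` are hypothetical and are not YOLOv5 functions. The sketch leaves out resizing and augmentation, and it assumes files store height x width x channels arrays.

```python
import os
import tempfile

import numpy as np


def load_npy_image(path, expected_channels=None):
    """Load an .npy image as an H x W x C array, adding a channel axis to 2-D input."""
    img = np.load(path)
    if img.ndim == 2:
        img = img[..., np.newaxis]
    if img.ndim != 3:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {img.shape}")
    if expected_channels is not None and img.shape[2] != expected_channels:
        raise ValueError(f"expected {expected_channels} channels, got {img.shape[2]}")
    return img


def to_chw_float(img):
    """Convert H x W x C to a contiguous C x H x W float32 array."""
    return np.ascontiguousarray(img.transpose(2, 0, 1), dtype=np.float32)


# Demo with a temporary 6-channel file (assumed shape for illustration).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "im0.npy")
    np.save(path, np.zeros((128, 128, 6), dtype=np.uint8))
    chw = to_chw_float(load_npy_image(path, expected_channels=6))

print(chw.shape, chw.dtype)  # (6, 128, 128) float32
```

Other parts of the pipeline, such as color-space augmentation, may also assume three channels and would need to be checked separately.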

Hope this helps! Let me know if you have any further questions.

Monibsediqi commented 1 year ago

Thank you so much for your prompt reply. Really appreciate it. I will follow the steps that you've mentioned and get back to you with the outcome.

glenn-jocher commented 1 year ago

@Monibsediqi you're welcome! I'm glad I could help. Feel free to reach out if you have any further questions or need any assistance. Good luck with your modifications, and I look forward to hearing about the outcome.

VaishviShah commented 1 year ago

I have a dataset containing pairs of images taken simultaneously by IR and Thermal cameras. I need to modify the model architecture to take these two images as input and detect the object present in them. The model should then provide two images as output in response. How do I make changes?

glenn-jocher commented 1 year ago

@VaishviShah hi,

To modify the YOLOv5 model architecture to take pairs of images from IR and Thermal cameras as input, you can follow these steps:

Update the data pipeline: Modify the datasets.py file to load both IR and Thermal images as input. You can use the cv2 library or any other image processing library to read both images and concatenate them together.
Modify the model architecture: In the models\common.py file, update the network architecture to take the concatenated input images. You can modify the YoloV5 class and its forward method to handle the two input images appropriately.
Update the training script: In the train.py script, make sure to pass both IR and Thermal images to the model during training and evaluation. You can modify the Dataloader to load the images accordingly.

By following these steps, you should be able to modify the YOLOv5 model architecture to handle pairs of IR and Thermal images as input and provide two images as output. Let me know if you need further help.
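
The concatenation mentioned in the first step can be sketched with NumPy as channel-wise stacking. This assumes the two images are already spatially aligned and the same size. `stack_pair` is a hypothetical helper name.

```python
import numpy as np


def stack_pair(ir, thermal):
    """Concatenate two H x W (x C) images of equal size along the channel axis."""
    ir = ir[..., np.newaxis] if ir.ndim == 2 else ir
    thermal = thermal[..., np.newaxis] if thermal.ndim == 2 else thermal
    if ir.shape[:2] != thermal.shape[:2]:
        raise ValueError(f"size mismatch: {ir.shape[:2]} vs {thermal.shape[:2]}")
    return np.concatenate([ir, thermal], axis=2)


ir = np.zeros((128, 128), dtype=np.uint8)         # single-channel IR frame
thermal = np.ones((128, 128, 3), dtype=np.uint8)  # 3-channel thermal frame
pair = stack_pair(ir, thermal)
print(pair.shape)  # (128, 128, 4)
```

The model's first layer then has to be built for the combined channel count (4 in this example).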

Happy coding!

VaishviShah commented 1 year ago

Do you have any method to change YOLO v8 architecture?

glenn-jocher commented 1 year ago

@VaishviShah to modify the YOLOv5 model architecture and create YOLOv8, you will need to make significant modifications to the codebase.

Here is a high-level overview of the steps you can follow:

Update the backbone network: You can change the backbone network architecture, such as using a different feature extractor or using a different backbone altogether.
Modify the detection head: Depending on the specific changes you want to make, you may need to modify the detection head to accommodate the new architecture. This includes changing the number of anchor boxes, adjusting the number of output channels, and modifying the prediction layers.
Adjust the architecture configuration: Make sure to update the model configuration file to reflect the changes you made in the backbone network and detection head. This includes updating the number of output channels, anchor sizes, and strides.
Train the model: Once you have modified the architecture, you can train the model using your dataset. Make sure to handle any changes in image preprocessing, data loading, and data augmentation.

Please note that creating a YOLOv8 model involves significant modifications and could require substantial effort. It is recommended to have a good understanding of the YOLOv5 codebase and deep learning principles before attempting such modifications.

I hope this guidance helps you in your modifications. Feel free to reach out if you have any further questions.
