Attach LSTM with Yolov5n

qazi112 commented 2 years ago

Hello folks,

I want to attach an LSTM to YOLOv5n. Can I get a few starting pointers on how to do this?

Thanks in advance :)

github-actions[bot] commented 2 years ago

👋 Hello @qazi112, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

qazi112 commented 2 years ago

Sir @glenn-jocher can you please guide me on this?

zhiqwang commented 2 years ago

@qazi112 LSTM is a little slow. As an alternative, you can switch to the transformer variant in https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5s-transformer.yaml

qazi112 commented 2 years ago

@zhiqwang Thanks :)

But I need LSTM first. Can you help me with this?

qazi112 commented 2 years ago

@glenn-jocher Can you please guide me or point me to any resources on how to attach an LSTM to YOLO and modify it for object tracking as well?

glenn-jocher commented 2 years ago

@qazi112 👋 Hello! Thanks for asking about object tracking in computer vision. YOLOv5 🚀 is an object detector that detects, localizes and classifies objects in a single image. It does not connect objects across multiple images; for that you need a tracking solution. A few possible tracking solutions are:

Extended Kalman Filter (EKF): https://en.wikipedia.org/wiki/Kalman_filter
KLT tracker: https://en.wikipedia.org/wiki/Kanade%E2%80%93Lucas%E2%80%93Tomasi_feature_tracker
YOLOv5 DeepSort tracker: https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch
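
To give a feel for what the first option involves, here is a minimal sketch of a plain linear Kalman filter (not the extended variant) with a constant-velocity motion model, applied to the center of a single detected box. The class name `CenterKalman`, the state layout and the noise values `q` and `r` are illustrative assumptions; none of them come from YOLOv5 or the linked projects.

```python
import numpy as np


class CenterKalman:
    """Constant-velocity Kalman filter over a box center; state = [cx, cy, vx, vy]."""

    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.x = np.zeros(4)            # state estimate
        self.P = np.eye(4) * 100.0      # large initial uncertainty
        self.F = np.eye(4)              # motion model: position += velocity * dt
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)           # only (cx, cy) is measured
        self.Q = np.eye(4) * q          # process noise (assumed value)
        self.R = np.eye(2) * r          # measurement noise (assumed value)

    def predict(self):
        """Propagate the state one frame ahead and return the predicted center."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a measured center z = (cx, cy)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R            # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x


# Synthetic example: a box center moving at (10, 5) pixels per frame.
kf = CenterKalman()
for t in range(20):
    kf.predict()
    kf.update((10.0 * t, 5.0 * t))
```

In a real tracker the measured center would come from a detection matched to this track in each frame; the matching step (for example by IoU) is not shown here.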

Good luck 🍀 and let us know if you have any other questions!

qazi112 commented 2 years ago

@glenn-jocher Thanks a lot for replying :)

Basically, I want to use an LSTM with YOLOv5. Since YOLOv5 does the detections, I want to somehow attach an LSTM and turn this into a spatio-temporal pipeline. Do you have any idea what I should explore in the YOLOv5 architecture if I want to make this change? I am a bit new to this (final-year BSCS student). I have a good understanding of the YOLOv3 architecture, but YOLOv5n suits my needs because it is a fast detector.

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

SafwenNaimi commented 2 years ago

Hello @qazi112, I am also interested in merging LSTM with Yolov5. Did you find any useful starting point to do that?

Devendune commented 1 year ago

Hi @qazi112, as @SafwenNaimi asked, have you found any method to connect LSTM with YOLOv5?

Wetu-Vexo commented 1 year ago

Hi @qazi112, as @SafwenNaimi asked, have you found any method to connect LSTM with YOLOv5? (2)

cuiyong127 commented 1 year ago

Hi @qazi112, as @SafwenNaimi asked, have you found any method to connect LSTM with YOLOv5? (3)

glenn-jocher commented 1 year ago

Hello @cuiyong127,

Merging LSTM with YOLOv5 is definitely an interesting topic. However, YOLOv5, as an object detector, primarily aims to detect, localize, and classify objects in real-time, achieving state-of-the-art performance with improved speed and accuracy. YOLOv5 does not inherently include an LSTM module, so building a spatio-temporal pipeline, as you mentioned, would require extensive architectural modifications.

That said, there are a few possible ways to connect YOLOv5 with an LSTM:

Use YOLOv5 output as input to LSTM for tracking
Use YOLOv5 on output from LSTM model
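
One possible reading of the first option, sketched in PyTorch: an LSTM consumes the last few boxes of a single tracked object and regresses where the box will be in the next frame. The class name `NextBoxLSTM`, the normalized `[cx, cy, w, h]` box layout, the hidden size and the 8-frame history are all assumptions made for illustration. The model below is untrained and would need box sequences extracted from your own videos.

```python
import torch
import torch.nn as nn


class NextBoxLSTM(nn.Module):
    """Predicts the next box of one object from its recent box history."""

    def __init__(self, box_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=box_dim, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, box_dim)

    def forward(self, boxes):
        # boxes: (batch, time, 4) with normalized [cx, cy, w, h] per frame
        out, _ = self.lstm(boxes)       # out: (batch, time, hidden)
        return self.head(out[:, -1])    # use the last time step: (batch, 4)


model = NextBoxLSTM()
history = torch.rand(2, 8, 4)           # 2 objects, 8 past frames each (synthetic)
next_box = model(history)               # shape (2, 4)
```

The predicted box could then be matched against the detections of the next frame, which is the role a Kalman filter plays in trackers such as DeepSORT.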

Overall, the best approach to achieve object tracking is to use already existing multi-object tracking solutions such as YOLOv5 with DeepSORT, EKF or KLT tracker, which I had previously mentioned.

Please let us know if you have any other concerns or questions about YOLOv5.

developer-gurpreet commented 1 year ago

Can you explain how to pass YOLO output to LSTM input?

glenn-jocher commented 1 year ago

@developer-gurpreet to pass the YOLOv5 output to an LSTM (Long Short-Term Memory) model as input, you would typically need to reshape and preprocess the YOLO output to fit the input requirements of the LSTM.

Here's a high-level overview of the process:

Obtain the bounding box coordinates, class probabilities, and objectness scores from the YOLOv5 output.
Depending on the LSTM implementation, you may need to convert the bounding box coordinates to a suitable representation, such as center coordinates and width/height ratios or relative coordinates.
Prepare the input sequence for the LSTM by selecting a fixed number of previous YOLO outputs. The number of previous outputs could represent a specific temporal window for the LSTM's input.
Reshape the selected YOLO outputs into an input tensor compatible with the LSTM model. The exact reshaping process will depend on the LSTM architecture and input requirements.
Pass the reshaped tensor as input to the LSTM model for further processing.
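
The steps above can be sketched in PyTorch as follows. The sketch assumes per-frame detections in the post-NMS YOLOv5 layout, one `(n, 6)` tensor per frame with columns `[x1, y1, x2, y2, conf, cls]`. The helper names `frame_features` and `make_windows`, the cap of 5 detections per frame and the window of 8 frames are hypothetical choices, not YOLOv5 APIs.

```python
import torch


def frame_features(det, img_w, img_h, max_det=5):
    """Turn one frame's detections (n, 6) into a fixed-size flat vector (max_det * 6,)."""
    feats = torch.zeros(max_det, 6)     # unused slots stay zero (conf == 0 marks padding)
    if det.numel() > 0:
        det = det[det[:, 4].argsort(descending=True)][:max_det]  # keep most confident
        x1, y1, x2, y2, conf, cls = det.unbind(dim=1)
        # Convert corner boxes to normalized center/size form.
        xywh = torch.stack(
            [(x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
             (x2 - x1) / img_w, (y2 - y1) / img_h],
            dim=1,
        )
        feats[: len(det)] = torch.cat([xywh, conf[:, None], cls[:, None]], dim=1)
    return feats.flatten()


def make_windows(per_frame_dets, img_w, img_h, window=8, max_det=5):
    """Stack per-frame vectors and cut sliding windows: (num_windows, window, features)."""
    frames = torch.stack([frame_features(d, img_w, img_h, max_det) for d in per_frame_dets])
    return frames.unfold(0, window, 1).permute(0, 2, 1).contiguous()


# Synthetic detections: 9 frames with one box, then one frame with no detections.
dets = [torch.tensor([[10.0, 20.0, 110.0, 220.0, 0.9, 1.0]])] * 9 + [torch.zeros((0, 6))]
windows = make_windows(dets, img_w=640, img_h=480)   # shape (3, 8, 30)
```

The result matches the `(batch, time, features)` layout expected by `torch.nn.LSTM` with `batch_first=True`. Passing the class id as a raw number is a simplification; a one-hot or embedded class is usually a better input.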

The details of the implementation will vary depending on the specific LSTM model and framework you are using. It is recommended to refer to the documentation and examples of the LSTM library or framework you are working with for more specific instructions.

Let us know if you have any further questions or need more assistance with integrating YOLOv5 with LSTM.

bachir172 commented 6 months ago

Hi, I have a YOLOv8 model trained to detect static sign language letters, and I need a way to detect dynamic sign language words, but YOLO only supports individual images. I am thinking of using an LSTM, but I don't know how it works or how it connects with YOLO. Can I get a few starting pointers on how to do this?

glenn-jocher commented 6 months ago

@bachir172 hi there!

To integrate LSTM with YOLO for dynamic sign language recognition, you'd primarily be working on creating a temporal model that utilizes sequences of frames (images) rather than individual frames. Here's a simplified way to begin:

Extract Features: Use YOLO to detect sign language letters in each frame and extract the bounding box coordinates and class probabilities.
Sequence Preparation: Organize these detected features into sequence data that can be used as input for the LSTM.
LSTM Model: Feed these sequences into an LSTM network that predicts the dynamic sign language words based on the temporal sequence of signs.
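
As a rough illustration of these three steps, the PyTorch sketch below classifies a clip of per-frame feature vectors into one of several words and runs a single training step. Everything specific here is an assumption: the class name `SignWordLSTM`, 30 features per frame, 16 frames per clip, 10 word classes, and the random tensors standing in for real YOLO-derived features and labels.

```python
import torch
import torch.nn as nn


class SignWordLSTM(nn.Module):
    """Classifies a sequence of per-frame detection features into a word."""

    def __init__(self, feat_dim=30, hidden=128, num_words=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=1, batch_first=True)
        self.classifier = nn.Linear(hidden, num_words)

    def forward(self, seq):
        # seq: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(seq)        # h_n: (num_layers, batch, hidden)
        return self.classifier(h_n[-1])     # logits: (batch, num_words)


torch.manual_seed(0)
model = SignWordLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

clips = torch.rand(4, 16, 30)               # placeholder for YOLO-derived features
labels = torch.randint(0, 10, (4,))         # placeholder word labels

logits = model(clips)                       # shape (4, 10)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Real clips usually differ in length, so in practice they would be padded or truncated to a common length (or packed with `torch.nn.utils.rnn.pack_padded_sequence`) before batching.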

You'll need to familiarize yourself with LSTM networks. They process sequences of data and are powerful in handling time-series data, which is what you essentially create when linking frames.

For implementation:

PyTorch and TensorFlow are good frameworks to set up both the YOLO model and LSTM.

Remember, combining these models can get complex, but it's a great learning curve. Keep experimenting! 😊
