ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Merging YOLOv5 with LSTM for Human Activity Recognition #8849

Closed moahaimen closed 1 year ago

moahaimen commented 2 years ago

Search before asking

Question

Hi, I need your help merging YOLOv5 with an LSTM to recognize certain human activities.

My YOLOv5 model already detects the objects successfully, but I need the system to recognize how a person acts while holding the object. From what I found, the best approach is to merge YOLOv5 with an LSTM so the pipeline first detects the object the person is holding, and then the LSTM decides the human activity of the person holding it. I don't know how to do this, so please advise me.

Thank you

Additional

No response

moahaimen commented 2 years ago

@glenn-jocher please help thank You

glenn-jocher commented 2 years ago

@moahaimen LSTM models must be trained on video datasets. So as I tell everybody the first step is to have labelled data of the type you want to be able to predict.

Zephyr69 commented 2 years ago

Why would you need temporal information for this, though? I thought it was pretty clear from a single frame whether a person is holding an object. Is this for cases where the object cannot be seen?

moahaimen commented 2 years ago

@moahaimen LSTM models must be trained on video datasets. So as I tell everybody the first step is to have labelled data of the type you want to be able to predict.

Thank you for answering. I have a video dataset for the specific actions I need, but I need to understand how to combine an LSTM with YOLO, so that when I run detection on a YouTube video, for example, YOLO gives me the bounding box of the object and the LSTM gives the human action in the same video. @glenn-jocher

moahaimen commented 2 years ago

I need to recognize a certain pose of a human carrying an object.

suki1504 commented 2 years ago

@moahaimen LSTM models must be trained on video datasets. So as I tell everybody the first step is to have labelled data of the type you want to be able to predict.

Thank you for answering. I have a video dataset for the specific actions I need, but I need to understand how to combine an LSTM with YOLO, so that when I run detection on a YouTube video, for example, YOLO gives me the bounding box of the object and the LSTM gives the human action in the same video. @glenn-jocher

@moahaimen

Hi, I also have the same use case. Can you please share any ideas, or did you get any answer regarding this?

moahaimen commented 2 years ago

@moahaimen LSTM models must be trained on video datasets. So as I tell everybody the first step is to have labelled data of the type you want to be able to predict.

Thank you for answering. I have a video dataset for the specific actions I need, but I need to understand how to combine an LSTM with YOLO, so that when I run detection on a YouTube video, for example, YOLO gives me the bounding box of the object and the LSTM gives the human action in the same video. @glenn-jocher

@moahaimen

Hi, I also have the same use case. Can you please share any ideas, or did you get any answer regarding this?

I haven't found an answer that helps me. I'm still trying to find a solution at this moment but couldn't find any; I would appreciate any help.

github-actions[bot] commented 1 year ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

developer-gurpreet commented 12 months ago

@moahaimen Hi, I'm working on the same project, passing YOLO output to LSTM input. Have you implemented this? If yes, can you share the source code? I'm stuck on "passing YOLO output as input to LSTM".

moahaimen commented 12 months ago

@moahaimen Hi, I'm working on the same project, passing YOLO output to LSTM input. Have you implemented this? If yes, can you share the source code? I'm stuck on "passing YOLO output as input to LSTM".

Unfortunately I couldn't. If you know how to do it, I would be grateful if you could show me how.

glenn-jocher commented 12 months ago

@developer-gurpreet

Thank you for reaching out. While YOLO and LSTM can be used together for certain applications like video analysis, directly passing YOLO output to LSTM can be a bit challenging. Integration between different models usually requires some pre-processing and data formatting.

Here's a general approach you can follow:

  1. Obtain YOLO detections: Use YOLOv5 or any other YOLO implementation to detect and obtain bounding boxes of objects in each frame of the video.

  2. Extract features: Extract features or representations from the detected objects using YOLO. These features can be the bounding box coordinates, class probabilities, or other relevant information.

  3. Pre-process the features: Before passing the features to the LSTM, you might need to pre-process them based on the requirements of your LSTM model. For example, you may need to normalize or scale the features, reshape them, or convert them to a time sequence format.

  4. Format data for LSTM: Your LSTM model expects a certain input format, typically a sequence of time steps. You will need to organize the pre-processed features into sequential data, with each time step representing a frame.

  5. Train LSTM model: Once the data is properly formatted, you can feed it into your LSTM model for training. You may need to adjust the architecture and hyperparameters of the LSTM model based on your specific use case.
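The steps above can be sketched roughly as follows. This is a minimal illustration, not an official YOLOv5 API: the per-frame feature layout (top box's `[x1, y1, x2, y2, conf]`), the window length, and the LSTM sizes are all assumptions you would tune for your own data. Only the `(N, 6)` detection shape mirrors what YOLOv5's NMS output typically looks like.

```python
import torch
import torch.nn as nn

SEQ_LEN = 16   # frames per sequence (assumed window size, tune per activity)
FEAT_DIM = 5   # per-frame feature: x1, y1, x2, y2, conf (assumed layout)

def frame_features(detections):
    """Reduce one frame's detections to a fixed-size feature vector.
    `detections` is assumed to be an (N, 6) tensor of
    [x1, y1, x2, y2, conf, cls] rows, the usual post-NMS shape."""
    if detections.numel() == 0:
        return torch.zeros(FEAT_DIM)          # placeholder for empty frames
    best = detections[detections[:, 4].argmax()]  # highest-confidence box
    return best[:FEAT_DIM]

def make_sequences(per_frame_feats):
    """Stack consecutive frame features into overlapping fixed-length windows."""
    feats = torch.stack(per_frame_feats)      # (T, FEAT_DIM)
    windows = [feats[i:i + SEQ_LEN] for i in range(len(feats) - SEQ_LEN + 1)]
    return torch.stack(windows)               # (num_seq, SEQ_LEN, FEAT_DIM)

class ActivityLSTM(nn.Module):
    """Tiny LSTM classifier over per-frame detection features."""
    def __init__(self, num_classes=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):            # x: (batch, SEQ_LEN, FEAT_DIM)
        _, (h, _) = self.lstm(x)     # h: (num_layers, batch, hidden)
        return self.head(h[-1])      # (batch, num_classes) logits
```

These logits would then be trained with a standard cross-entropy loss against your per-sequence activity labels.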

Unfortunately, I do not have a specific code example for passing YOLO output to LSTM. However, I recommend referring to research papers, tutorials, or example projects that combine object detection and activity recognition to get a better understanding of the implementation details.

If you have any more specific questions or issues during the process, feel free to ask. Best of luck with your project!

caibird9996 commented 4 months ago

Hi, I'm working on the same project, passing YOLO output to LSTM input. Have you implemented this? If yes, can you share the source code? I'm stuck on "passing YOLO output as input to LSTM".

Hello, have you completed it now? I am also stuck in this step

caibird9996 commented 4 months ago

Hi, I'm working on the same project, passing YOLO output to LSTM input. Have you implemented this? If yes, can you share the source code? I'm stuck on "passing YOLO output as input to LSTM".

Unfortunately I couldn't. If you know how to do it, I would be grateful if you could show me how.

Hello, have you completed it now? I am also stuck in this step

glenn-jocher commented 4 months ago

@caibird9996 hello! It seems many of you are interested in integrating YOLOv5 detected objects as input to an LSTM for analyzing sequences or activities in videos. Unfortunately, as of now, there isn't a direct, out-of-the-box solution for this in the YOLOv5 documentation or repository. However, let me guide you on a basic approach that might help you progress:

  1. Detection with YOLOv5: Use YOLOv5 to detect objects in each frame of your video. This will give you information such as bounding boxes, class IDs, and confidence scores.

  2. Pre-process Data: Extract and process the output from YOLOv5 to create a suitable input for the LSTM. This could mean selecting specific features such as bounding box coordinates or even using feature extraction techniques to get more comprehensive feature vectors.

  3. Sequence Formation: Since LSTMs process sequential data, organize your processed YOLOv5 output into sequences that make sense for your particular use case. The sequences should be of a fixed length compatible with your LSTM.

  4. LSTM Training: Train your LSTM on these sequences. Your model should learn to understand temporal dynamics based on the object's positions and other features over time.

Please understand this is a very high-level overview, and integrating YOLOv5 with LSTM models involves several steps where details matter. You might need to dive into specifics based on your project needs, like optimizing data preprocessing or tweaking the LSTM model for better performance.

Here's a simple pseudocode idea to get you started:

# Pseudocode for integrating YOLOv5 outputs with LSTM
# Step 1: Detect with YOLOv5 and collect data
yolo_outputs = []
for frame in video_frames:
    detections = yolo.detect(frame)
    yolo_outputs.append(process_detections(detections))

# Step 2: Pre-process data and form sequences
sequences = create_sequences(yolo_outputs)

# Step 3: Prepare sequences for LSTM input - specifics depend on your LSTM setup
lstm_input = prepare_for_lstm(sequences)

# Step 4: Feed into LSTM for further processing/training
lstm_output = lstm_model.predict(lstm_input)
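One concrete detail worth adding to the pre-processing step in the pseudocode above: normalizing box coordinates by the frame size keeps the features comparable across videos of different resolutions. A minimal NumPy sketch, where the `[x1, y1, x2, y2, conf]` row layout is an assumption for illustration:

```python
import numpy as np

def normalize_boxes(boxes, img_w, img_h):
    """Scale [x1, y1, x2, y2, conf] rows so coordinates fall in [0, 1].

    Pixel coordinates depend on resolution; normalized ones do not,
    so sequences from 720p and 1080p videos become comparable inputs.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    scale = np.array([img_w, img_h, img_w, img_h, 1.0], dtype=np.float32)
    return boxes / scale
```

For example, a box spanning the right half of a 1280x720 frame normalizes to coordinates in `[0.5, 1.0]` regardless of the source resolution.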

Remember, the success of your project depends on how well your LSTM can capture temporal relationships in your data. Experimenting with different approaches to utilizing YOLOv5 output for LSTM input is key. Good luck with your project! 🚀