tattle-made / feluda

A configurable engine for analysing multi-lingual and multi-modal content.
https://tattle.co.in/products/feluda/
GNU General Public License v3.0

[DMP 2024]: Clustering large amount of videos #81

Closed dennyabrain closed 1 month ago

dennyabrain commented 9 months ago

Ticket Contents

Description

Feluda allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. One important modality on Indian social media is video. The scope of this task is to explore automated techniques suited to video and, after consultation with the team, implement an end-to-end workflow that can be used to surface visual or temporal trends in a large collection of videos.

Goals

Expected Outcome

Feluda's goal is to provide a simple CLI or scriptable interface for analysing multimodal social media data. In that vein, all the work that you do should be executable and configurable via scripts and config files. The solution should look at Feluda's architecture and its various components to identify the best ways to enable this. The solution should have a way to configure the data source (a database with file IDs or an S3 bucket with files), specify and implement the data processing pipeline, and configure where the result will be stored. Our current implementation uses S3 and a SQL database as data sources and Elasticsearch for storing results, but additional sources or stores can be added if apt for this project.
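As a rough illustration of the "configurable via config files" requirement, a pipeline description could be validated like this. It is only a sketch: the field names, operator names, bucket name and URL are hypothetical placeholders, not Feluda's actual config schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical option names; Feluda's real config schema may differ.
SUPPORTED_SOURCES = {"s3", "sql"}
SUPPORTED_STORES = {"elasticsearch"}


@dataclass
class PipelineConfig:
    source_kind: str        # where the video files (or their IDs) come from
    source_location: str    # bucket name or database URL (placeholder values)
    operators: List[str]    # ordered processing steps applied to each file
    store_kind: str         # where results are written
    store_location: str


def load_config(raw: dict) -> PipelineConfig:
    """Validate a plain dict (e.g. parsed from a YAML or JSON file)."""
    source, store = raw["source"], raw["store"]
    if source["kind"] not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported source: {source['kind']}")
    if store["kind"] not in SUPPORTED_STORES:
        raise ValueError(f"unsupported store: {store['kind']}")
    if not raw["operators"]:
        raise ValueError("at least one operator is required")
    return PipelineConfig(
        source_kind=source["kind"],
        source_location=source["location"],
        operators=list(raw["operators"]),
        store_kind=store["kind"],
        store_location=store["location"],
    )


# Illustrative values only.
example = load_config({
    "source": {"kind": "s3", "location": "example-video-bucket"},
    "operators": ["video_to_vector", "cluster_vectors"],
    "store": {"kind": "elasticsearch", "location": "http://localhost:9200"},
})
```

Keeping the source, the operator list and the store as separate fields means a new source or store only needs a new entry in the supported sets plus its own handler.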

Acceptance Criteria

Implementation Details

One way we have approached this is by using vector embeddings. We have done this to great success to surface visual trends in images. We used a ResNet model to generate vector embeddings and stored them in Elasticsearch. We also used t-SNE to reduce the dimensions of the vector embeddings and display them in a 2D visualization. It can be viewed here. A detailed report on Feluda's usage in a project to analyze images can be read here. The relevant Feluda operator can be studied here. The code for t-SNE is here. A prior study of various ways to get insights out of images has been documented here.
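The embedding approach rests on the idea that visually similar items end up with nearby vectors. The sketch below shows that idea with a brute-force cosine-similarity lookup over a tiny in-memory index. The three-dimensional vectors and the `img_*` names are made up for illustration; a real setup would use model-generated embeddings and a vector store such as Elasticsearch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for model output.
index = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.8, 0.2, 0.1],
    "img_c": [0.0, 0.1, 0.9],
}
results = most_similar([1.0, 0.0, 0.0], index, top_k=2)
```

Here `results` ranks `img_a` and `img_b` ahead of `img_c`, because their vectors point in nearly the same direction as the query.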

Mockups/Wireframes

This is an interactive visualization of image clustering done using Feluda. [Screenshot: Tattle - articles] Doing UI development or integrating with any UI software is not part of this project, but it might help to see what sort of downstream applications we use Feluda for.

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Computer Vision, Docker, Machine Learning, Performance Improvement, Python

Mentor(s)

@dennyabrain @duggalsu

Category

Data Science, Machine Learning

Sayanjones commented 7 months ago

Hey @dennyabrain, I'm Sayan, and I'm interested in contributing to the video analysis project! My skills in computer vision, machine learning, and Python are a great fit. I'm eager to explore video analysis using techniques like vector embeddings.

Proficient in Docker and performance optimization, I can ensure the solution scales efficiently. I value open-source development and look forward to contributing demos.

Is there a way you prefer for me to reach out? I'm looking forward to exploring how I can contribute.

dennyabrain commented 7 months ago

Hi @Sayanjones we can use this issue to communicate approaches. If you start concretely implementing something, you can make a new issue specific to your approach and we can take the conversation there.

Ris-code commented 7 months ago

Hi @dennyabrain

I'm Rishav Aich, pursuing my BTech in artificial intelligence and data science from IIT Jodhpur. Being a student of AI, I have done courses on deep learning, machine learning, and AI. I am proficient in C++, Python, and R programming languages. I have a strong background in development, more specifically, backend development. I have used Docker in various projects.

This project completely aligns with my skills. It would be great to contribute to this.

Please advise me on how to get started with the project.

AbhimanyuSamagra commented 7 months ago

Do not ask process-related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer to the instructions listed on Unstop, and any further queries can be taken up on our Discord channel titled DMP queries.

Aryankb commented 7 months ago

Hey @dennyabrain, this is Aryan from IIIT Naya Raipur. I am currently pursuing my B.Tech in Data Science and Artificial Intelligence. I have good experience in deep learning, computer vision, and NLP. I've worked on several projects, such as self-driving cars using camera input. I am really excited to work on this project as I feel it is a perfect match for me. Also, I am going to learn Docker in the future.

dennyabrain commented 7 months ago

Hi everyone,

Thank you for expressing interest in this issue. Depending on your interests and skills, you can take ANY ONE of the following approaches :

  1. Look at the problem statement and propose your approach. Remember the main problem statement: given a large number of video files, find a way to group identical and similar video files. This approach would be ideal for anyone who is interested in or studies ML and/or DSP. By thinking about the problem statement, reviewing existing literature on it and proposing your approach here, we would all learn something from it and the mentors should be able to nudge you in the right direction.

  2. Try getting Feluda working on your machine. Feluda is a moderately complex piece of software and has many moving parts. Getting it working on your machine can itself be a challenge. We have a guide on it [here](https://github.com/tattle-made/feluda/wiki/Setup-Feluda-Locally). If you are a software developer/tinkerer, this might be a good place to start, because once you have Feluda working locally and can see the various existing functionalities, that might give you an idea of how to proceed.

  3. Recreate our code on a Jupyter notebook or Google Colab notebook. We already have some code that takes [video files and converts them into vectors](https://github.com/tattle-made/feluda/blob/main/src/core/operators/vid_vec_rep_resnet.py). We also have code that takes these vectors and [clusters them](https://github.com/tattle-made/data-experiments/blob/master/tSNE-clustering.ipynb). I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working on your notebook, we can try out different pretrained models to evaluate performance.

You'll have me or members of our team to guide you if you get stuck on any of these approaches. Taking some concrete steps on any of these 3 approaches would help us know what your interests and skills are and give you concrete feedback when you get stuck.

All the best!
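To make the problem statement concrete (group identical and similar video files), here is one minimal way grouping could work once each video has a vector: link any two videos whose cosine similarity clears a threshold, then take connected groups with union-find. The vectors, names and the 0.95 threshold are illustrative assumptions, not values used by Feluda.

```python
import math
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def group_similar(
    vectors: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[List[str]]:
    """Group names whose vectors are linked by similarity >= threshold."""
    names = sorted(vectors)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the group root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Compare every pair once and merge the groups of similar pairs.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if _cosine(vectors[first], vectors[second]) >= threshold:
                parent[find(second)] = find(first)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Toy vectors: a clip, a near-identical re-upload, and an unrelated clip.
videos = {
    "clip_1": [1.0, 0.0],
    "clip_1_reupload": [0.99, 0.05],
    "clip_2": [0.0, 1.0],
}
groups = group_similar(videos)
```

The pairwise loop is quadratic in the number of videos, so a large collection would likely need an approximate nearest-neighbour index or a proper clustering algorithm instead.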

AbhimanyuSamagra commented 7 months ago

Do not ask process-related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer to the instructions listed on Unstop, and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project.

Aryankb commented 7 months ago

Hey @dennyabrain, I have some queries regarding the project:

aatmanvaidya commented 7 months ago

Hey @dennyabrain, I have some queries regarding the project:

  • What will be the length of the videos?
  • Is there any available dataset with pre-defined classes?
  • A video is a combination of audio, images and text. Which of these should be the most important classification criterion?
  • How many classes should there be for classification? Please give some examples.

Hi @Aryankb

  1. Generally, expect the length to be anywhere between 30 seconds and 20 minutes.
  2. Currently we don't have a dataset with pre-defined classes, but feel free to look for such datasets.
  3. To the best of my knowledge, a video is just a series of images, so the most important classification criterion would be the image content. Please investigate this a bit deeper. Also take a look at the 3rd point in @dennyabrain's comment; that is an example of clustering images using a certain type of embedding.
  4. There is no specific number of classes, but think of classes as metadata for these videos in the context of social media. Some examples could be memes, political, health, paper documents, news, etc. These are very broad labels; you can think of more specific ones too.

Hope this helps
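If a video is treated as a series of images, one common way to get a single vector per video is to average the per-frame vectors (mean pooling). The sketch below assumes the frame vectors already exist; it does not claim to reproduce what Feluda's video operator does, and the sample numbers are made up.

```python
from typing import List, Sequence


def mean_pool(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame vectors element-wise into one video-level vector."""
    if not frame_vectors:
        raise ValueError("need at least one frame vector")
    dim = len(frame_vectors[0])
    if any(len(vec) != dim for vec in frame_vectors):
        raise ValueError("all frame vectors must have the same length")
    count = len(frame_vectors)
    return [sum(vec[i] for vec in frame_vectors) / count for i in range(dim)]


# Three toy frame vectors from one hypothetical video.
video_vector = mean_pool([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Averaging discards frame order, so it captures what a video looks like overall but not how it changes over time.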

Mithilesh1609 commented 7 months ago

Hey @dennyabrain, Mithilesh here. I have experience in and a passion for building end-to-end, highly scalable computer vision pipelines, and I currently work at a young start-up as a machine learning engineer. I led a similar project for one of the largest edtech companies in the world, where we clustered similar videos (average length of 10 minutes) and recommended videos based on the mistakes a user made in a test; that work involved embedding creation and efficient search algorithms. Apart from this, I led the building and scaling of a computer-vision-based exam grading tool from 50 users to 4 million users with Docker and AWS, bringing the running time down by 70% over three iterations, which helped the government organize the world's largest AI-graded examination. I am very eager to contribute to this project, make video clustering more efficient and scale it quickly.

Aryankb commented 6 months ago

Hey @dennyabrain, I am Aryan Kumar Baghel from IIIT Naya Raipur. I have been exploring ways to extract unique frames from a video. I tried two steps on some videos: ffmpeg to extract keyframes, then k-means to pick the unique keyframes from those extracted by ffmpeg. Here are the results:

(We can select one image from each cluster as the representative of that cluster, then use an image captioning model to generate a short caption for each image. Next, we can combine all the captions to generate a final caption for the video, or use them to classify the video accurately.)

Google Colab Notebook

Video 1 link: https://drive.google.com/file/d/1Qr08m4Bf0JjTszExDLoey2LCqcJjJl3n/view?usp=drive_link
Clusters: [three images]

Video 2 link: https://drive.google.com/file/d/1QnupjsK7ILQUYrqlPT2pTdTAzoy8Wi-C/view?usp=drive_link
Clusters: [three images]

I'll now be working on ways to cluster the images such that the number of clusters is selected automatically. Please share your reviews and directions for future work.
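The de-duplication step described in this comment (cluster the extracted keyframes, then keep one frame per cluster) can be sketched as below. This is not the notebook's code: the ffmpeg extraction is not shown, the two-dimensional points stand in for real image features, and the k-means is a plain stdlib implementation with a deterministic farthest-point start so the example is reproducible.

```python
import math
from typing import List, Sequence, Tuple


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    points: Sequence[Sequence[float]], k: int, iterations: int = 20
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; returns (label per point, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start: first point, then repeatedly the point
    # farthest from its nearest already-chosen centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    def assign() -> List[int]:
        return [min(range(k), key=lambda j: _dist(p, centroids[j])) for p in points]

    for _ in range(iterations):
        labels = assign()
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(), centroids


def representative_frames(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Index of the frame closest to each cluster centroid."""
    labels, centroids = kmeans(points, k)
    reps = []
    for j in range(k):
        members = [i for i, label in enumerate(labels) if label == j]
        if members:
            reps.append(min(members, key=lambda i: _dist(points[i], centroids[j])))
    return sorted(reps)


# Toy features for six keyframes forming two visually distinct scenes.
frames = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [5.0, 5.0], [5.2, 5.0], [5.1, 5.1]]
unique_keyframes = representative_frames(frames, k=2)
```

Here `k` is fixed by hand; choosing it automatically, as proposed above, would need an extra criterion on top of this, such as comparing cluster quality across several values of `k`.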

aaradhyasinghgaur commented 6 months ago

Hi everyone,

Thank you for expressing interest in this issue. Depending on your interests and skills, you can take ANY ONE of the following approaches :

1. Look at the problem statement and propose your approach
   Remember the main problem statement - Given a large number of video files, find a way to group identical and similar video files. This approach would be ideal for anyone who is interested in or studies ML and/or DSP. By thinking about the problem statement, reviewing existing literature on it and proposing your approach here, we would all learn something from it and the mentors should be able to nudge you in the right direction.

2. Try getting feluda working on your machine
   Feluda is a moderately complex piece of software and has many moving parts. Getting it working on your machine can itself be a challenge. We have a guide on it [here](https://github.com/tattle-made/feluda/wiki/Setup-Feluda-Locally). If you are a software developer/tinkerer, this might be a good place to start, because once you have Feluda working locally and can see the various existing functionalities, that might give you an idea of how to proceed.

3. Recreate our code on a Jupyter notebook or Google Colab notebook
   We already have some code that takes [video files and converts them into vectors](https://github.com/tattle-made/feluda/blob/main/src/core/operators/vid_vec_rep_resnet.py). We also have code that takes these vectors and [clusters them](https://github.com/tattle-made/data-experiments/blob/master/tSNE-clustering.ipynb). I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working on your notebook we can try out different pretrained models to evaluate performance.

You'll have me or members of our team to guide you if you get stuck on any of these approaches. Taking some concrete steps on any of these 3 approaches would help us know what your interests and skills are and give you concrete feedback when you get stuck.

All the best!

Hey @dennyabrain, I'm Aaradhya Singh, currently a 2nd-year undergraduate in computer science and engineering. I am proficient in C/C++, Python, deep learning and machine learning, and I research and learn about upcoming technologies and tech stacks. After reading your suggested approaches, I think I could fine-tune some models, mostly built on CNN/RNN architectures, and use a pipeline/hierarchical approach to solve the complex problem of classifying the content or creating clusters of it. Looking forward to working on it and posting updates on my findings.

dennyabrain commented 5 months ago

@Snehil-Shah can you comment here, so I can assign the issue to you?

Snehil-Shah commented 5 months ago

@dennyabrain Yes.

aatmanvaidya commented 5 months ago

Weekly Goals

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

Week 13

Snehil-Shah commented 5 months ago

Weekly Learnings & Updates

Week 1