ioos / gsoc

Information about IOOS activities for Google Summer of Code
BSD 3-Clause "New" or "Revised" License

Machine Learning with Sea Floor Sampling Video #4

Closed · mwengren closed 1 year ago

mwengren commented 3 years ago

Project Description:

Traditionally, surveys of the sea floor are conducted via vessel-mounted cameras which record video as the vessel moves through the water. Hundreds of hours of video are recorded and are often manually processed to determine which species are present at the locations in the video. This project seeks to automate the process using image processing. The intern will help prepare machine learning models, such as artificial neural networks, using available video footage from benthic surveying missions. The intern will partner with biology staff and software staff to train the model and perform data validation.
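
As a rough illustration of the kind of pipeline this involves, the sketch below samples frames from a transect video and runs each frame through a pretrained image classifier. It is a minimal sketch only: the video filename, sampling interval, and ImageNet backbone are placeholder assumptions, and a real model would be fine-tuned on annotated benthic imagery.

```python
# Minimal sketch (not the project's actual pipeline): sample frames from a
# survey video and classify each one with a pretrained backbone. The file name,
# sampling interval, and model choice are all placeholder assumptions.
import cv2                                   # pip install opencv-python
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)     # ImageNet weights, not benthic-trained
model.eval()

cap = cv2.VideoCapture("transect_video.mp4") # hypothetical input file
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:                  # roughly one frame per second at 30 fps
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        with torch.no_grad():
            scores = model(preprocess(rgb).unsqueeze(0))
        print(frame_idx, int(scores.argmax(dim=1)))
    frame_idx += 1
cap.release()
```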

Expected Outcomes: A capable machine learning model which can be used to identify species from video transect data of the sea floor.

Skills required:

Familiarity with a programming language (Python, R) and a general understanding of how machine learning models operate. Experience with image processing is a bonus.

Difficulty:

Moderately difficult

Mentor(s):

@daltonkell Dalton Kell (Software Engineer), Matt Iannucci (Software Engineer), Tara Franey (GIS Specialist), Stephanie Berkman (Biologist), Joe Zottoli (Biologist)

Rohan-cod commented 3 years ago

Venerated Sir, I hope you are safe and in good health amid the ongoing COVID-19 pandemic. My name is Rohan Gupta and I am a 3rd-year Computer Science undergraduate student at Shri Mata Vaishno Devi University. I have been working with Python and deep learning for a couple of years now and have in-depth knowledge of both. I look forward to contributing to this idea as part of this year's GSoC. It would be a great help if you could suggest how to get started. My LinkedIn profile: https://www.linkedin.com/in/rohang4837b4124/

lohithmunakala commented 3 years ago

@mwengren @daltonkell, I am Munakala Lohith, a 3rd-year undergraduate from the Indian Institute of Information Technology Kalyani. I have been working on deep learning and object detection for 2 years now, and my research work focuses on object detection for cellular images.

I would like to work on this particular project as it aligns well with my research interests and my projects. I enquired on the mailing list about a sample video and details of the diverse set of species we will be looking at. This would help me better understand the project, the roadmap ahead for GSoC '21, and how contributions are to be made.

lohithmunakala commented 3 years ago

@daltonkell are we looking at coral reefs specifically, or are we going for all the species that are present?

daltonkell commented 3 years ago

Hi @lohithmunakala, we're mostly looking at the very numerous species that were found, like sea stars and sand dollars.

lohithmunakala commented 3 years ago

Hey @daltonkell, I looked at VIAME as suggested by @/jermy on the mailing list. I also went through the dataset titled "National Coral Reef Monitoring Program: Benthic Cover Derived from Analysis of Benthic Images Collected during Stratified Random Surveys (StRS) across the Main Hawaiian Islands".

I was looking at how that dataset was developed through manual segmentation of the different species found at the video timestamps.

Can you let me know if this is exactly what we are targeting: the creation of databases by automating species recognition? Since VIAME has a GUI and is a full-blown object detector, would we focus only on marine life, with a simpler application that generates a database of all the marine life found in the videos?

Lastly, could you let me know the formal channel through which we should communicate further?

Thanks!

adityaagarwal1710 commented 3 years ago

@daltonkell, I am Aditya Agarwal, completing my degree with a specialization in Artificial Intelligence and Machine Learning at Jaypee University Of Engineering And Technology, Guna. I want to work on this project because it aligns with my field of specialization and interest. I am confident I will be able to build the machine learning model to identify species from the provided video. I am also interested in learning more about underwater aquatic species, as marine life has been one of my favorite subjects on the Discovery and Sony BBC Earth channels.

nitishsinghal1 commented 3 years ago

Hi @mwengren @daltonkell, I hope this message finds you in good health. I am Nitish Singhal, a 3rd-year undergraduate from The NorthCap University, Gurgaon, India, with a specialization in Data Science. For the past 2.5 years I have been working with Python, machine learning, and deep learning, and recently image processing as well.

What draws me to this project is the innovative idea and its many real-life applications. It also aligns with my research and project interests.

It would be very helpful if you could give further guidance about the dataset. Since we have to identify species from the video footage, do we first have to do manual segmentation to get the images, or could we first try shape identification on the video, extract images from the footage through segmentation, and then identify the species?

Please provide your valuable guidance and describe the project in a little more detail.

lohithmunakala commented 3 years ago

Hey @mwengren, @daltonkell!

I would like to know how to make a commit/PR for this particular project as there is no existing repo for this. Would it be advisable to create a repo on my GitHub profile and proceed further?

Please help me out with this.

Thanks!

harshshaw commented 3 years ago

Hi @mwengren @daltonkell, I'm Harsh Shaw from SRM Chennai, a second-year undergraduate student. I would like to contribute to this project, as it seems like a very interesting topic to work on; for the DSC Google challenge we also took up a topic on a marine-life poaching solution. I would like to discuss this topic further with the mentors. Below I have attached my LinkedIn and GitHub accounts. Thank you! LinkedIn: https://www.linkedin.com/in/harsh-shaw-070105174 GitHub: https://github.com/harshshaw

Simardeep27 commented 3 years ago

Hello @daltonkell @mwengren, I am Simardeep Singh Sethi, a third-year undergraduate student pursuing a Bachelor of Technology in Computer Science at Guru Gobind Singh Indraprastha University (GGSIPU), New Delhi, India. I have a great interest in machine learning and deep learning and am passionate about learning new topics introduced in the industry. I have a good command of Python and understand how different models and CNN architectures function. I also have good experience in image pre-processing, which I think would benefit this project. I previously worked as a deep learning intern at MAVOIX Solution Private Limited, where I created two models, one for object recognition and one for text recognition, for real-time detection; this feature was to be used in an Android application for the health-tech company. Hence, I am looking forward to working on this project.

Guidance Required:

  1. Will the data that is provided be annotated, or will it be raw?
  2. How many different species will the model need to identify?

kunalshah03 commented 3 years ago

@daltonkell @mwengren I am a pre-final-year Computer Engineering student at Pune Institute of Computer Technology. My CGPA is 9.5/10 for the first four semesters. I am excited about advances in machine learning and have been exploring this field lately, and I am interested in this project. I read the abstract provided on the projects page of GSoC '21. I would like more insight into what is expected for the project and would be glad to receive your guidance on how to proceed further. Regards, Kunal Shah.

geetikaahuja commented 3 years ago

@daltonkell Hope you are doing well! I am Geetika Ahuja, currently pursuing coursework in data science. I have approximately 4 years of experience working on machine learning and deep learning projects.

I would like to work on this particular project as it aligns perfectly well with my area of interest.

I have gone through the dataset titled "National Coral Reef Monitoring Program: Benthic Cover Derived from Analysis of Benthic Images Collected during Stratified Random Surveys (StRS) across the Main Hawaiian Islands". It identifies 10 different classes/categories of objects.

Some of my queries:

  1. Are we intending to go ahead with the same data I mentioned above?
  2. Are we planning to use VIAME? Do I need to set it up?

Also, could you guide me through the process ahead or point me to an existing repository, if there is one?

Thanks, Geetika

daltonkell commented 3 years ago

Hi all,

Thank you for your interest in this project! Here is some more information:

Traditionally, seafloor habitat visual characterization surveys are conducted via towed or vessel-mounted cameras which record video and still images as the vessel travels along a transect. Hundreds of hours of video are recorded and are often manually processed to determine which species and habitat are present in the video and stills. This project seeks to automate the process using image processing. Specifically, we aim to:

A. Successfully identify and enumerate (at least very abundant) organisms larger than a minimum threshold (e.g. 4 cm).
B. Identify the habitat down to its specific components (e.g., sand, gravel, shells, biotic) in still images and calculate percent composition.
C. Incorporate color correction as necessary.
D. Calculate the total area surveyed (or in each image) using parallel-mounted calibration lasers.
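
As a rough sketch of aim D only: assuming the two calibration lasers are mounted a known distance apart (10 cm here, purely illustrative) and have already been located as pixel coordinates in a frame, the seafloor footprint of that frame can be estimated from the laser-dot spacing. Laser detection itself is not shown.

```python
# Minimal sketch of aim D, under the assumption that the paired lasers are a known
# distance apart and their dots have already been found in the image.
import math

LASER_SEPARATION_CM = 10.0  # hypothetical rig geometry

def frame_area_cm2(laser_a, laser_b, frame_width_px, frame_height_px):
    """Estimate the seafloor area covered by one frame from the laser-dot spacing."""
    pixel_dist = math.dist(laser_a, laser_b)       # spacing of the two dots in pixels
    cm_per_px = LASER_SEPARATION_CM / pixel_dist   # image scale at the seafloor
    return (frame_width_px * cm_per_px) * (frame_height_px * cm_per_px)

# Example: dots detected at (900, 540) and (1020, 540) in a 1920x1080 frame.
print(frame_area_cm2((900, 540), (1020, 540), 1920, 1080))
```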

We currently review entire videos manually for species and calculate percent bottom composition by randomly assigning 50-100 points to individual images and manually identifying the bottom type under each point. An effort to make this process more efficient included training a deep learning model and then performing object detection and tracking to count unique individuals for a couple of highly numerous species. Interns would be asked to help improve and expand this effort, ideally in several of the areas listed below:

  1. Identification and enumeration of species:
     a. Improve object detection, by identification and use of a more suitable model, image processing, an alternative training method, or another method
     b. Improve tracking, by use or generation of a more suitable tracking algorithm (see the sketch after this list)
     c. Expand detection to additional species
  2. Identification of habitat down to its specific components:
     a. Develop or apply a model to classify regions of the video, or still frames of the video, based on the specific components (e.g. sand, gravel, shells).
  3. Detect additional information from the video, such as calibration laser points or camera-added text, in order to support the above analysis or assist in managing the resulting data.
  4. Define and generate metrics to benchmark and assess the effectiveness of different approaches.
  5. Optimize output tables for easy integration with reports (standardized information and format).
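
A minimal sketch of the unique-individual counting touched on in 1b, using a greedy IoU match between consecutive frames: a detection that overlaps a box from the previous frame is treated as the same animal, otherwise it counts as new. The per-frame detections are assumed to come from whatever detector is in use; nothing here reflects the existing pipeline.

```python
# Minimal, illustrative unique-individual counter based on greedy IoU matching
# between consecutive frames of detections.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def count_unique(frames, iou_threshold=0.3):
    """frames: list of per-frame detection lists; returns an estimated count of individuals."""
    unique = 0
    previous = []                      # boxes carried over from the last frame
    for detections in frames:
        matched = set()
        for box in detections:
            # A detection overlapping a previous box is treated as the same individual.
            j = next((k for k, p in enumerate(previous)
                      if k not in matched and iou(box, p) >= iou_threshold), None)
            if j is None:
                unique += 1            # no match: a new individual enters the transect
            else:
                matched.add(j)
        previous = detections
    return unique

# Example: one sand dollar drifting across two frames, plus a new one in frame two.
print(count_unique([[(100, 100, 140, 140)], [(105, 102, 145, 142), (400, 300, 440, 340)]]))
```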

The process will include working with biologists and other staff to create a workflow that is easy to use, repeatable, and produces output that allows the overall aim (seafloor habitat visual characterization) to be conducted more efficiently.
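
Finally, a minimal sketch of the point-count percent-composition approach described above (related to area 2), assuming a per-pixel habitat label mask is available for a still image. In the manual workflow the label under each sampled point comes from an analyst; here it would come from a segmentation or classification model. The class names and example mask are hypothetical.

```python
# Minimal sketch: estimate percent bottom composition by sampling random points
# from a (hypothetical) per-pixel habitat label mask, mirroring the manual 50-100
# point workflow described above.
import random
from collections import Counter

import numpy as np

HABITAT_CLASSES = {0: "sand", 1: "gravel", 2: "shell", 3: "biotic"}  # illustrative

def percent_composition(label_mask: np.ndarray, n_points: int = 100, seed: int = 0):
    """Sample n_points random pixels and report percent cover per habitat class."""
    rng = random.Random(seed)
    h, w = label_mask.shape
    counts = Counter(
        int(label_mask[rng.randrange(h), rng.randrange(w)]) for _ in range(n_points)
    )
    return {HABITAT_CLASSES.get(k, str(k)): 100.0 * v / n_points for k, v in counts.items()}

# Example with a fake 1080x1920 mask that is mostly sand with a gravel strip.
mask = np.zeros((1080, 1920), dtype=np.uint8)
mask[:, :400] = 1
print(percent_composition(mask))
```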

lohithmunakala commented 3 years ago

@daltonkell, thanks for all the information! Is there any head start with regard to the data that will be provided for training, or will it be provided only after the program starts?

A sample dataset would help get an overall idea about how to write the proposal.

Thanks!

geetikaahuja commented 3 years ago

@daltonkell Thank you for the detailed description. I have some questions on the same:

Thanks, Geetika

lohithmunakala commented 3 years ago

Hey @daltonkell!

I have submitted the proposal. Please let me know any changes that you would like me to implement to make the proposal more effective.

Thanks!

geetikaahuja commented 3 years ago

@daltonkell

I have submitted my proposal for the project. If you could please review the same and let me know in case any changes are required.

Thanks & Regards, Geetika

daltonkell commented 3 years ago

Thanks to all who have sent in proposals - we will review them shortly and provide feedback. Thanks!

7yl4r commented 2 years ago

Does anyone have thoughts on biigle? A colleague is exploring it for her similar use case.

ocefpaf commented 1 year ago

Closing all past GSoC issues. Please open a new issue if you want to participate in GSoC23.