orcasound / orcagsoc

Google Summer of Code projects and products related to Orcasound & orca sounds
http://orcasound.net
MIT License

End-to-end Github Action Workflows #33

Open valentina-s opened 2 years ago

valentina-s commented 2 years ago

The goal of this project is to develop an end-to-end scientific use case building upon the Github action workflows developed in the 2021 GSoC project Orcasound Github Actions workflows. A suite of advancements could support this mission, and can be built in whatever order is of greatest interest to the GSoC contributor, including:

Expected outcomes: Improve automated analysis of Orcasound data through Github actions.

Required skills: Experience with Github Actions, Python

Bonus skills: Machine learning, Postgres, Docker, Signal Processing, Visualization

Mentors: Valentina, Jesse

Advisors: Dmitry

Difficulty level: Hard

Project Size: 175 or 350 h

Getting Started

Look through the existing Github Issues

Apoorvgarg-creator commented 2 years ago

@yosoyjay @valentina-s @pmdnhd Hello, I am Apoorv Garg, a 3rd-year student at the Netaji Subhas University of Technology, Delhi, and I am interested in working on this project. I have gone through the repository orca-action-workflow. I have experience in writing tests in Python and in CI/CD. I also have experience in machine learning, Docker, and visualization.

yosoyjay commented 2 years ago

Hi @Apoorvgarg-creator! In looking through the open issues, did anything you see look interesting? Did you notice any gaps that are not currently addressed by the issues?

Apoorvgarg-creator commented 2 years ago

> Hi @Apoorvgarg-creator! In looking through the open issues, did anything you see look interesting? Did you notice any gaps that are not currently addressed by the issues?

I read "For now, you will have to manually change the timestamp in the workflow file." in the README file, but I couldn't find an open issue for it.
I also read the Click detection issue and found it interesting. I went through the link mentioned in the issue and have been learning more about harbor porpoise clicks.

I also read about the humpback detection algorithm issue, and read an article to understand the algorithm.

yosoyjay commented 2 years ago

> For now, you will have to manually change the timestamp in the workflow file.

@Apoorvgarg-creator Can you point me to where that text is? I'm afraid to say I can't find it.

And, yes, accurately identifying harbor porpoise clicks and humpback whale calls, so that they can be eliminated as potential orca calls, would be quite useful. Fortunately, as you've noticed, there is a lot of prior art for those problems, so it is very much an implementation and integration issue.

Apoorvgarg-creator commented 2 years ago

> > For now, you will have to manually change the timestamp in the workflow file.
>
> @Apoorvgarg-creator Can you point me to where that text is? I'm afraid to say I can't find it.

@yosoyjay

[Attached image: 44016CD1-1BF7-4F64-A51E-80D8257A21FB]

Apoorvgarg-creator commented 2 years ago

@valentina-s @yosoyjay I just want to confirm whether the selected candidate will have to work on all the listed tasks:

  • Applying a noise calculation and visualization
  • Applying a humpback detection algorithm
  • Applying an echolocation detector
  • Visualizing results
  • Running automatically when a community scientist adds a manual annotation via the live-listening web app (Heroku-based Postgres database)
  • Building dockerized versions
  • Documenting how others can run their algorithms
  • Adding scoring comparison functionality

valentina-s commented 2 years ago

@Apoorvgarg-creator It does not need to be all at once. As a priority, I think it will be best to have one well-functioning scientific pipeline that demonstrates the usability. Then the actual algorithm can hopefully be swapped easily. In some sense this is the idea: people focus on the algorithms and can easily apply them to new data.
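
To illustrate the "swappable algorithm" idea, here is a minimal Python sketch. It is not code from orca-action-workflow: the names `run_pipeline` and `energy_detector`, and the contract that a detector returns a list of `(start_s, end_s)` tuples, are assumptions made for this example. The toy detector stands in for a real humpback or click detector.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical contract for this sketch: a detector takes raw samples and a
# sample rate, and returns detections as (start_seconds, end_seconds) pairs.
Detection = Tuple[float, float]
Detector = Callable[[Sequence[float], int], List[Detection]]


def energy_detector(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.5,
    frame_len: int = 1024,
) -> List[Detection]:
    """Toy detector: flag fixed-length frames whose mean absolute
    amplitude exceeds `threshold`. A placeholder for a real algorithm."""
    detections: List[Detection] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(x) for x in frame) / len(frame)
        if level > threshold:
            end = start + len(frame)
            detections.append((start / sample_rate, end / sample_rate))
    return detections


def run_pipeline(
    samples: Sequence[float],
    sample_rate: int,
    detector: Detector,
) -> Dict[str, object]:
    """Run whichever detector is passed in and return a summary.
    The pipeline only depends on the Detector contract above."""
    detections = detector(samples, sample_rate)
    return {"n_detections": len(detections), "detections": detections}
```

Any other function with the same signature could be passed as `detector` without changing `run_pipeline`, which is the kind of separation described above.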

Apoorvgarg-creator commented 2 years ago

@valentina-s Thank you for clearing up my doubt. I also read this blog, which helped a lot in understanding the outcome of the project.

Just to make sure I have understood the task "Documenting how others can run their algorithms" correctly: what does "their algorithms" mean here, given that, as you said, we have to build one well-functioning pipeline? To explain where I am confused, let's take an example. Suppose I build the pipeline applying the humpback whale algorithm using this model, MODEL. Others could then be given the chance to change this model and test their results. Is that what is meant by "others can run their algorithms"?

valentina-s commented 2 years ago

@Apoorvgarg-creator Yes, we will have to give them instructions on how to organize their inputs and outputs, and, if there are different requirements, we can guide them on how to set up a Docker image that the GitHub action can run against the data. One would need to account for the fact that different people may be using different deep learning frameworks, or, more generally, that some people may not be using deep learning algorithms at all, but some other scientific signal processing procedure.
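
One way to picture such an input/output convention is sketched below in Python. Everything here is an assumption for illustration, not the project's actual layout: the function name `process_directory`, the choice of `.wav` inputs, and the single JSON results file are all hypothetical. The point is only that the algorithm is an opaque callable, so it could wrap a deep learning model or a plain signal processing routine.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def process_directory(
    input_dir: Path,
    output_file: Path,
    algorithm: Callable[[Path], Dict],
) -> int:
    """Apply `algorithm` to every .wav file in `input_dir` and write the
    results, keyed by file name, to one JSON file. Returns the number of
    files processed. The directory layout is an assumed convention."""
    results = {}
    for audio_path in sorted(input_dir.glob("*.wav")):
        results[audio_path.name] = algorithm(audio_path)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(json.dumps(results, indent=2))
    return len(results)


def file_size_algorithm(audio_path: Path) -> Dict:
    """Placeholder algorithm: report the file size in bytes. A real
    contributor would replace this with their own analysis."""
    return {"bytes": audio_path.stat().st_size}
```

With a convention like this, a Docker image would only need to expose one entry point that reads from a mounted input directory and writes to a mounted output path, whatever framework runs inside it.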

Apoorvgarg-creator commented 2 years ago

Thank you @valentina-s.