technosaby / RedHenAudioTagger

MIT License

Project Set Up #2

Open technosaby opened 2 years ago

technosaby commented 2 years ago

The idea is to prepare the project setup in a Singularity container inside Red Hen's infrastructure.

technosaby commented 2 years ago

@brucearctor I need some help here. I am not able to run the script (which generates the audio files from video) on the Case HPC. Do I need to create a Docker environment for it?

brucearctor commented 2 years ago

Ultimately, yes, everything needs to be able to run in the infra -- likely just a bit of specific containerizing/packaging to get things working [ python/tensorflow/etc will run in that environment ]. Check with the community ( ex: slack ) or hop on one of the calls Wednesday or Friday for some preliminary tips, if needed.

turnermarkb commented 2 years ago

For clips, see https://sites.google.com/case.edu/techne-data-requests/home


technosaby commented 2 years ago

For clips, see https://sites.google.com/case.edu/techne-data-requests/home

@turnermarkb Sorry, I could not understand your comment. I was planning to run the audio processing on the "/mnt/rds/redhen/gallina/tv/2022" folder first, then on other years (2021, ...), and to generate the audio files in my Gallina home. After that I plan to do the tagging and store the results in safe. Is this the correct approach, or do we need to run this on some specified set of files?
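A minimal sketch of the audio-extraction step described above. It assumes that ffmpeg is available, that the source videos are .mp4 files, and that 16 kHz mono WAV is an acceptable output format; none of this is confirmed in the thread. The helper names `audio_path_for` and `ffmpeg_command` and the output directory are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def audio_path_for(video: Path, source_root: Path, output_root: Path) -> Path:
    """Mirror the video's path relative to source_root under output_root, as .wav."""
    return (output_root / video.relative_to(source_root)).with_suffix(".wav")


def ffmpeg_command(video: Path, audio: Path, sample_rate: int = 16000) -> list:
    """Build (but do not run) an ffmpeg call that extracts a mono audio track."""
    return [
        "ffmpeg",
        "-n",                     # never overwrite an existing output file
        "-i", str(video),         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz (assumed value)
        str(audio),
    ]


if __name__ == "__main__":
    source_root = Path("/mnt/rds/redhen/gallina/tv/2022")
    output_root = Path.home() / "audio" / "2022"  # hypothetical output location
    for video in sorted(source_root.rglob("*.mp4")):
        audio = audio_path_for(video, source_root, output_root)
        print(" ".join(ffmpeg_command(video, audio)))
```

The sketch only prints the commands, so it can be checked against a handful of files before anything is written to disk.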

turnermarkb commented 2 years ago

This seems like a good approach to me. How many clips, how will you make them, how much storage? Gallina is vast. m


technosaby commented 2 years ago

@brucearctor I was able to create a TensorFlow-based local Docker image from GitHub workflows. Then I created a local Singularity container and copied it to the HPC. Now I plan to run the container on the HPC to execute my scripts.

Can you please check whether I am going in the right direction (blog: https://technosaby.github.io/gsoc/phase1/week5)? The latest code is in the main branch.

turnermarkb commented 2 years ago

Just mentioning that we don’t require a Singularity container until near the end of your project. It’s fine to work on the code outside of Singularity until you’ve got it in good shape and then to make the container. m


technosaby commented 2 years ago

@turnermarkb Thanks for your suggestion. Can you please explain what you mean by "outside of Singularity"? Do you mean using only Docker and not Singularity? If so, I could not find a way to run Docker containers directly on the HPC. Please let me know if there is a resource I have missed.

So I am using this approach:

Build scripts (local) -> Put in Docker container (local) -> Build Singularity container SIF image (local) ----- Copy to HPC ----> Execute containers (HPC).

As there is no existing audio pipeline, I am planning to build all the audio data from the videos in /mnt/rds/redhen/gallina/tv/2021 and extend it to other videos (years) later. As this data set is large, I need to run it on the HPC.

Please let me know whether this approach is correct or whether there is a faster way to do things: the TensorFlow-based SIF image is around 3 GB, so it takes a good amount of time to copy.
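The four stages of the approach above can be sketched as a list of shell commands. This is an illustration under assumptions, not the project's actual build script: the image tag, SIF file name, remote destination, and script name are hypothetical placeholders, and the last command is meant to run on the HPC rather than locally.

```python
import shlex


def pipeline_commands(image_tag, sif_name, remote, script):
    """Return the commands for each stage as argument lists (nothing is executed)."""
    return [
        # 1. Build the Docker image from the Dockerfile in the current directory.
        ["docker", "build", "-t", image_tag, "."],
        # 2. Convert the image held by the local Docker daemon into a SIF file.
        ["singularity", "build", sif_name, f"docker-daemon://{image_tag}"],
        # 3. Copy the SIF file to the cluster.
        ["scp", sif_name, remote],
        # 4. On the HPC: run the script inside the container.
        ["singularity", "exec", sif_name, "python", script],
    ]


if __name__ == "__main__":
    commands = pipeline_commands(
        image_tag="audiotagger:latest",           # hypothetical image tag
        sif_name="audiotagger.sif",               # hypothetical file name
        remote="user@hpc.example.org:containers/",  # hypothetical host and path
        script="tag_audio.py",                    # hypothetical script name
    )
    for command in commands:
        print(shlex.join(command))
```

Keeping the stages as plain argument lists makes it easy to print and review them before running anything.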

brucearctor commented 2 years ago

1) @technosaby -- I like that you're getting containers going. It does seem there is a possibility that CWRU HPC might support docker, in addition to singularity. The path you're on re: docker/singularity, seems fine for your current stage. Singularity isn't going to hurt anything -- ultimately, the choice of runtime singularity/docker should just be one minor implementation detail [ even though getting to work and run in HPC infrastructure is required ].

2) Prove that the tagging and a 'pipeline' can work for a single video file, then multiple, then more ... don't worry about addressing for year/years at this time. You'll want to explore [ at some times manually ] over many files to ensure you're happy with the performance of your tagger, and that the output produced on a given file is in the desired format.

I think that @turnermarkb is also saying -- no need ( and probably not even desired ... until the end of your project ) to get things running over years of data. It is great if you are prepared to do so, but you'll want to run it over years with what you determine to be the optimal model, which I imagine that you will iterate on throughout the summer.
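One way to follow the "single file, then multiple, then more" advice is to run the tagger over progressively larger batches and inspect the output after each one. The sketch below is only an illustration: `staged_batches`, the batch sizes, and the file names are hypothetical.

```python
def staged_batches(files, sizes=(1, 5, 25)):
    """Return progressively larger prefixes of the sorted file list.

    Stops early once a batch would not contain any new files.
    """
    ordered = sorted(files)
    if not ordered:
        return []
    batches = []
    for size in sizes:
        batch = ordered[:size]
        if batches and len(batch) == len(batches[-1]):
            break  # the previous batch already covered every file
        batches.append(batch)
    return batches


if __name__ == "__main__":
    videos = [f"clip_{i:02d}.mp4" for i in range(7)]  # hypothetical file names
    for batch in staged_batches(videos):
        # Run the tagger on this batch here, then review its output by hand
        # before moving on to the next, larger batch.
        print(len(batch), "files:", batch[0], "...", batch[-1])
```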

turnermarkb commented 2 years ago

No, I mean you don’t need to use docker or singularity if it’s convenient not to, until the end; we need your project to end up in a container, but some coders prefer to work without a container.

There are Red Hen audio pipelines: https://www.redhenlab.org/home/the-cognitive-core-research-topics-in-red-hen/audio-processing-pipeline

m


technosaby commented 2 years ago

@brucearctor Thanks for your comments. I will keep this task for later and work on baselining first.

For now I am processing the audio using my script.

turnermarkb commented 2 years ago

Yes. m


technosaby commented 2 years ago

Final model updates and merging into the Singularity container for delivery will be taken care of in the last milestone.

technosaby commented 2 years ago

As discussed in the last meeting with @turnermarkb, since the tagging is working properly, this is the right time to do the packaging and then focus on improving from there. So I will work on making a Singularity image from my codebase.

brucearctor commented 2 years ago

Yes, start with the baseline of things working -- tagging works, now operationalize with good foundations -- then optimize/retrain/improve.

technosaby commented 2 years ago

After copying the video files from /mnt/rds/redhen/gallina to my scratch folder and then running the scripts using the Singularity container built from the Dockerfile, all tags are generated properly. @brucearctor @turnermarkb
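A minimal sketch of the staging step mentioned above: copying a limited number of videos into a scratch folder before running the container on them. The function name `stage_videos`, the .mp4 pattern, the limit, and the scratch location are assumptions for illustration; the actual scratch path is not given in this thread.

```python
import shutil
from pathlib import Path


def stage_videos(source_dir, scratch_dir, limit, pattern="*.mp4"):
    """Copy up to `limit` matching files into scratch_dir and return their new paths."""
    source_dir = Path(source_dir)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for video in sorted(source_dir.glob(pattern))[:limit]:
        target = scratch_dir / video.name
        if not target.exists():  # skip files that were staged on an earlier run
            shutil.copy2(video, target)
        staged.append(target)
    return staged


if __name__ == "__main__":
    staged = stage_videos(
        "/mnt/rds/redhen/gallina/tv/2022",  # source folder named earlier in the thread
        Path.home() / "scratch" / "videos",  # hypothetical scratch location
        limit=10,
    )
    print(f"Staged {len(staged)} files")
```

Because existing targets are skipped, the function can be re-run safely after an interrupted copy.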