kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

official Docker images for Kaldi #3284

Closed mdoulaty closed 2 years ago

mdoulaty commented 5 years ago

Are there any plans to add official Docker images for Kaldi on Docker Hub? Running Kaldi inside containers can be quite helpful for some users and workloads, and I think official Kaldi images on Docker Hub would be a good thing to have. We can set up automated builds for CPU- and GPU-based images, and I can help with the setup if you think this would be beneficial to other users. (We have good experience running containerized Kaldi ASR workloads, both training and decoding, on a Slurm cluster.)

danpovey commented 5 years ago

Docker is not something that I really use myself so I wouldn't be able to help a lot. If you are willing to help I'm open to the idea though.

mdoulaty commented 5 years ago

Yes, happy to help. To start with, we'll need to set up a new public repository on Docker Hub (http://hub.docker.com/), which is the container registry we're going to use. Since you're the yoda master, it probably makes sense that you own the organisation and the repository (similar to GitHub), so the org name would be kaldi-asr and the repository name would be kaldi. Then the account owner needs to connect the two accounts (Docker Hub and GitHub) so that we can set up automated builds whenever something new is pushed. Similar to other projects, we can have the latest dev images (both CPU and GPU versions) and also images for the more stable branches (I can see 5.0, 5.1, 5.2, 5.3 and 5.4 branches, which look like stable versions), again in both CPU and GPU versions.

galv commented 5 years ago

What service supports doing automated builds of docker containers? Does Docker Hub itself support that? I admit that I am not very familiar with this. Right now, we are using Travis CI, which has no concept of Docker.

galv commented 5 years ago

BTW, something to be aware of is that Kaldi uses absolute paths in a lot of its files. The fact that Docker allows you to mount paths with different names every time you run a container may cause some problems if you change mounts frequently.

danpovey commented 5 years ago

I created an organization kaldiasr (hyphens are not allowed in the name), but the next steps look complicated. I could add someone else there, e.g. you, or preferably @galv or @kkm.

danpovey commented 5 years ago

I mean, maybe if I add someone else to the team for the kaldiasr organization on dockerhub they can do the next steps. BTW, I don't want to build images for much older versions, the support gets too much. 5.4 is the lowest I'd go.

mdoulaty commented 5 years ago

@galv Docker Hub supports building images itself. We can also use your existing Travis CI pipeline to build, tag and push images to Docker Hub; see https://docs.travis-ci.com/user/docker/#building-a-docker-image-from-a-dockerfile for details. There are no issues with absolute paths whatsoever.

@danpovey sure, however you feel like it's more appropriate - and sure, will include 5.4 onward
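
The tag scheme discussed so far (dev images plus stable branches from 5.4 onward, each in CPU and GPU variants) can be sketched as a small matrix. This is only an illustration: the `kaldiasr/kaldi` repository name combines the organization and repository names mentioned earlier in the thread, and the `<version>-<variant>` tag format is an assumption, not an agreed convention.

```python
from itertools import product


def image_tags(repo, versions, variants=("cpu", "gpu")):
    """Return one tag per (version, variant) pair, e.g. 'repo:5.4-gpu'."""
    return [f"{repo}:{version}-{variant}"
            for version, variant in product(versions, variants)]


# Hypothetical repository name and tag format, for illustration only.
tags = image_tags("kaldiasr/kaldi", ["latest", "5.4"])
for tag in tags:
    print(tag)
```

Adding a later stable branch would then only mean appending its name to the version list.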

danpovey commented 5 years ago

OK, email me with your id on dockerhub.

galv commented 5 years ago

I don't think you understood my comment on the absolute paths problem. It won't affect the build but it will affect running docker containers.

mdoulaty commented 5 years ago

I probably did not fully understand what you meant, then. Regardless, inside the container you can train without changing Kaldi's folder structure at all, and absolute paths are fine (I can't think of why they would be an issue).

galv commented 5 years ago

When you run Kaldi inside a container, it will use absolute paths based on how the container has mounted its filesystem. The host has likely mounted its filesystem differently, though, so this forces you to do all your work inside the container. Not necessarily bad.
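
To make the concern concrete: Kaldi `.scp` files map a key to a path, and those paths are often absolute, so a file written on the host points nowhere once the same directory is mounted under a different name in a container. A minimal sketch of remapping such paths follows. The directory names are hypothetical, and entries that are piped commands rather than plain paths are deliberately left untouched.

```python
def remap_scp_line(line, old_prefix, new_prefix):
    """Rewrite the path of one 'key path' line if it lies under old_prefix."""
    old_prefix = old_prefix.rstrip("/")
    new_prefix = new_prefix.rstrip("/")
    key, sep, target = line.rstrip("\n").partition(" ")
    # Only plain paths below old_prefix are rewritten; anything else
    # (for example a piped command) is returned unchanged.
    if sep and target.startswith(old_prefix + "/"):
        target = new_prefix + target[len(old_prefix):]
    return f"{key}{sep}{target}"


def remap_scp(lines, old_prefix, new_prefix):
    """Apply remap_scp_line to every line of an scp file."""
    return [remap_scp_line(line, old_prefix, new_prefix) for line in lines]


# Hypothetical host and container locations, for illustration only.
host_lines = [
    "utt1 /home/alice/kaldi/egs/demo/mfcc/raw_mfcc.1.ark:5",
    "utt2 /home/alice/kaldi/egs/demo/mfcc/raw_mfcc.1.ark:1234",
]
container_lines = remap_scp(host_lines, "/home/alice/kaldi", "/opt/kaldi")
for line in container_lines:
    print(line)
```

Doing all the work inside the container, as suggested above, avoids the need for any such rewriting.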

mdoulaty commented 5 years ago

Probably the easiest approach would be for me to create my proposed changes in my own forks, on both GitHub and Docker Hub. Then you can have a look, and if all looks good, we integrate them into the main Kaldi repo on GitHub and Docker Hub and continue there.

danpovey commented 5 years ago

Sounds good.

mdoulaty commented 5 years ago

so here is the first version: https://github.com/mdoulaty/kaldi/tree/master/docker

It includes both CPU and GPU based images. I also pushed both images to DockerHub: https://cloud.docker.com/repository/docker/mdoulaty/kaldi/tags

I plan to add more image variants, such as a minimal image.

We also need to automate the building and pushing process, which can eventually be done (not entirely sure about building GPU based images in DockerHub, we may need to build them somewhere else that we have access to a GPU)

danpovey commented 5 years ago

Great! @galv do you have time to look into this? Sorry I have a lot to do today.

mdoulaty commented 5 years ago

Sure, @galv please have a look and let me know how you would like to proceed.

mdoulaty commented 5 years ago

@danpovey @galv were you guys able to check the sample files?

galv commented 5 years ago

Seems okay to me, although I'm not sure that you need this line anymore: https://github.com/mdoulaty/kaldi/blob/75338cbd787943537322cae194e3d1ae11e7f103/docker/ubuntu16.04-gpu/Dockerfile#L26

My understanding was that the default python was python 2.7 on all linux distros except Arch.

mdoulaty commented 5 years ago

As far as I remember, debian:9.8 had no python, and I had to explicitly soft-link python2.7. I will double-check both images and remove that line if it is redundant. After double-checking that, should I create a PR to the main repo?

danpovey commented 5 years ago

I will let @galv comment on that.

fabito commented 5 years ago

Just tested the CPU image (for diarization). It works like a charm.

danpovey commented 5 years ago

Thanks a lot!! My preference is that @galv reviews this and lets me know whether to merge, but if that doesn't happen by, say, Friday, ping me and I'll work on a backup plan.

mdoulaty commented 5 years ago

@fabito thanks for testing! Those are temporary locations, and hopefully they will be moved to the official Kaldi repo here on GitHub, as well as to Docker Hub, very soon.

galv commented 5 years ago

Make a PR for it. I will look at it but I won't take the time to test it in any capacity. I'm most interested in how we can do CI with these FYI.

mdoulaty commented 5 years ago

https://github.com/kaldi-asr/kaldi/pull/3322

fabito commented 5 years ago

@mdoulaty, what are your thoughts about the "minimal" image? Is the idea to remove all the build dependencies and copy over only the compiled binaries and utility scripts?

galv commented 5 years ago

I'll go ahead and express my own thoughts on that. It's a moving target, and it is better handled by a build system. In particular, cmake's cpack packaging system is a good bet.

mdoulaty commented 5 years ago

Yes, something along those lines: have two environments in the Dockerfile, one for building Kaldi and one with just the compiled artifacts. I'm still a bit unsure whether it's a good idea to include the scripts or to have just the core binaries there.

danpovey commented 5 years ago

Kaldi wouldn't be much use without the scripts.

jtrmal commented 5 years ago

I agree with Dan. Kaldi is not a product in itself; it is a set of building blocks for ASR research, and the scripts are part of it. Without the scripts, it's not of much use. There is a certain gap between the needs of industry (product-oriented people) and our conception of Kaldi as an ASR toolbox. @kkm000 or perhaps @dgalvez can comment on how much work it is to bridge the gap.

jtrmal commented 5 years ago

Sorry, one more thought: the things I have mentioned are one of the reasons we don't really care about virtualization and packaging. There is not a strong benefit for researchers in going that way (or they already have their own infrastructure set up and taken care of by the support team in their company).

mdoulaty commented 5 years ago

Okay, then the minimal images will include all the scripts. I think reproducibility issues get more attention in the ML community than in the speech community, and containers are a good starting point. So I don't think it's just about productionising; research will also benefit.

sayint-ai commented 5 years ago

Does anyone actually use Kaldi dockers for training? Just curious.

hwiorn commented 5 years ago

@sayint-ai Actually, I am using a Kaldi Docker container at my company, but my setup is complicated.

The image includes the compiled Kaldi binaries (CPU and GPU), some audio utilities (e.g. ffmpeg and sox), the SGE configuration, and so on. I have not installed any Kaldi scripts in it. The image is pulled from our internal Docker registry and run with volumes mounted for kaldi-egs, the recipe scripts and the data.

I build this container on k8s and use it for model training.
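
A setup like this can also sidestep the absolute-path mismatch raised earlier in the thread: if each shared directory is bind-mounted at the same absolute path inside the container as on the host, paths written by either side stay valid for both. The sketch below only assembles the `docker run` arguments. The image name, directories and command are hypothetical, and this is not necessarily how the setup described above is configured.

```python
def docker_run_command(image, shared_dirs, command):
    """Build a 'docker run' argument list that bind-mounts each directory
    at the same absolute path inside the container as on the host."""
    args = ["docker", "run", "--rm"]
    for path in shared_dirs:
        args += ["-v", f"{path}:{path}"]
    return args + [image] + list(command)


# Hypothetical image name, directories and command, for illustration only.
cmd = docker_run_command(
    "registry.example.com/kaldi:gpu",
    ["/data/kaldi-egs", "/data/corpora"],
    ["bash", "run.sh"],
)
print(" ".join(cmd))
```

Returning a list rather than a string means it can be passed directly to `subprocess.run` without shell quoting.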

mdoulaty commented 5 years ago

@galv I enabled automatic builds on Docker Hub (for the CPU-only image). Apparently there is a 4-hour timeout, and with the VM they provide the image can't be built in 4 hours (a sample failed build can be found here: https://cloud.docker.com/repository/registry-1.docker.io/mdoulaty/kaldi/builds/650bc55f-9f18-4aeb-b98f-1ced857246bd).

Then I tried integrating automatic builds in Travis: I updated the Travis YAML and enabled Docker builds there (see https://github.com/mdoulaty/kaldi/blob/master/.travis.yml for reference on how to enable Docker builds). This wasn't successful either, since Travis has a maximum limit of 50 minutes (https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts). Neither of those offered GPU support anyway, so we already had to use other VMs with GPUs for the GPU images. Now I guess we'll have to build the CPU images there as well, so it's not a big deal.

I'll prepare some scripts (using cloud-provider-agnostic tech such as Terraform) that create a VM with a GPU, pull Kaldi, build the images and push them to Docker Hub. Then we can put this inside another Docker image and build that small image in Travis (which triggers the main build and returns well before the 50-minute timeout). Any thoughts?
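
The reasoning behind the trigger approach can be sketched as a comparison of estimated job durations against the limits reported above. The two limits are the ones quoted in this comment; the estimated durations are made-up placeholders, not measurements.

```python
from datetime import timedelta

# Limits as reported in this thread; they may have changed since.
BUILD_LIMITS = {
    "docker_hub": timedelta(hours=4),
    "travis_ci": timedelta(minutes=50),
}


def services_that_fit(estimated, limits=BUILD_LIMITS):
    """Return the names of services whose time limit exceeds the estimate."""
    return sorted(name for name, limit in limits.items() if estimated < limit)


# Hypothetical durations, for illustration only.
full_build = timedelta(hours=5)     # a full image build that exceeds both limits
trigger_job = timedelta(minutes=5)  # a small job that only starts a remote build

print(services_that_fit(full_build))
print(services_that_fit(trigger_job))
```

With these placeholder numbers the full build fits nowhere, while the small trigger job fits on either service, which is why only the trigger would run in Travis.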

danpovey commented 5 years ago

Great, thanks a lot! It may be possible to avoid the timeout by reducing optimization levels and using shared libraries. I don't have time to get into this in too big a way right now but I appreciate your work. Maybe @galv can give some advice.

mdoulaty commented 5 years ago

The simple workaround (and what they officially suggest) is to break the build down into smaller images: for example, have one base image with some dependencies and build more layers on top. That way we can control the build time of each layer (and of course what you suggest about shared libs can help in some of those layers, but not all). It's certainly doable, but several layers would make the setup less understandable. As I said, we have to use external VMs for building the GPU images anyway, so running on our own infra will probably give us more freedom to keep the images simple and understandable.
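To make the layering idea concrete, here is a minimal Python sketch that generates one `docker build` command per layer, each building on the image produced by the previous one. The stage names, the Dockerfile file names, the `ubuntu:18.04` starting image and the `BASE_IMAGE` build argument are all assumptions for illustration; none of them come from the Kaldi repo, and each Dockerfile would need to declare `ARG BASE_IMAGE` before `FROM ${BASE_IMAGE}` for this to work.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    tag: str         # image tag produced by this stage
    dockerfile: str  # Dockerfile for this stage (hypothetical file name)
    base: str        # image this stage builds on


def build_commands(stages: List[Stage]) -> List[List[str]]:
    """Return one `docker build` argv per stage, in the order given."""
    commands = []
    for stage in stages:
        commands.append([
            "docker", "build",
            "--file", stage.dockerfile,
            # Assumes the Dockerfile starts with: ARG BASE_IMAGE / FROM ${BASE_IMAGE}
            "--build-arg", f"BASE_IMAGE={stage.base}",
            "--tag", stage.tag,
            ".",
        ])
    return commands


# Hypothetical three-layer split: OS dependencies, then tools/, then src/.
STAGES = [
    Stage("kaldi-deps:latest", "Dockerfile.deps", "ubuntu:18.04"),
    Stage("kaldi-tools:latest", "Dockerfile.tools", "kaldi-deps:latest"),
    Stage("kaldi-src:latest", "Dockerfile.src", "kaldi-tools:latest"),
]

if __name__ == "__main__":
    for argv in build_commands(STAGES):
        print(" ".join(argv))
```

Each stage becomes its own automated build, so each one only has to fit within the time limit on its own. The cost is the one mentioned above: three Dockerfiles are harder to follow than one.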

mdoulaty commented 5 years ago

@galv here is a working version of the automated builds: https://github.com/mdoulaty/kaldi-image-builder . It uses Terraform (a provider-agnostic tool) to provision a VM in the cloud, then builds and pushes the images from there. Currently it's scheduled to run and push nightly builds. Please have a look around and let me know if you have any questions. I'll then expand this to build GPU images as well.

mdoulaty commented 5 years ago

@galv did you have a chance to check this?

mdoulaty commented 5 years ago

Updated the GPU image scripts. Also added GPU images to the daily builds, meaning the GPU images will be built and pushed every day as well.

danpovey commented 5 years ago

Thanks! Let me know if there is anything you need from me, e.g. merging something.

mdoulaty commented 5 years ago

Sure, just sent a PR with some minor changes. This initial part can be considered done. Two images are provided in the main repo (CPU-based and GPU-based images). I also have a side repo which contains the automatic build & push scripts for the daily builds. It's probably better to keep that as a separate repo (but it can move to the kaldi-asr org if that makes sense). That repo includes some code for provisioning VMs in any public or private cloud provider, as long as it's supported by Terraform (the examples use Microsoft Azure). I'm running those builds daily on my account and pushing the latest versions of both CPU and GPU images to Docker Hub.

danpovey commented 5 years ago

OK, great. Let's revisit the topic of moving that to kaldi-asr in the future, I am pretty busy right now.


luitjens commented 5 years ago

Semi-related: NVIDIA maintains a Docker Kaldi image with a once-a-month release cycle. We try to keep the source relatively recent with TOT.

https://ngc.nvidia.com/catalog/containers/nvidia:kaldi

Note this container is tested against NVIDIA hardware to validate that things are functionally correct.

lucgeo commented 5 years ago

@hwiorn : Hi, I wish to run Kaldi training across multiple Docker containers on different physical machines. I have used SGE with Kaldi in the past, but I have trouble making the containers visible to SGE. Could you please provide some hints about how you configured SGE inside containers? My physical machines are on the same LAN. Thanks!

danpovey commented 5 years ago

This is really a gridengine question: you should ask on the gridengine-users list. It's really a networking issue more than anything else, as you need the docker images to be individually addressable on the local network.

mdoulaty commented 5 years ago

An alternative approach could be to have SGE running outside the containers and change queue.pl to call the command as a Docker command. For example, when you run queue.pl log.txt somescript.sh p1 p2 p3, it would write the wrapper script as: docker run -v /shared-fs:/shared-fs .... kaldiasr/kaldi:TAG somescript.sh p1 p2 p3 &> log.txt (note that docker run options such as -v must come before the image name). This should give you more flexibility.
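As a rough illustration of that wrapper, the Python sketch below builds the command line that a modified queue.pl might write into its job script. The function name `docker_wrapper`, its parameters and the `--rm` flag are assumptions made for this example, not part of queue.pl. Since `docker run` expects its options before the image name, the sketch places the volume mounts first.

```python
import shlex
from typing import Dict, List


def docker_wrapper(image: str, mounts: Dict[str, str],
                   command: List[str], log_file: str) -> str:
    """Build a shell line that runs `command` inside `image`.

    `mounts` maps host paths to container paths. The shared filesystem
    must be mounted so the job can read its inputs and write its outputs.
    """
    argv = ["docker", "run", "--rm"]  # --rm: remove the container when the job exits
    for host_path, container_path in mounts.items():
        argv += ["-v", f"{host_path}:{container_path}"]
    argv.append(image)   # options go before the image name
    argv += command      # everything after the image is the job itself
    quoted = " ".join(shlex.quote(arg) for arg in argv)
    # Send stdout and stderr to the log file, as queue.pl does for its jobs.
    return f"{quoted} > {shlex.quote(log_file)} 2>&1"


if __name__ == "__main__":
    print(docker_wrapper(
        image="kaldiasr/kaldi:TAG",  # TAG is a placeholder, as in the comment above
        mounts={"/shared-fs": "/shared-fs"},
        command=["somescript.sh", "p1", "p2", "p3"],
        log_file="log.txt",
    ))
```

With this arrangement SGE only ever sees ordinary host processes, so the containers do not need to be addressable on the network.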

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jtrmal commented 2 years ago

Resolved: we now rely on GitHub Actions to build the Docker images.