kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Docker images no longer built #4228

Closed kir4h closed 2 years ago

kir4h commented 4 years ago

While checking https://hub.docker.com/r/kaldiasr/kaldi/tags, I noticed the Docker images haven't been updated in the last 3 months.

I was trying to understand what is broken and ended up at https://github.com/kaldi-asr/kaldi/issues/3284, where the Docker discussion started; it links to https://github.com/mdoulaty/kaldi-image-builder, which I guess has been used for pushing those images.

From that conversation it seems @mdoulaty is running this on his account, but I'm not sure whether it is automated or triggered manually on his side. Maybe @mdoulaty can shed some light on the current status (perhaps an automatic procedure has now stopped, or a manual procedure is no longer being run to trigger the builds).

Thank you!

mdoulaty commented 4 years ago

Yes, I am aware of the broken builds - I just didn't have time to fix it. Now that there is some demand for those automated builds, I'll prioritise the fix. I will update this thread once the fix is deployed and we have backfilled the previous months.

On a side note, it's an automated process using Terraform - here is the repo in case you want to see how the automation works: https://github.com/mdoulaty/kaldi-image-builder

kir4h commented 4 years ago

Thanks @mdoulaty!

The missing piece to me is how the builder is launched (I was aware of the Terraform part; I just don't know who triggers its execution and in which environment).

mdoulaty commented 4 years ago

The build scripts are triggered by a scheduled Azure Function.

kkm000 commented 4 years ago

@kir4h, @mdoulaty, since you guys are automating the process, could you please continue working on reducing the image size? Here's an easy 1 GB catch for you:

$ du -hc $(find /opt/intel -name '*.a') | tail -1
936M    total
$ du -hs ~/work/kaldi/.git
139M    /home/kkm/work/kaldi/.git

If you want to go a bit bolder and shave another 350 MB, you can remove the MKL kernels for pre-AVX CPUs (Pentium 4 and early Core/Core 2) and for Knights Landing MIC accelerators (hardly anyone has them, they were spectacularly unsuccessful, and they very likely won't virtualize in Docker anyway). It's also safe to remove the threading layer libraries and the ILP64 interface, since we link only against the sequential layer (libmkl_sequential.so, without _thread in the file name) and LP64:

$ du -hc $(find /opt/intel -type f -regex '.*\(_mc.?\|_mic\|_thread\|_ilp64\)\.so') | tail -1
352M    total

This, together with removing the *.a files, frees more than half of the disk space MKL uses.
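
Something along these lines should do it in the image build (assuming MKL lives under /opt/intel as in the sizing commands above, and that the Kaldi checkout is kept at /opt/kaldi; adjust the paths to the actual image layout):

$ find /opt/intel -name '*.a' -delete
$ find /opt/intel -type f -regex '.*\(_mc.?\|_mic\|_thread\|_ilp64\)\.so' -delete
$ rm -rf /opt/kaldi/.git   # drop the clone history too, if the checkout stays in the image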


Also, it often happens that the latest commits break stuff. I'd release week-old code, because it would be more stable. You may need some Git-fu for this; to get the SHA of the newest commit that is at least one week old, use git rev-list, e.g.

$ git rev-list --first-parent -n1 --until=1week master
8eb569178d6926b979726df6b1a8b99c0784a52d
$ git -c advice.detachedHead=false checkout $(git rev-list --first-parent -n1 --until=1week master)
HEAD is now at 8eb569178 [scripts] Extend run.pl with more flexibility to repeat failed tasks (#4219)

You cannot do that from a shallow clone, of course. The option -c advice.detachedHead=false only suppresses a whole page of useless chatter; it's not essential.
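
In the image build it could look like this (the clone location is just for illustration; it has to be a full clone, not --depth=1):

$ git clone https://github.com/kaldi-asr/kaldi.git /opt/kaldi
$ cd /opt/kaldi
$ git -c advice.detachedHead=false checkout \
    "$(git rev-list --first-parent -n1 --until=1week master)"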


Debian 9 is quite old. You can easily go with FROM debian:buster-slim (the "slim" variants just lack manpages and other nonessential stuff; there is no difference otherwise). Also, I would not pin the minor tag on it. Buster already feels pretty dated: I'm using it as my work distro and am already installing a few things from the backports feed. Unlike Ubuntu, Debian is extremely conservative, so pulling the latest certainly won't cause problems.


Another thing to consider is building with '-O2' and '-mavx'. I do not think anyone would have any luck finding a non-AVX-capable CPU these days, and all the FST stuff works significantly faster with it.

export CXXFLAGS='-mavx -O2 -fuse-ld=gold -fdiagnostics-color=always'

does the trick for Kaldi, but in tools/ it has to be passed to make explicitly (it's our flop, AFAICR):

make CXXFLAGS="$CXXFLAGS" OPENFST_CONFIGURE="--disable-dependency-tracking"

The gold linker shaves a bit of build time vs. the default classic ld, as does --disable-dependency-tracking for OpenFST.
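
Putting it all together, the whole build could look roughly like this (the /opt/kaldi path and the MKL math library choice are assumptions about the image):

$ export CXXFLAGS='-mavx -O2 -fuse-ld=gold -fdiagnostics-color=always'
$ cd /opt/kaldi/tools
$ make -j"$(nproc)" CXXFLAGS="$CXXFLAGS" OPENFST_CONFIGURE="--disable-dependency-tracking"
$ cd ../src
$ ./configure --shared --mathlib=MKL
$ make -j"$(nproc)" depend && make -j"$(nproc)"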

mdoulaty commented 4 years ago

Thanks for the suggestions @kkm000 - initially I was thinking of having a minimal image as well, but then dropped the idea. I'll integrate these into the main images. However, I'll first try to make sure the automated builds are running (since I left MS earlier this year, I no longer have free Azure credits to run these - I'm exploring options at FB at the moment, and also talking to my MS colleagues to see if we can use their credits for this).

kkm000 commented 4 years ago

@mdoulaty, an OT question: is FB also in the cloud business now? Facebook?

I did my stuff in GCP. They give you 120 free build minutes a day, forever, but only on a small 1-CPU machine. Everything except Kaldi builds in 7-15 minutes: MKL, CUDA images, Slurm, SRILM, other stuff. But Kaldi is another beast: I have to order a 32-CPU machine, which runs OOM with -j>25 and otherwise takes about 15 min. One build of Kaldi costs between $0.90 and $1. Storage at this volume is insignificant. You get 5 Git repo×users free, also forever, and pay $1/mo per repo×user over that; you can't make them public, but there's GitHub for that. Network ingress is always free for anything whatsoever. Network egress is $0.12/GB, which bites if you push to Docker Hub. Private registries are available and inexpensive. They also give you $300 to spend however you want within 12 months when you activate billing.

I keep a few TB of personal backups there in cold storage, for about $4/mo or so. Recovery would cost (egress traffic), but, fingers crossed, I won't need it...

mdoulaty commented 4 years ago

No, they are not. I meant I thought about hooking this into some internal build systems.

I initially tried this with GitHub Actions and hit OOM and timeout issues, not to mention that GPUs are not available in GitHub Actions (at least they weren't back then). Because of those issues, I set everything up on Azure VMs.

Yes, Kaldi is not cheap to build - the non-GPU images were built on Azure Standard_DS5_v2 VMs, which have 16 CPUs and 56 GB of RAM and cost $0.4959/hour (if I'm not mistaken, each build took around 30+ minutes). The GPU images were built on Standard_NC6, which has Teslas and costs $0.90/hour. I haven't included other costs (bandwidth etc.) here, but as you mentioned, each build should cost around a dollar or so.

So far the images have been downloaded over 150,000 times (https://hub.docker.com/v2/repositories/kaldiasr/kaldi/), so I think keeping the automated builds is a good idea.

kkm000 commented 4 years ago

Are you planning to build every commit, or e.g. every week or two? The second case is pocket change. Also, you do not really need a GPU for the build, only for the tests. GPUs are expensive, yup; GCB does not offer them in its build environment yet. But "we leave this exercise to the curious reader", at least for starters.
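
If you go weekly, a plain cron entry on the build VM would be all it takes, something like this (the script path is just a placeholder for whatever kicks off the build and push):

# m h dom mon dow  command
0 3 * * 1  /opt/kaldi-image-builder/build_and_push.sh >> /var/log/kaldi-build.log 2>&1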

I cannot really read Docker Hub's JSON. Do they have a chart of unique downloads per month, or something equally sensible? One robot that went bananas can do 150K downloads in a week...
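
The cumulative counter is easy enough to extract (assuming jq is installed), but that seems to be all the API gives; there is no per-month breakdown:

$ curl -s https://hub.docker.com/v2/repositories/kaldiasr/kaldi/ | jq .pull_count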

kkm000 commented 4 years ago

@mdoulaty, just in case, have a look at the approach I explained today to someone who asked the right question (scroll to the quotation "if we would install the shared MKL libraries through a package manager"). The links are to scripts you can use right from the start. The only thing is that there are a lot of them in many places, since I am building a different thing. But splitting MKL, CUDA and Kaldi into separate images is a good idea, since pulling and installing the debs takes too much time. Here, I can do it for free (except for image storage, but that should be cheap).

The stuff may be too complicated... or maybe not at all. build_kaldi.sh may look scary, but there are a lot, and I mean a lot, of hacks in it, e.g. interrogating make for targets, taking only a subset of the binaries, etc.
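
Splitting the builds could look as simple as this, with the base images rebuilt only rarely and Kaldi layered on top (the image names and Dockerfile names here are only placeholders):

$ docker build -t example/mkl-base  -f Dockerfile.mkl  .
$ docker build -t example/cuda-base -f Dockerfile.cuda --build-arg BASE=example/mkl-base .
$ docker build -t kaldiasr/kaldi    -f Dockerfile.kaldi --build-arg BASE=example/cuda-base .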

kir4h commented 4 years ago

@kir4h, @mdoulaty, since you guys are automating the process, could you please continue working on reducing the image size? Here's an easy 1 GB catch for you:

$ du -hc $(find /opt/intel -name '*.a') | tail -1
936M    total
$ du -hs ~/work/kaldi/.git
139M    /home/kkm/work/kaldi/.git

If you want to go a bit bolder and shave another 350 MB, you can remove the MKL kernels for pre-AVX CPUs (Pentium 4 and early Core/Core 2) and for Knights Landing MIC accelerators (hardly anyone has them, they were spectacularly unsuccessful, and they very likely won't virtualize in Docker anyway). It's also safe to remove the threading layer libraries and the ILP64 interface, since we link only against the sequential layer (libmkl_sequential.so, without _thread in the file name) and LP64:

$ du -hc $(find /opt/intel -type f -regex '.*\(_mc.?\|_mic\|_thread\|_ilp64\)\.so') | tail -1
352M    total

This, together with removing the *.a files, frees more than half of the disk space MKL uses.

These are straightforward; if that's OK, I will create the PR straight away.

Also, it often happens that the latest commits break stuff. I'd release week-old code, because it would be more stable. You may need some Git-fu for this; to get the SHA of the newest commit that is at least one week old, use git rev-list, e.g.

$ git rev-list --first-parent -n1 --until=1week master
8eb569178d6926b979726df6b1a8b99c0784a52d
$ git -c advice.detachedHead=false checkout $(git rev-list --first-parent -n1 --until=1week master)
HEAD is now at 8eb569178 [scripts] Extend run.pl with more flexibility to repeat failed tasks (#4219)

You cannot do that from a shallow clone, of course. The option -c advice.detachedHead=false only suppresses a whole page of useless chatter; it's not essential.

Indeed, not having versioning/releases brings this kind of fun to the table. I'd say we can focus on this later, once we have a leaner image.

Debian 9 is quite old. You can easily go with FROM debian:buster-slim (the "slim" variants just lack manpages and other nonessential stuff; there is no difference otherwise). Also, I would not pin the minor tag on it. Buster already feels pretty dated: I'm using it as my work distro and am already installing a few things from the backports feed. Unlike Ubuntu, Debian is extremely conservative, so pulling the latest certainly won't cause problems.

Agreed. My only concern is how to ensure nothing is broken (beyond the build itself passing).

Another thing to consider is building with '-O2' and '-mavx'. I do not think anyone would have any luck finding a non-AVX-capable CPU these days, and all the FST stuff works significantly faster with it.

export CXXFLAGS='-mavx -O2 -fuse-ld=gold -fdiagnostics-color=always'

does the trick for Kaldi, but in tools/ it has to be passed to make explicitly (it's our flop, AFAICR):

make CXXFLAGS="$CXXFLAGS" OPENFST_CONFIGURE="--disable-dependency-tracking"

The gold linker shaves a bit of build time vs. the default classic ld, as does --disable-dependency-tracking for OpenFST.

I'm not really familiar with this, so I can blindly follow your recommendations =)

kkm000 commented 4 years ago

@mdoulaty, @kir4h, so this is done? Tag me if you want to reopen.

mdoulaty commented 4 years ago

@kkm000 no, it's not done. I'm still trying to figure out where/how to resume building the images...

kir4h commented 4 years ago

@kkm000 let's keep it open until the automated building of images is back.

@mdoulaty Since this might take a while, could we manually build/push the current version, so we can start taking advantage of the reduced image size? (Asking because you have the required push permissions.)

mdoulaty commented 4 years ago

Sure, I just pushed the latest CPU image, based on the updated Dockerfile, to Docker Hub.

kir4h commented 4 years ago

Sure, I just pushed the latest CPU image, based on the updated Dockerfile, to Docker Hub.

Thanks!

kkm000 commented 4 years ago

@mdoulaty, @kir4h, thanks for the feedback, I'll keep this open then.

jtrmal commented 2 years ago

Resolved - the Docker images were updated and will now be rebuilt automatically on a weekly schedule.