kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

'nnet3-latgen-faster' takes longer time in the first run compared to later runs #4630

Closed emirdemirel closed 3 years ago

emirdemirel commented 3 years ago

Hi,

I developed a Kaldi speech-to-text module for my research and deploy it via a Docker image. I noticed that every time I create a new Docker container, 'nnet3-latgen-faster' takes longer on its first run than on later runs.

Question: Does 'nnet3-latgen-faster' create cached files during its first run that later runs can reuse? Could this be why the first run takes longer?

Thanks in advance

danpovey commented 3 years ago

No cached files, but possibly either a docker-related issue or an issue related to files cached in memory by the OS.

kkm000 commented 3 years ago

It's your normal page cache at work. When you run a container, an overlay filesystem is mounted. When starting a process, ld.so needs to load a lot of .so libraries from disk, including MKL and other huge stuff. Some sections of the .so files (code, rodata, etc.) are mapped into memory as read-only pages backed by the file, so they can simply be dropped if the kernel needs to evict them. Even when an .so is no longer referenced, the kernel still keeps the mapped pages opportunistically. When a container is stopped, the overlay filesystem is unmounted and the cached mappings are purged, as they are associated with the filesystem inode, and the backing filesystem is now gone.
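A rough way to see the page cache effect for yourself is to read the same file twice and compare the timings. The following is a minimal sketch, not part of Kaldi: the helper names `read_through` and `compare_cold_and_warm` are made up for illustration, and the file you pass in (for example, a large .so inside the container) is your choice. The first read is only "cold" if the file has not been touched since the container started, so treat the numbers as indicative.

```python
import sys
import time


def read_through(path, chunk_size=1 << 20):
    """Read a file start to finish; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_cold_and_warm(path):
    """Read the same file twice and report both timings.

    On a fresh container the first pass usually goes to disk, while the
    second pass is usually served from the kernel page cache.
    """
    first_bytes, first_s = read_through(path)
    _, second_s = read_through(path)
    return {"bytes": first_bytes, "first_s": first_s, "second_s": second_s}


if __name__ == "__main__":
    # Hypothetical usage: python compare_reads.py /path/to/some/large/library.so
    for p in sys.argv[1:]:
        print(p, compare_cold_and_warm(p))
```

If the second timing is much lower than the first in a freshly created container, the slow first run of 'nnet3-latgen-faster' is most likely the same effect, just spread over all the libraries and model files it loads.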

You can mount /var/lib/docker on a dedicated NVMe SSD to improve its disk performance.

I am closing this issue for now. If you believe that your issue has not been addressed, please feel free to ping me, and I'll reopen it. @-mention me for a faster response!