sepinf-inc / IPED

IPED Digital Forensic Tool. It is open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in corporate investigations by private examiners.

Vosk transcription may slow down during large cases processing #1909

wladimirleite closed 11 months ago

wladimirleite commented 11 months ago

As discussed in #1899, when using Vosk audio transcription in a very large case (with many audio files), @paulobreim noticed that CPU usage dropped after a certain point. I was able to reproduce the issue by processing a large sample of audio files (~150K).

Later I wrote the small standalone program below, which makes the problem easier to reproduce (on a Windows PC with 48 logical processors). With this program and the sample audio below, the issue (CPU usage decreases and transcription slows down) becomes noticeable after a couple of minutes.

import java.io.File;
import java.io.InputStream;
import java.util.SplittableRandom;

import javax.sound.sampled.AudioSystem;

import org.vosk.Model;
import org.vosk.Recognizer;

// Standalone reproduction: one thread per logical processor, each transcribing
// the same sample file repeatedly with its own Recognizer and a shared Model.
public class VoskTest {
    public static void main(String[] args) throws Exception {
        Model model = new Model("vosk-model-small-en-us-0.15");
        Thread[] threads = new Thread[Runtime.getRuntime().availableProcessors()];
        for (int i = 0; i < threads.length; i++) {
            (threads[i] = new Thread() {
                @Override
                public void run() {
                    try {
                        // 1 MB read buffer: the size in use when the slowdown was observed.
                        byte[] buf = new byte[1 << 20];
                        SplittableRandom rnd = new SplittableRandom();
                        Recognizer recognizer = new Recognizer(model, 16000);
                        recognizer.setWords(true);
                        for (int rep = 0; rep < 10000; rep++) {
                            // try-with-resources closes the audio stream even if recognition throws.
                            try (InputStream ais = AudioSystem.getAudioInputStream(new File("sample-audio.wav"))) {
                                int nbytes;
                                while ((nbytes = ais.read(buf)) >= 0) {
                                    // Results are requested but discarded; only the load matters here.
                                    if (recognizer.acceptWaveForm(buf, nbytes)) {
                                        recognizer.getResult();
                                    } else {
                                        recognizer.getPartialResult();
                                    }
                                }
                            }
                            recognizer.getFinalResult();
                            recognizer.reset();
                            System.out.println(rep + ":" + Thread.currentThread().getName());
                            // Random pause of up to 9 ms between repetitions.
                            Thread.sleep(rnd.nextInt(10));
                        }
                        recognizer.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
        // Wait for all worker threads before releasing the shared model.
        for (Thread t : threads) {
            t.join();
        }
        model.close();
    }
}

The sample audio I used: sample-audio.zip

wladimirleite commented 11 months ago

Things I tried that did NOT make any difference to the described behavior:

After many failed attempts, I finally found that limiting the read buffer (e.g. to 64 KB) solved the issue; a 1 MB buffer is currently used. My guess is that Vosk uses some kind of internal (native) memory buffer, handled by a synchronized piece of code, that has trouble dealing with large inputs and many threads.
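To illustrate the idea of capping the read buffer, here is a minimal sketch that reads a stream in chunks of at most a given size and hands each chunk to a callback. The names `ChunkedFeed`, `ChunkSink`, and `feed` are hypothetical and not part of Vosk or IPED; the callback only stands in for the `recognizer.acceptWaveForm(buf, nbytes)` call in the test program above, so the sketch runs without Vosk. It is not the actual change made in IPED.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ChunkedFeed {
    /** Hypothetical callback standing in for Recognizer.acceptWaveForm(byte[], int). */
    @FunctionalInterface
    interface ChunkSink {
        void accept(byte[] data, int length);
    }

    /**
     * Reads the stream using a buffer of chunkSize bytes, so the sink never
     * receives more than chunkSize bytes per call. Returns the total bytes read.
     */
    static long feed(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) >= 0) {
            sink.accept(buf, n); // only the first n bytes of buf are valid
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an audio stream: 150,000 zero bytes, read in 64 KB chunks.
        InputStream in = new ByteArrayInputStream(new byte[150_000]);
        long total = feed(in, 64 * 1024,
                (buf, n) -> System.out.println("chunk of " + n + " bytes"));
        System.out.println("total: " + total);
    }
}
```

In the test program, the equivalent change would be to allocate `buf` with `1 << 16` instead of `1 << 20`, or to pass a sink such as `(buf, n) -> recognizer.acceptWaveForm(buf, n)` if result handling is not needed.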