The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using the speech engines created by Google to provide functionality for parts of the API. Essentially, it is an API written in Java, including a recognizer, synthesizer, and a microphone capture utility. The project uses Google services for the synthesizer and recognizer. While this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
This VAD algorithm suggests calculating the energy of each frame... Is that the same as RMS?
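For reference, the two quantities are closely related but not identical. Short-time energy is usually defined as the sum of squared samples in a frame, E = Σ x[n]², while RMS is the square root of the mean of those squares, RMS = √(E / N) for a frame of N samples. Some VAD descriptions instead normalize energy by N (average power), in which case RMS is simply its square root; others use log energy. Because these are monotonic transformations, thresholding on any of them gives the same speech/non-speech decision when N is fixed, as long as the threshold is converted accordingly. The sketch below assumes samples have already been converted to doubles and uses a hypothetical fixed threshold. The class name `FrameEnergy` and its methods are illustrative only, not part of J.A.R.V.I.S. or Sciss/SpeechRecognitionHMM.

```java
/**
 * Minimal sketch contrasting frame energy and RMS for an energy-based VAD.
 * Assumptions (not from the original post): samples are already converted
 * to doubles (e.g. scaled to [-1, 1]), frames have a fixed length, and the
 * threshold is a hypothetical constant rather than an adaptive estimate.
 */
public final class FrameEnergy {

    private FrameEnergy() {}

    /** Short-time energy: sum of squared samples over the frame. */
    public static double energy(double[] frame) {
        double sum = 0.0;
        for (double s : frame) {
            sum += s * s;
        }
        return sum;
    }

    /** Root mean square: square root of the mean of the squared samples. */
    public static double rms(double[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        return Math.sqrt(energy(frame) / frame.length);
    }

    /**
     * Marks a frame as speech if its energy exceeds a threshold.
     * Since rms = sqrt(energy / N), the same decision can be made on RMS
     * with the threshold sqrt(energyThreshold / N) when N is fixed.
     */
    public static boolean isSpeech(double[] frame, double energyThreshold) {
        return energy(frame) > energyThreshold;
    }

    public static void main(String[] args) {
        double[] frame = {0.5, -0.5, 0.5, -0.5};
        System.out.println("energy = " + energy(frame)); // prints "energy = 1.0"
        System.out.println("rms    = " + rms(frame));    // prints "rms    = 0.5"
    }
}
```

In other words, an energy-based VAD and an RMS-based one differ mainly in scale; which one a given implementation uses matters only when comparing thresholds across code bases.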
This code from Sciss/SpeechRecognitionHMM seems to use a different algorithm:
MicrophoneAnalyser: