If I understand correctly, Vosk requires WAV, so the initial recording could stay as is, but after speech recognition it could be re-encoded with Opus or AAC. This could save quite a bit of space for heavy users.
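A rough sketch of the "recognize first, compress afterwards" idea, assuming the app can shell out to an `ffmpeg` binary built with `libopus`. The helper names `opus_command` and `archive_after_recognition` are hypothetical and not part of Vosk or of this project; a mobile app would more likely use the platform's own encoder APIs.

```python
import shutil
import subprocess
from pathlib import Path


def opus_command(wav_path, bitrate_kbps=64):
    """Build an ffmpeg command that transcodes a WAV file to Opus (hypothetical helper)."""
    wav = Path(wav_path)
    out = wav.with_suffix(".opus")
    # -c:a libopus selects the Opus encoder, -b:a sets the target audio bitrate
    return ["ffmpeg", "-y", "-i", str(wav),
            "-c:a", "libopus", "-b:a", f"{bitrate_kbps}k", str(out)]


def archive_after_recognition(wav_path, bitrate_kbps=64, delete_wav=True):
    """Transcode a recording once Vosk has finished reading it (hypothetical helper)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = opus_command(wav_path, bitrate_kbps)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    if delete_wav:
        Path(wav_path).unlink()  # keep only the compressed copy
    return Path(cmd[-1])
```

Keeping the WAV until recognition is done means Vosk never has to decode Opus; only the archived copy is compressed.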
In WAV, a 12-second recording is ~1 MB. In Opus (a free and open format), the same recording could be ~140 KB at 96 kb/s (more than enough for mono, even for music) or ~96 KB at 64 kb/s (fine for mono voice recordings).
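The numbers above can be checked with simple arithmetic. This sketch assumes the WAV is 16-bit mono PCM at 44.1 kHz (which gives the ~1 MB figure; a 16 kHz recording would be about a third of that) and treats Opus as constant bitrate. Real Opus files are VBR by default and add a little container overhead, so actual sizes will vary somewhat.

```python
def wav_size_bytes(seconds, sample_rate=44_100, channels=1, bits_per_sample=16):
    # Uncompressed PCM: samples/s * channels * bytes per sample * duration
    # (the 44-byte WAV header is ignored)
    return seconds * sample_rate * channels * bits_per_sample // 8


def cbr_size_bytes(seconds, kbps):
    # Constant-bitrate estimate: kilobits/s * 1000 bits / 8 bits per byte * duration
    return seconds * kbps * 1000 // 8


for label, size in [
    ("WAV 44.1 kHz mono", wav_size_bytes(12)),
    ("Opus 96 kb/s", cbr_size_bytes(12, 96)),
    ("Opus 64 kb/s", cbr_size_bytes(12, 64)),
]:
    print(f"{label}: {size / 1000:.0f} KB")
```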
From a first look at the code, AAC already seems to be used on iOS.