soupslurpr / Transcribro

Private and on-device speech recognition keyboard and service for Android.
ISC License

Model chooser screen in settings #18

Open soupslurpr opened 4 months ago

soupslurpr commented 4 months ago

A screen in settings to download more models from Hugging Face from within the app, pick the model that will be used, and manage or delete them. The models would be downloaded from a repo on my Hugging Face account, and the hashes of the files would be checked against hashes bundled with Transcribro to ensure integrity even if a Hugging Face server is compromised.
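As a rough illustration of the hash check described above, here is a minimal sketch in Python. It assumes SHA-256 as the hash and a hypothetical `PINNED_SHA256` table shipped with the app; the file name, the table, and the function names are placeholders, not anything Transcribro actually defines.

```python
import hashlib
from pathlib import Path

# Hypothetical pinned table: file name -> expected SHA-256 hex digest.
# The idea is that this ships inside the app release and is never
# fetched from the download server. The entry below is a placeholder.
PINNED_SHA256 = {
    "example-model.bin": hashlib.sha256(b"example model bytes").hexdigest(),
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large models are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_model(path: Path) -> bool:
    """Return True only if the file name is pinned and its digest matches."""
    expected = PINNED_SHA256.get(path.name)
    if expected is None:
        return False  # unknown files are never treated as official
    return sha256_of_file(path) == expected
```

Because the expected digests live in the app rather than on the server, a tampered download fails the check regardless of what the server claims about the file.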

There should be a text box at the top of the screen to test the selected model using the Voice Input Keyboard.

The most recommended models would be shown first and there shouldn't be an overwhelming amount of choice for no benefit. Test different model quants to choose enough models for a sensible variety of speed vs accuracy vs multilingualism and clearly communicate those properties in the interface. If needed, there can be a "more models" button that goes to a screen with the other models to keep the list from being too long.

Additionally, there should be an option to import a model from a file. Imported models can show up below the ones downloaded from the app, but in a separate section so they are not mistaken for the official ones.

soupslurpr commented 4 months ago

File sizes (in bytes) will also be verified while downloading to prevent a waste of bandwidth from a compromised account or server.
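A minimal sketch of that size check, in Python: the download is treated as an iterable of byte chunks, and the transfer is aborted as soon as more bytes arrive than the expected size. The names `save_with_size_limit` and `SizeMismatchError` are made up for this example, and the chunk source stands in for whatever HTTP client would actually be used.

```python
from typing import BinaryIO, Iterable


class SizeMismatchError(Exception):
    """Raised when a download is larger or smaller than the expected size."""


def save_with_size_limit(
    chunks: Iterable[bytes], out: BinaryIO, expected_size: int
) -> int:
    """Write chunks to `out`, stopping early if the stream grows too large."""
    written = 0
    for chunk in chunks:
        if written + len(chunk) > expected_size:
            # Stop here instead of downloading the rest of an oversized file.
            raise SizeMismatchError(
                f"stream exceeds expected size of {expected_size} bytes"
            )
        out.write(chunk)
        written += len(chunk)
    if written != expected_size:
        raise SizeMismatchError(
            f"got {written} bytes, expected {expected_size}"
        )
    return written
```

The early abort is what saves bandwidth; the final equality check catches truncated downloads before any hash is computed.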

soupslurpr commented 3 months ago

This will have to be delayed to avoid complications for when we switch to using Rust for running the Whisper models instead of whisper.cpp.

machiav3lli commented 1 month ago

This would also solve #27 as far as I can see. What are the blockers for using a generic interface that would be hot-swapped when the Rust engine is implemented?
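To make the question concrete, a generic interface of that kind could look roughly like the Python sketch below. Everything in it is hypothetical (`TranscriptionEngine`, `EngineHolder`, and the stand-in `FakeEngine`); it only shows the shape of the idea and says nothing about model file formats.

```python
from typing import Protocol, Sequence


class TranscriptionEngine(Protocol):
    """Hypothetical interface the settings screen would depend on."""

    def load_model(self, model_path: str) -> None: ...

    def transcribe(self, pcm_samples: Sequence[float]) -> str: ...


class FakeEngine:
    """Stand-in backend for illustration; a real one would wrap a native library."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.model_path = ""

    def load_model(self, model_path: str) -> None:
        self.model_path = model_path

    def transcribe(self, pcm_samples: Sequence[float]) -> str:
        return f"[{self.name}] {len(pcm_samples)} samples"


class EngineHolder:
    """Callers keep a reference to the holder, so the backend can be swapped."""

    def __init__(self, engine: TranscriptionEngine) -> None:
        self._engine = engine

    def swap(self, engine: TranscriptionEngine) -> None:
        self._engine = engine

    def transcribe(self, pcm_samples: Sequence[float]) -> str:
        return self._engine.transcribe(pcm_samples)
```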

soupslurpr commented 1 month ago

It could create more work in the future, such as needing to make a model converter if the Rust engine uses a different format. Switching to Rust to run the models is a high priority, and I'm currently researching different Rust machine learning libraries for running Whisper to find one that's at least close to whisper.cpp's speed.

machiav3lli commented 1 month ago

@soupslurpr thanks for the explanation. It would surprise me, though, if a cpp-rust-adapter ignored the established formats just for the sake of being something else.

That said, it's the FOSS world we're talking about here and logic doesn't always prevail.

soupslurpr commented 1 month ago

whisper.cpp itself doesn't use GGUF. It uses its own custom .bin format. No idea why they didn't switch. Also, the Rust library I choose (possibly Burn) may use a different format, and I don't know how difficult it would be to convert.