persephone-tools / persephone

A tool for automatic phoneme transcription
Apache License 2.0

Allow changing the settings to a computationally demanding mode for optimal transcription quality #38

Open alexis-michaud opened 6 years ago

alexis-michaud commented 6 years ago

A comment from the linguist's point of view, assuming that some interested linguists visit the persephone project on GitHub (which I think is not at all unlikely):

Currently the intro (README.md) says "I've made the settings less computationally demanding than it would be for optimal transcription quality". That could come across as "you will get suboptimal quality". But for a linguist, the aim is to see persephone's very best. Not 'state-of-the-art with a rebate' but 'full-SOTA': that's part of the magic! 🥇 The linguists' perspective (I think) is that, in order to see what the tool can do, we want to be able to choose the highest settings, even if it is ten times as computationally intensive to gain a couple of percentage points in accuracy.

On the other hand, training will likely need to be run dozens of times with different settings, and we don't want to wait a week (or more) for each test.

To solve this issue, what about documenting a simple workflow: test and tune with the less computationally demanding settings, then raise the settings and run training once more, with a warning that this final run with high settings may take a week or more? Would that make sense?

The suggestion amounts to adding one sentence to README.md:

"I've made the settings less computationally demanding than it would be for optimal transcription quality. The settings can be raised by changing the values in \<name of settings file>"

Just an idea!

oadams commented 6 years ago

Just for the record, I completely agree. This is how it should be done and will be done.

I guess one remaining question is what the right interface is. Linguists might not want to specify the number of layers and hidden units in the neural network. Perhaps two modes will suffice: "simple", for testing the installation and getting preliminary results, and "full". The best hyperparameters for the full mode are unknown and will vary with the quantity and quality of the training data; I guess this is where hyperparameter optimization (#26) should eventually come in. For now I can just go with sensible defaults for both "simple" and "full".
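To make that concrete, here is a minimal sketch of how two preset modes could be exposed to users (hypothetical names and values, not persephone's actual API; the right numbers for "full" are exactly what is still unknown):

```python
# A minimal sketch of "simple" vs. "full" hyperparameter presets.
# Hypothetical: the names and values are illustrative, not persephone's API.

from dataclasses import dataclass


@dataclass
class Hyperparams:
    num_layers: int   # depth of the recurrent network
    hidden_size: int  # hidden units per layer
    num_epochs: int   # training passes over the corpus


PRESETS = {
    # Cheap settings for checking the installation and getting preliminary results.
    "simple": Hyperparams(num_layers=2, hidden_size=250, num_epochs=30),
    # More demanding settings; the best values are unknown and will vary
    # with the quantity and quality of the training data.
    "full": Hyperparams(num_layers=3, hidden_size=400, num_epochs=100),
}


def get_hyperparams(mode: str = "simple") -> Hyperparams:
    """Return the hyperparameter bundle for the requested mode."""
    return PRESETS[mode]
```

A linguist would then only ever pick a mode name, while the defaults behind each name can evolve as better hyperparameters are found (e.g. via #26).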