naeruru / mimiuchi

a free, customizable, osc capable speech-to-text interface for relaying text to different types of applications
https://mimiuchi.com
GNU General Public License v3.0

revise word replace #34

Closed: fuwako closed this pull request 6 months ago

fuwako commented 7 months ago

Description

This is a collection of changes to the Word Replace system.

Case Sensitivity

Word Replace will become case-sensitive.

For example, Apple and apple will be treated as two different entries.

This is semantically important.

For example, the user may only want to transform the capitalization of a word.

This enables users to specifically target common nouns and proper nouns.

However, case sensitivity is more explicit, so it may be more difficult to use. When using speech-to-text, the user will have to declare their replacement keys with the exact capitalization that the Web Speech API outputs.
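The following is a minimal sketch of what case-sensitive lookup means in practice. It assumes entries are stored in a `Map` from key to replacement. The function name `replaceTokens` and the split-on-spaces tokenization are illustrative assumptions, not mimiuchi's implementation, which the PR describes as regex-based.

```typescript
// Hypothetical entry table. Keys are compared exactly, so "Apple" and "apple"
// are two independent entries.
const entries = new Map<string, string>([
  ["Apple", "Apple Inc."],
  ["apple", "fruit"],
  ["mimiuchi", "Mimiuchi"], // an entry that only changes capitalization
]);

// Simplified sketch: splits the transcript on single spaces and looks up each
// token with an exact, case-sensitive comparison. Punctuation attached to a word
// (e.g. "apple,") prevents a match, and multi-word keys are not supported.
function replaceTokens(text: string, table: Map<string, string>): string {
  return text
    .split(" ")
    .map((token) => table.get(token) ?? token)
    .join(" ");
}

console.log(replaceTokens("apple and Apple", entries)); // fruit and Apple Inc.
console.log(replaceTokens("APPLE", entries)); // APPLE (no entry has this exact case)
```

The last call illustrates the usability cost noted above. Under strict case matching, a transcript capitalized differently from every key is left untouched.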

Sanitization

When text is being processed for replacement, regex metacharacters in replacement keys will be automatically escaped to prevent the accidental injection of regular expressions.

The user's entries will not be modified and will remain as the user has declared them.

This fixes a crash that required refreshing the running instance.

Additionally, users can now replace words/phrases that contain special characters (e.g., words with many asterisks).
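Escaping of this kind can be sketched as follows. `escapeRegExp` uses the common escaping idiom for JavaScript regular expressions. `replaceLiteral` is a hypothetical helper, not a function from the mimiuchi codebase. The user's key is only escaped when the pattern is built, so the stored entry itself stays unchanged.

```typescript
// Escapes every character with special meaning in a JavaScript regular expression
// so that a user's key is always matched literally.
function escapeRegExp(key: string): string {
  return key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces every literal occurrence of `key` in `text` with `value`.
// The stored key is not modified; only the pattern built from it is escaped.
function replaceLiteral(text: string, key: string, value: string): string {
  if (key === "") return text; // an empty pattern would match between every character
  const pattern = new RegExp(escapeRegExp(key), "g");
  // A replacer function keeps "$&", "$1", etc. in the user's value from being expanded.
  return text.replace(pattern, () => value);
}

// Unescaped, a key like "***" is not a valid pattern and the constructor throws.
try {
  new RegExp("***");
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}

console.log(replaceLiteral("that was *** great", "***", "really")); // that was really great
```

The replacer function matters for the same reason as escaping the key. Special sequences in the replacement value are also treated as plain text.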

Replacement Prioritization

Longer replacement keys will have higher priority than shorter replacement keys.

For example, we may have 2 entries:

Hello world → apple
Hello → banana

"Hello world" is the longer key; therefore, it has higher priority. The transcript "Hello world" will become "apple" instead of "banana world".
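One way to get this prioritization is to sort the keys longest-first and combine them into a single alternation pattern. JavaScript tries the alternatives in order, so among keys that match at the same position, the longest one wins. The sketch below assumes case-sensitive matching and uses hypothetical names (`applyEntries`). It is an illustration of the rule, not the PR's code.

```typescript
function escapeRegExp(key: string): string {
  return key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Applies all entries in one left-to-right pass over the transcript.
// Keys are sorted longest-first; regex alternation tries alternatives in order,
// so among keys that match at the same position, the longest one wins.
function applyEntries(text: string, entries: Array<[string, string]>): string {
  const table = new Map(entries.filter(([key]) => key.length > 0));
  if (table.size === 0) return text;
  const keys = Array.from(table.keys()).sort((a, b) => b.length - a.length);
  const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");
  // Matching is case-sensitive, so the matched text is exactly one of the keys.
  return text.replace(pattern, (match) => table.get(match) ?? match);
}

const entries: Array<[string, string]> = [
  ["Hello", "banana"],
  ["Hello world", "apple"],
];
console.log(applyEntries("Hello world", entries)); // apple
console.log(applyEntries("Hello there", entries)); // banana there
```

Replacing in a single pass also means a replacement's output is never matched again by another entry. Applying the entries one after another would not have that property.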

(Other)

This has only been tested for the English language. I haven't fully determined whether it's functional and stable with other languages.


naeruru commented 7 months ago

I like the PR, but I think having an option for case insensitivity is still important. Because speech-to-text providers are all inherently different, the capitalization of words can vary based on sentence structure and proper nouns, and a user who is not aware of this could have a more confusing time. Capitalization also differs between browsers; for example, Edge will capitalize and punctuate sentences. This means someone might always have to include two different entries for words whose case they don't care about, because there is a nonzero chance that the word appears at the beginning of a sentence.

fuwako commented 7 months ago

I've implemented "Match whole word only" and "Match case" as options, instead. If you don't want them, I would be okay with stripping down the code and removing them. This pull request unexpectedly became complicated.
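The two options might combine with escaping and longest-first priority roughly as follows. This is a sketch under assumptions. The names `matchCase`, `wholeWord`, and `replaceWords` are hypothetical and are not taken from the PR. "Match case" is modeled by omitting the regex `i` flag, and "Match whole word only" by wrapping each key in `\b` boundaries. JavaScript's `\b` treats only ASCII letters, digits, and underscore as word characters. So with whole-word matching on, keys that begin or end with symbols (such as "***") will not match, and non-English text may behave unexpectedly.

```typescript
interface ReplaceOptions {
  matchCase: boolean; // hypothetical name for the "Match case" option
  wholeWord: boolean; // hypothetical name for the "Match whole word only" option
}

function escapeRegExp(key: string): string {
  return key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function replaceWords(
  text: string,
  entries: Array<[string, string]>,
  options: ReplaceOptions,
): string {
  // Longest keys first, so longer keys win when several match at the same position.
  const sorted = entries
    .filter(([key]) => key.length > 0)
    .sort((a, b) => b[0].length - a[0].length);
  if (sorted.length === 0) return text;

  // One capture group per entry; escaped keys contain no other capture groups.
  const groups = sorted.map(([key]) => {
    const escaped = escapeRegExp(key);
    return options.wholeWord ? `(\\b${escaped}\\b)` : `(${escaped})`;
  });
  const pattern = new RegExp(groups.join("|"), options.matchCase ? "g" : "gi");

  return text.replace(pattern, (match: string, ...rest: unknown[]) => {
    // rest starts with the capture groups; exactly one of them is defined.
    const index = sorted.findIndex((_, i) => rest[i] !== undefined);
    return index >= 0 ? sorted[index][1] : match;
  });
}

const entries: Array<[string, string]> = [["apple", "fruit"]];
console.log(replaceWords("Apple pie, pineapple", entries, { matchCase: false, wholeWord: true }));
// fruit pie, pineapple
console.log(replaceWords("Apple pie, pineapple", entries, { matchCase: true, wholeWord: false }));
// Apple pie, pinefruit
```

With case matching off, a single "apple" entry also covers a sentence-initial "Apple", which addresses the Edge capitalization concern without requiring a second entry. Capture groups, rather than a lookup by matched text, identify the entry because a case-insensitive match need not equal its key exactly.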

[Screenshots: Before / After]

naeruru commented 7 months ago

awesome, give me some time to look over it^^