seifane / whisper-rhasspy-http

Rhasspy Whisper integration

Add comma symbol to recommended --filter-chars #4

Closed Zeevai closed 1 year ago

Zeevai commented 1 year ago

As stated in the title, the comma should be added to the recommended --filter-chars. I'm not sure how --filter-chars interprets its argument, so I escaped the comma to be safe.

--filter-chars ".\,?'"\!\"":;<>[]{}()"

Otherwise this works very well out of the box, thank you!

seifane commented 1 year ago

Hmm, --filter-chars treats every character in the given string individually and removes each of them from the recovered prompt.

The only escaping needed is for the shell. I don't think you have to escape the comma in this case! Let me know if it's an issue, though, and I will update the README.
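To illustrate the per-character behaviour, here is a minimal sketch. The name `filter_chars` is hypothetical and the project's main.py may implement the filtering differently; the point is that each character of the option value is deleted on its own, so a comma needs no escaping, while a backslash that the shell leaves in place simply becomes one more character to remove.

```python
def filter_chars(text: str, chars: str) -> str:
    """Hypothetical helper: delete every occurrence of each character in `chars`."""
    # str.maketrans with a third argument maps each listed character to None,
    # which str.translate treats as "delete this character".
    table = str.maketrans("", "", chars)
    return text.translate(table)


if __name__ == "__main__":
    print(filter_chars("Turn on the light, please!", ".,?!"))
```

Because the value is treated as a set of characters, their order and any duplicates do not matter.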

Zeevai commented 1 year ago

Hi, sorry for the delayed response. I settled on this:

--filter-chars ".,-?'\!\":;<>[]{}()"
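One way to check what a POSIX shell actually passes to --filter-chars is Python's standard `shlex` module, which follows POSIX quoting rules (it is an approximation: it does not perform bash history expansion). In POSIX shells a backslash inside double quotes is only removed before a few special characters such as `"` and `\`, so in both commands from this thread some backslashes survive into the argument. With per-character filtering that is harmless; it just means backslashes are stripped from the transcription as well.

```python
import shlex

# The two --filter-chars arguments quoted in this thread, exactly as typed.
first_attempt = r'''--filter-chars ".\,?'"\!\"":;<>[]{}()"'''
settled_on = r'''--filter-chars ".,-?'\!\":;<>[]{}()"'''

for command in (first_attempt, settled_on):
    # shlex.split applies POSIX-style quoting and escaping rules.
    flag, value = shlex.split(command)
    print(flag, repr(value), "contains backslash:", "\\" in value)
```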

This may be a little off-topic, but I have found a way to improve Whisper's output by converting the Rhasspy sentences.ini into hint phrases, which are passed to Whisper via the initial_prompt argument.

Whisper remains a really good transcription tool, but Rhasspy would benefit more from an improved key-phrase identifier, which Whisper is not designed to be. Regardless, the initial_prompt argument does improve it in that regard.
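The code for this approach is not included in the thread, so the following is only a rough sketch of the idea under simplifying assumptions. It treats sentences.ini as INI-style `[IntentName]` sections of sentence templates, strips template syntax with regular expressions (it does not implement Rhasspy's full template grammar; for example, it keeps both sides of `word:substitution` pairs), and joins the remaining words into a comma-separated hint string. The function name `sentences_to_prompt` and the `max_words` cap are hypothetical. The commented `transcribe` call assumes the openai-whisper package, whose `transcribe()` accepts an `initial_prompt` keyword.

```python
import re


def sentences_to_prompt(ini_text: str, max_words: int = 100) -> str:
    """Hypothetical helper: collect literal words from a Rhasspy-style sentences.ini."""
    words: list[str] = []
    seen: set[str] = set()
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and [IntentName] section headers.
        if not line or line.startswith("#") or re.fullmatch(r"\[[^\]]+\]", line):
            continue
        # For rule definitions ("name = body"), only the body is spoken.
        if "=" in line:
            line = line.split("=", 1)[1]
        # Drop {tags}, <rule references> and $slot references; they are not spoken.
        line = re.sub(r"\{[^}]*\}|<[^>]*>|\$\w+", " ", line)
        # Grouping syntax ( ) [ ] | is ignored by the word pattern below.
        for word in re.findall(r"[\w']+", line):
            key = word.lower()
            if key not in seen:  # keep first occurrence, preserve order
                seen.add(key)
                words.append(word)
    # Whisper only uses a limited number of prompt tokens, so cap the list.
    return ", ".join(words[:max_words])


SAMPLE = """
[ChangeLightState]
light_name = (living room lamp | garage light) {name}
turn (on | off){state} [the] <light_name>

[GetTime]
what time is it
"""

prompt = sentences_to_prompt(SAMPLE)
print(prompt)
# With openai-whisper (not run here):
# result = model.transcribe("command.wav", initial_prompt=prompt)
```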

Now, before I open a merge request, I want you to know that this extensive modification to your main.py was entirely written by BingChat. I have no moral issues running this code on my own system, but publishing it is something else entirely (even though I plan to make it abundantly clear that it is not my code). How do you feel about this?

seifane commented 1 year ago

@Zeevai You can open a PR for this and I will look at the code. I am just a little concerned about licensing. Is there any word on what license the generated code is under?

Thanks for bringing this up.

Zeevai commented 1 year ago

That's the issue: GPT4 was trained on the entirety of Stack Overflow and GitHub. This means that every piece of code generated by it will be contributed to by everyone. There is currently no legal framework for AI-generated code that I'm aware of. As for the quality of the code, I believe it's best-practice to assume it's bad until proven otherwise, but that's a different discussion.

seifane commented 1 year ago

> GPT4 was trained on the entirety of Stack Overflow and GitHub. This means that every piece of code generated by it will be contributed to by everyone

Okay, submit the PR and if someone requests a takedown I will revert the changes.

> I believe it's best-practice to assume it's bad until proven otherwise

That's true for all code ;)

Zeevai commented 1 year ago

Will do when I get home. Your solution is appropriate.

seifane commented 1 year ago

Closing this. PR is #5