kaixxx / noScribe

Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification)
GNU General Public License v3.0

Escape special characters using HTML entities #98

Open tobbi opened 1 month ago

tobbi commented 1 month ago

I noticed that when selecting HTML as the output format, the transcript is saved verbatim in HTML, including all special characters and umlauts. It would be better if you could escape those to HTML entities, so that ä becomes &auml;, for example.
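
For illustration, the requested escaping could be sketched in Python with only the standard library. This is not noScribe's actual code: the function name `escape_to_entities` is hypothetical. It escapes the markup characters with `html.escape`, then replaces each non-ASCII character with a named entity where `html.entities.codepoint2name` knows one, and with a numeric reference otherwise.

```python
import html
from html.entities import codepoint2name


def escape_to_entities(text: str) -> str:
    """Hypothetical helper: return an ASCII-only, HTML-safe version of text."""
    out = []
    # html.escape handles &, < and > first (quote=False leaves quotes alone).
    for ch in html.escape(text, quote=False):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &auml;
        else:
            out.append(f"&#{cp};")  # numeric reference as a fallback
    return "".join(out)


print(escape_to_entities("Müller & Söhne <März>"))
# prints: M&uuml;ller &amp; S&ouml;hne &lt;M&auml;rz&gt;
```

Escaping `&`, `<` and `>` matters regardless of the encoding question, since those characters can break the markup itself. Converting umlauts is optional when the file is declared as UTF-8.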

kaixxx commented 4 weeks ago

Have you had any problems importing the resulting HTML into other software? The output is marked as utf-8, so using the wider character set should be ok. But I agree that it would be cleaner and more in line with the HTML standard to escape umlauts etc.
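
As a rough check of the point about UTF-8, the sketch below shows that the raw and the escaped form decode to the same text. It uses Python's built-in `xmlcharrefreplace` error handler and `html.unescape`; the sample string is made up for illustration.

```python
import html

raw = "Grüße"  # made-up sample text containing non-ASCII characters

# Encoding to ASCII with xmlcharrefreplace turns every non-ASCII
# character into a numeric character reference.
ascii_only = raw.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_only)  # prints: Gr&#252;&#223;e

# Decoding the references gives back the original text, so both
# forms carry the same content.
assert html.unescape(ascii_only) == raw
```

Under that assumption, escaping mainly helps consumers that ignore or mishandle the declared charset.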

tobbi commented 4 weeks ago

> Have you had any problems importing the resulting HTML into other software?

No, I have not. Just felt it would be cleaner.