Jigasi: a server-side application acting as a gateway to Jitsi Meet conferences. Currently allows regular SIP clients to join meetings and provides transcription capabilities.
Apache License 2.0
Add UTF-8 Support for SEND JSON POST Requests #503
Description
This PR adds UTF-8 encoding support for SEND JSON POST requests in the transcription module of Jigasi.
This ensures that non-ASCII characters are handled correctly, particularly for languages such as Hindi, Tamil, and Japanese.
Changes:
Explicitly set the Content-Type header to application/json; charset=UTF-8 to indicate that the JSON data is UTF-8 encoded.
Modified the byte conversion of the JSON string to use UTF-8 encoding.
Change-1:
To:
Change-2:
To:
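The before/after snippets for the two changes are not reproduced in this description. As a rough sketch of what they amount to, assuming the request is sent with java.net.HttpURLConnection (the class Utf8JsonPost and its methods are hypothetical names for illustration, not Jigasi's actual code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Jigasi's real class: shows the two changes together.
final class Utf8JsonPost {
    // Change 1: declare the charset explicitly in the Content-Type header.
    static final String CONTENT_TYPE = "application/json; charset=UTF-8";

    // Change 2: encode the JSON string as UTF-8 instead of relying on
    // the platform default charset used by the no-argument getBytes().
    static byte[] encodeBody(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Sends the JSON to the given address and returns the HTTP status code.
    static int post(String address, String json) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", CONTENT_TYPE);
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(encodeBody(json));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping the header and the byte encoding in agreement is the point: the receiver decodes the body using the charset the header announces.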
Motivation:
Transcription worked well in English, but after switching the language to Hindi or others, the received text contained numerous question marks, which points to an encoding issue on the sending side. By sending the data as UTF-8, this PR aims to resolve the issue so that non-ASCII characters are interpreted correctly.
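To illustrate how such question marks can arise: in Java, String.getBytes replaces any character the target charset cannot represent with that charset's replacement byte, which is "?" for US-ASCII. The sketch below assumes, purely for illustration, that the sender's effective charset was US-ASCII; the actual default on the affected system is not stated in this PR. EncodingDemo and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jigasi.
final class EncodingDemo {
    // Encodes text with the sender's charset, then decodes it as UTF-8,
    // mimicking a receiver that expects UTF-8.
    static String roundTrip(String text, Charset senderCharset) {
        byte[] wire = text.getBytes(senderCharset);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hindi "namaste", written with Unicode escapes so the source file
        // compiles the same way regardless of the compiler's file encoding.
        String hindi = "\u0928\u092e\u0938\u094d\u0924\u0947";

        // Unmappable characters are replaced, one '?' per character.
        System.out.println(roundTrip(hindi, StandardCharsets.US_ASCII)); // prints ??????

        // With UTF-8 on both sides the text survives intact.
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi)); // prints true
    }
}
```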
Testing:
Tested the transcription feature with multiple languages, including Hindi, Tamil, and Japanese.
Verified that the JSON POST requests configured in Jigasi's sip-communicator.properties are sent with the correct UTF-8 encoding.
Impact:
With this change, Jigasi can deliver transcriptions over JSON POST in a wide variety of languages without encoding-related corruption.