jitsi / jigasi

Jigasi: a server-side application acting as a gateway to Jitsi Meet conferences. Currently allows regular SIP clients to join meetings and provides transcription capabilities.
Apache License 2.0

Add UTF-8 Support for SEND JSON POST Requests #503

Closed VewMet closed 12 months ago

VewMet commented 12 months ago

Description

This PR adds UTF-8 encoding support for the JSON POST requests that Jigasi's transcription module sends to the URLs configured in SEND_JSON_REMOTE_URLS, for example:

org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=https://ts.meet.jit.si/transcriptions

This ensures that non-ASCII characters are transmitted correctly, which matters for languages such as Hindi, Tamil, and Japanese.

Changes:

Set the Content-Type header explicitly to application/json; charset=UTF-8 so the receiver knows the JSON body is UTF-8 encoded. Changed the conversion of the JSON string to bytes so that it uses UTF-8 instead of the platform default charset.

Change-1:

conn.setRequestProperty("Content-Type", "application/json");

To:

conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");

Change-2:

os.write(json.toString().getBytes());

To:

os.write(json.toString().getBytes("UTF-8"));
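Taken together, the two changes can be sketched as below. This is a minimal illustration, not Jigasi's actual code: the class name Utf8JsonPostSketch and the methods encodeBody and post are hypothetical. The sketch uses StandardCharsets.UTF_8 rather than the string "UTF-8"; both select the same charset, but the constant avoids the checked UnsupportedEncodingException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a class from Jigasi.
class Utf8JsonPostSketch
{
    // Change-1: declare the charset of the body in the Content-Type header.
    static final String CONTENT_TYPE = "application/json; charset=UTF-8";

    // Change-2: encode the JSON text as UTF-8, independent of the
    // platform default charset.
    static byte[] encodeBody(String json)
    {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // POSTs the JSON text to the given URL and returns the HTTP status code.
    static int post(URL url, String json) throws IOException
    {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try
        {
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", CONTENT_TYPE);
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream())
            {
                os.write(encodeBody(json));
            }
            return conn.getResponseCode();
        }
        finally
        {
            conn.disconnect();
        }
    }
}
```

Keeping the encoding step in its own method makes it possible to check the bytes without a network connection.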

Motivation:

Transcriptions worked well in English, but problems appeared after switching the language to Hindi or other non-Latin-script languages: the received text contained numerous question marks, which points to an encoding issue. Sending the data as UTF-8 is intended to resolve this so that non-ASCII characters are interpreted correctly by the receiver.
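The question marks are consistent with how Java encodes strings: getBytes() without an argument uses the platform default charset, and characters that charset cannot represent are typically replaced with '?'. The sketch below reproduces the effect under one assumption: US-ASCII stands in for a non-UTF-8 platform default (on JDK 18 and later the default is UTF-8 unless overridden). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class EncodingLossSketch
{
    // Encodes the text with the given charset and decodes it again
    // with the same charset.
    static String roundTrip(String text, Charset charset)
    {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args)
    {
        // "namaste" in Devanagari, written with Unicode escapes so the
        // source file itself stays ASCII.
        String hindi = "\u0928\u092e\u0938\u094d\u0924\u0947";

        // Assumption: US-ASCII plays the role of a non-UTF-8 default charset.
        System.out.println(roundTrip(hindi, StandardCharsets.US_ASCII));

        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi));
    }
}
```

The first line printed consists only of question marks, matching the symptom described above; the second prints true.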

Testing:

Tested the transcription feature with multiple languages, including Hindi, Tamil, and Japanese. Verified that the JSON POST requests to the URL configured in Jigasi's sip-communicator.properties are sent with the correct UTF-8 encoding.

org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=https://ts.meet.jit.si/transcriptions

Impact:

With this change, the transcripts Jigasi posts to remote URLs preserve non-ASCII text, so transcription can be used with a wide variety of languages without encoding-related corruption.
