specify / specify7

Specify 7
https://www.specifysoftware.org/products/specify-7/
GNU General Public License v2.0

Combined emoji are not supported in text fields #2495

Open grantfitzsimmons opened 2 years ago

grantfitzsimmons commented 2 years ago
(1366, "Incorrect string value: '\\xF0\\x9F\\x91\\xA9\\xF0\\x9F...' for column `herb_rbge`.`collectingevent`.`Text1` at row 1")
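The escaped bytes in this error are UTF-8. As a minimal sketch in Python (illustrative only, not part of Specify), decoding the first four bytes shown gives a code point above U+FFFF, which is the kind of character that MySQL's 3-byte `utf8` cannot store. The message is truncated after the next two bytes, so only the first character is decoded here.

```python
# First complete character from the error message: \xF0\x9F\x91\xA9
raw = bytes([0xF0, 0x9F, 0x91, 0xA9])
char = raw.decode("utf-8")

code_point = ord(char)
print(f"U+{code_point:04X}")   # U+1F469
print(code_point > 0xFFFF)     # True: outside the Basic Multilingual Plane
print(len(raw))                # 4 (bytes needed in UTF-8)
```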

qE654V3kd7

Recreated in edge and testability.

Specify 7 Crash Report - 2022-11-25T19_04_56.813Z.txt

https://herbrbge-edge.test.specifysystems.org/specify/view/collectionobject/349648/?recordsetid=178
https://herbrbge-testability.test.specifysystems.org/specify/view/collectionobject/349648/?recordsetid=178

maxpatiiuk commented 2 years ago

:sadface: 😒

grantfitzsimmons commented 1 year ago

This is only for combined emoji like 👴🏻 (actually 👴 🏻). Regular emoji like ⚙️ are fine!
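One way to check the difference between these two examples is to look at the code points. The sketch below (plain Python, with hypothetical helper names) lists the characters in a string that lie outside the Basic Multilingual Plane and therefore need four bytes in UTF-8. It suggests the failing emoji here are the ones made of such characters, while the gear emoji is built only from BMP code points.

```python
def needs_four_bytes(ch):
    """True if the character is outside the BMP (4 bytes in UTF-8)."""
    return ord(ch) > 0xFFFF


def offending_characters(text):
    """Code points in `text` that a 3-byte MySQL utf8 column cannot store."""
    return [f"U+{ord(ch):04X}" for ch in text if needs_four_bytes(ch)]


old_man_light_skin = "\U0001F474\U0001F3FB"  # older man + light skin tone modifier
gear = "\u2699\uFE0F"                        # gear + variation selector-16

print(offending_characters(old_man_light_skin))  # ['U+1F474', 'U+1F3FB']
print(offending_characters(gear))                # []
```

A check like this could also back a validation message that asks the user to remove the unsupported characters instead of crashing.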

grantfitzsimmons commented 1 year ago

MySQL's utf8 only supports basic multilingual plane, and you need to use utf8mb4 instead:

For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.

So to support these characters, your MySQL needs to be 5.5+ and you need to use utf8mb4 everywhere. Connection encoding needs to be utf8mb4, character set needs to be utf8mb4 and collation needs to be utf8mb4. For Java it's still just "utf-8", but MySQL needs a distinction.

I don't know what driver you are using, but a driver-agnostic way to set the connection charset is to send the query:

SET NAMES 'utf8mb4'

right after making the connection.

Source
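As a rough sketch of what "utf8mb4 everywhere" could look like, the Python below builds the relevant SQL statements as strings without executing them. The helper names are hypothetical, the collation is an assumption (any utf8mb4 collation would serve), and the table name is taken from the error message above. The `charset` entry mirrors the `OPTIONS` key that Django's MySQL backend accepts; whether Specify 7 is configured this way is not confirmed here.

```python
CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"  # assumption: pick whichever utf8mb4 collation fits


def set_names_sql(charset=CHARSET):
    """Statement to send right after connecting (driver agnostic)."""
    return f"SET NAMES '{charset}'"


def convert_table_sql(table, charset=CHARSET, collation=COLLATION):
    """Statement that converts an existing table's text columns to utf8mb4."""
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation}"
    )


# Django-style connection options (hypothetical settings fragment)
DATABASE_OPTIONS = {"charset": CHARSET}

print(set_names_sql())
print(convert_table_sql("collectingevent"))
```

Converting a table rewrites its data and can affect index key lengths, so a statement like this is best tried on a backup copy first.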

grantfitzsimmons commented 1 year ago

From @carlosmbe:

Describe the bug

If an emoji is present in an Agent's First Name, Last Name, or Middle Initial, Specify will crash.

Agent Email is fair game though

To Reproduce

Steps to reproduce the behavior:

  1. Go to Data Entry -> Agents
  2. Click on any of the fields stated above
  3. Add Emoji and Save
  4. See error

Specify 7 Crash Report - 2023-04-27T21_26_02.395Z (1).txt
Specify 7 Crash Report - 2023-04-27T21_26_02.395Z.txt

Expected behavior

Not a crash, or at least a request to remove the unsupported characters and use only ASCII characters.

Screenshots

Screenshot 2023-04-27 at 4 25 09 PM Screenshot 2023-04-27 at 4 25 57 PM Screenshot 2023-04-27 at 4 26 14 PM

Desktop:
OS: macOS Ventura
Browser: Chrome
Specify 7 Version: 7.8.10-prerelease

Database Name: morpaleo

Reported By @carlosmbe

carlosmbe commented 1 year ago

Edit: Still present in 7.8.10 pre release

Below is an instance that happened in Paleo Context Remarks

Specify 7 Crash Report - 2023-05-01T21_19_03.235Z.txt Specify 7 Crash Report - 2023-05-01T21_19_03.235Z (1).txt

Screenshot 2023-05-01 at 4 21 51 PM

https://user-images.githubusercontent.com/53784701/235533632-aa8926fa-13cf-4f10-b8c9-045b9d0da879.mov

grantfitzsimmons commented 1 year ago

Happens with Good Ol Fashioned Simple Emojis too. Below is an instance that happened in Paleo Context Remarks

Emojipedia: (Source)

The Flag: United States emoji is a flag sequence combining 🇺 Regional Indicator Symbol Letter U and 🇸 Regional Indicator Symbol Letter S. These display as a single emoji on supported platforms.

It is 🇺 + 🇸 = 🇺🇸
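The composition above can be reproduced in a few lines of Python. This is only an illustration with a hypothetical `flag` helper: it maps each letter of a two-letter country code to its Regional Indicator Symbol, and shows that both halves of the flag are code points above U+FFFF, so each needs four bytes in UTF-8.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # Regional Indicator Symbol Letter A


def flag(country_code):
    """Build a flag sequence from a two-letter country code such as 'US'."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in country_code.upper()
    )


us = flag("US")
print([f"U+{ord(ch):04X}" for ch in us])  # ['U+1F1FA', 'U+1F1F8']
print(len(us))                            # 2 code points
print(len(us.encode("utf-8")))            # 8 bytes
```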

emenslin commented 4 months ago

Can recreate in edge (7.9.6)