jsonkenl / xlsxir

Xlsx parser for the Elixir language.
MIT License

Phonetic guides are added to column values #104

Open woylie opened 3 years ago

woylie commented 3 years ago

Excel has a feature called phonetic guides for Japanese characters. When kanji are entered in a cell, Excel adds phonetic guides as katakana. In Excel, these guides can be hidden or shown.

When opening such an xlsx file with LibreOffice, Apple Numbers or Google Sheets, you only see the actual cell content (without the guides). However, when parsing the data with xlsxir, the phonetic guides are appended to the cell content. For example, instead of 國學院大学, a cell will contain 國學院大学ダイガク.

While googling for a solution, I found the same issue description in other tools: SDL, ACS, MicroStrategy.

According to this post:

> For storing each unique text from a cell, Excel uses something called a "shared string table" and the content of each cell is the index of the text from that table. When we implemented the filter we erroneously thought that every "shared string" item contains only the text of the cell and some formatting belonging to that text. However, after this post, we found out that the phonetic translations are also found there.
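
To illustrate the quoted explanation, here is a minimal sketch of how a shared string item can carry phonetic text next to the cell text. It is written in Python with only the standard library and is not xlsxir's actual code. The `SAMPLE` XML is hand-written for this example (the `sb`/`eb` values are illustrative), and `naive_text` and `visible_text` are hypothetical helper names. In SpreadsheetML, the phonetic guide lives in an `<rPh>` element inside the `<si>` item, so a parser that concatenates every `<t>` under `<si>` would produce the output described above.

```python
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts such as xl/sharedStrings.xml.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Hand-written sample of a shared string table with one item.
# <rPh> holds the phonetic run; <phoneticPr> holds its display settings.
SAMPLE = """<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
  <si>
    <t>國學院大学</t>
    <rPh sb="3" eb="5"><t>ダイガク</t></rPh>
    <phoneticPr fontId="1"/>
  </si>
</sst>"""


def naive_text(si):
    """Join every <t> below <si>, including those nested in <rPh>."""
    return "".join(t.text or "" for t in si.iter(NS + "t"))


def visible_text(si):
    """Join only the cell text: a direct <t>, or the <t> of each rich-text run <r>.

    <rPh> and <phoneticPr> children are skipped on purpose.
    """
    parts = []
    for child in si:
        if child.tag == NS + "t":
            parts.append(child.text or "")
        elif child.tag == NS + "r":
            parts.extend(t.text or "" for t in child.findall(NS + "t"))
    return "".join(parts)


root = ET.fromstring(SAMPLE)
si = root.find(NS + "si")

print(naive_text(si))    # cell text followed by the phonetic guide
print(visible_text(si))  # cell text only
```

Assuming the parser walks the shared string table in a similar way, skipping `<rPh>` (and `<phoneticPr>`) when collecting text should be enough to match what LibreOffice, Numbers and Google Sheets display.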