mangstadt / ez-vcard

A vCard parser library for Java
Other
405 stars 93 forks source link

vCards broken on systems with locale with non-Western digits #113

Closed rfc2822 closed 4 years ago

rfc2822 commented 4 years ago

Hi,

Thanks for ez-vcard, which works really reliable. I have however found a problem when the default locale is set to something with non-Western (like Arabic, Farsi, …) digits. In this case, some properties which are internally stored as numbers are printed with localized numbers instead of ASCII digits. For instance, on a system with an Iran (Persian; native digits) locale, a birthday on 5 Sep 1998 is generated as ۱۹۹۸۰۹۰۵.

See here for the original discovery and more information.

I have written some tests:

import ezvcard.Ezvcard
import ezvcard.VCard
import ezvcard.VCardVersion
import ezvcard.property.Geo
import org.junit.Assert.assertEquals
import org.junit.Before
import org.junit.Test
import java.util.*

class TestLocaleNonWesternDigits {

    companion object {
        val locale = Locale("fa", "ir", "u-un-arabext")
    }

    @Before
    fun verifyLocale() {
        assertEquals("Persian (Iran) locale not available", "fa-IR", locale.toLanguageTag())
        Locale.setDefault(locale)
    }

    @Test
    fun testLocale_StringFormat() {
        assertEquals("۲۰۲۰", String.format("%d", 2020))    // does not fail if the Locale with Persian digits is available
    }

    @Test
    fun testLocale_StringFormat_Root() {
        assertEquals("2020", String.format(Locale.ROOT, "%d", 2020))
    }

    @Test
    fun testLocale_ezVCard() {
        val vCard = VCard(VCardVersion.V4_0)
        vCard.geo = Geo(1.0, 2.0)
        assertEquals("BEGIN:VCARD\r\n" +
                "VERSION:4.0\r\n" +
                "PRODID:ez-vcard 0.11.1\r\n" +
                "GEO:geo:1.0,2.0\r\n" +             // fails: is "GEO:geo:۱.۰,۲.۰\r\n" instead
                "END:VCARD\r\n", Ezvcard.write(vCard).go())
    }

}

Sorry, I have too late noticed that they're in Kotlin instead of Java, but it's just a few lines. I hope they demonstrate the problem nevertheless. For testing, it's important that the system / JVM supports native digits and the selected locale. For testing, you can use other non-Western locales, like ar_SA or th_TH. I was not able to select such a locale on my Linux system with OpenJDK 1.8.0_242-release, but it works with Android's JVM (tested with Android 10 and 11 emulator).

I think the numbers are converted to strings in a locale-sensitive manner like String.printf("%d", number) or SimpleDateFormatter. My suggestion is to use String.printf(Locale.US, "%d", number) (or Locale.ROOT?) wherever vCard content is generated because this is what's intended: the numbers shall be printed in Western digits.

Maybe this is applicable to parsing, too. Haven't tested that yet (parse properties with correct Western digits with ez-vcard on systems with non-Western digit locale).

(ez-vcard 0.11.1)

mangstadt commented 4 years ago

Thanks for the report and the unit test!

Fixed in dce98f310bf7dd73dc73102c8c19b0a144077c09 and 4e9da4fc1fb37f27ba71332cea320728befac6aa (there were some unit test failures during CI).

Locale.ROOT seems to work fine. It is probably a better choice than Locale.US because it's more "neutral".

I wrote unit tests that test the new code over every locale on the local JVM. The fix works for all of them on my machine and the CI builds. Also, my testing shows that ez-vcard is able to parse numbers that are in the vCard decimal format (i.e. using western-style numbers) no matter the default locale.

rfc2822 commented 4 years ago

Wow, this was fast! :) I think this would apply to VCardDateFormat, too. SimpleDateFormat also uses Locale-dependent digits.

mangstadt commented 4 years ago

You're right, it does! Thanks for thinking of that! Fixed in e6a1b2ca51695c4b6ac6584c073fa017814a8784.

rfc2822 commented 3 years ago

I just found that PartialDate's toISO8601 uses NumberFormat, which is also localizable and thus causes the same problem for partial dates.

NumberFormat can be adjusted so that it uses the ROOT locale, too.

mangstadt commented 3 years ago

Thanks for pointing that out. I found several other places that are affected by this as well.

I must have assumed that these instances would not be affected by this issue because the numbers were not using decimal or grouping separators.

Fixed in b8b0b0c8e1dff237975942022a42c6c25b9d9741.