ical4j / ical4j

A Java library for parsing and building iCalendar data models
https://www.ical4j.org
BSD 3-Clause "New" or "Revised" License
752 stars 200 forks source link

iCalendars broken on systems with locale with non-Western digits #458

Open rfc2822 opened 3 years ago

rfc2822 commented 3 years ago

Hi,

Thanks for ical4j, which works really reliable. I have just updated to v3.0.20. I have however found a problem when the default locale is set to something with non-Western (like Arabic, Farsi, …) digits. In this case, some properties which are internally stored as numbers are printed with localized numbers instead of ASCII digits. For instance, on a system with an Iran (Persian; native digits) locale, a TZOFFSETFROM:+0100 is generated as TZOFFSETFROM:+۰۱۰۰.

See here for the original discovery and more information.

I have written some tests:

package at.bitfire.ical4android

import net.fortuna.ical4j.model.property.TzOffsetFrom
import org.junit.Assert.assertEquals
import org.junit.Before
import org.junit.ComparisonFailure
import org.junit.Test
import java.time.ZoneOffset
import java.util.*

class TestLocaleNonWesternDigits {

    companion object {
        val locale = Locale("fa", "ir", "u-un-arabext")
    }

    @Before
    fun verifyLocale() {
        assertEquals("Persian (Iran) locale not available", "fa-IR", locale.toLanguageTag())
        Locale.setDefault(locale)
    }

    @Test
    fun testLocale_StringFormat() {
        // does not fail if the Locale with Persian digits is available
        assertEquals("۲۰۲۰", String.format("%d", 2020))
    }

    @Test
    fun testLocale_StringFormat_Root() {
        assertEquals("2020", String.format(Locale.ROOT, "%d", 2020))
    }

    @Test(expected = ComparisonFailure::class)      // should not fail in future
    fun testLocale_ical4j() {
        val offset = TzOffsetFrom(ZoneOffset.ofHours(1))
        val iCal = offset.toString()
        assertEquals("TZOFFSETFROM:+0100\r\n", iCal)        // fails: is "TZOFFSETFROM:+۰۱۰۰\r\n" instead
    }

}

Sorry, I have too late noticed that they're in Kotlin instead of Java, but it's just a few lines. I hope they demonstrate the problem nevertheless. For testing, it's important that the system / JVM supports native digits and the selected locale. For testing, you can use other non-Western locales, like ar_SA or th_TH. I was not able to select such a locale on my Linux system with OpenJDK 1.8.0_242-release, but it works with Android's JVM (tested with Android 10 and 11 emulator).

I think the numbers are converted to strings in a locale-sensitive manner like String.printf("%d", number) or SimpleDateFormatter. My suggestion is to use String.printf(Locale.US, "%d", number) (or Locale.ROOT?) wherever vCard content is generated because this is what's intended: the numbers shall be printed in Western digits.

Maybe this is applicable to parsing, too. Haven't tested that yet (parse properties with correct Western digits with ical4j on systems with non-Western digit locale).

(ical4j 3.0.20)

MohammadFakhraee commented 1 year ago

Hi there,

Thanks for the great library. I was working around with ical4j library and figured out that the mentioned problem still exists. So the library converts originally-generated ASCII numbers to Persian (or Arabic) numbers if the current application locale is set to fa or ar.

I searched for the problem and figured out that the conversion happens inside toString method of ZoneOffsetAdapter class on lines 32 and 34 which calls for String.format method. This method checks for default locale internally. So here is what is happening:

  1. The application locale is set to Persian, for instance, by calling Locale.setDefault(Locale("fa")).
  2. ZoneOffSet holds originally generated time zone for Asia/Tehran which is: 03:25:44
  3. By calling toString method, it is firstly dividing and storing numbers into hours, minutes, and seconds (which is perfectly fine until this step).
  4. Then the method calls for String.format("%+03d%02d%02d", hours, minutes, seconds); where the problem arises. String.format calls for default locale and converts divided numbers to Persian digits.

Here is a debug result of the ZoneOffsetAdapter:

1. Before conversion debug1

2. After conversion debug2

I should say a lot of non-English language people are using calendaring applications which use ical4j as their parser. So it would be so great if you fix this problem. I found out ical4android's developers suggested a solution to this problem here: bitfireAT/ical4android#78. I however, for the sake of simplicity paste the solution here:

/**
 * Converts an string with arabic numbers to their decimal equivalents.
 * @param number The number to convert.
 * @return An string that has all the arabic numbers found in [number] replaced by their English
 * representations.
 */
fun arabicToDecimal(number: String): String {
    val chars = CharArray(number.length)
    for (i in number.indices) {
        var ch = number[i].code
        if (ch in 0x0660..0x669)
            ch -= 0x0660 - '0'.code
        else if (ch in 0x06f0..0x06F9)
            ch -= 0x06f0 - '0'.code
        chars[i] = ch.toChar()
    }
    return String(chars)
}

And this test:

@Test
fun testArabicToDecimal() {
    val arabic = "+۰۳۲۵۴۴"
    val decimal = MiscUtils.arabicToDecimal(arabic)
    assertEquals(decimal, "+032544")
}

shows that the function works properly.

rfc2822 commented 1 year ago

Thanks for the great debugging that shows where the Arabic numbers come into play!

String.format() accepts a locale, so I'd just use String.format(Locale.US, …) and then no conversion to decimal number is needed.

MohammadFakhraee commented 1 year ago

That would be the best solution. And thanks a lot for your help.

benfortuna commented 1 year ago

Thanks all for the additional analysis and solution. Apologies for not addressing this earlier.

I just had a quick search and found we also use String.format() in TemporalAmountAdapter (see below), so I guess we should apply the US Locale to both formatting solutions correct? If no objections I'll include in the next release (3.x/4.x).

image
rfc2822 commented 1 year ago

I just had a quick search and found we also use String.format() in TemporalAmountAdapter (see below), so I guess we should apply the US Locale to both formatting solutions correct?

I think String.format(Locale.US, …) should be used everywhere instead of String.format(…) to make sure that numbers are correctly formatted. (Maybe Locale.ROOT should be used? But I think US is better)

If no objections I'll include in the next release (3.x/4.x).

That would be nice :)

MohammadFakhraee commented 1 year ago

I think String.format(Locale.US, …) should be used everywhere instead of String.format(…) to make sure that numbers are correctly formatted. (Maybe Locale.ROOT should be used? But I think US is better)

I'd agree with @rfc2822. If any user needs to show time (or any numbers), applications can easily convert numbers to desired locale.