meeting-room-booking-system / mrbs-code

Romanian letter "ț" #2014

Open jberanek opened 7 years ago

jberanek commented 7 years ago

Hi guys

I want to display Marți or Marti instead of Mar?i. It seems that the letter "ț" is not recognized. Is there any way I can do this?

Thanks

Reported by: *anonymous

Original Ticket: mrbs/support-requests/1298

Attachments: https://sourceforge.net/p/mrbs/support-requests/1298/attachment/Marti.png

jberanek commented 7 years ago

I just get this problem when I run on a Windows server. It works properly on a Linux server. I'll investigate.

Original comment by: campbell-m

jberanek commented 7 years ago

Campbell, is $lang_map_windows populated with an entry for "ro"?

Original comment by: jberanek

jberanek commented 7 years ago

Oh, yes, it looks like it is, I had a stale repo.

Original comment by: jberanek

jberanek commented 7 years ago

Out of interest, is it only the "ț" or also other characters in dates like in "sâmbătă"?

Original comment by: jberanek

jberanek commented 7 years ago

Well, I can also reproduce it on my test xampp installation:

Database: MySQL 10.1.10-MariaDB
System: Windows NT WINFRED 10.0 build 15063 (Windows 10) i586
Server time: 24.08.2017 08:41:56
Server software: Apache/2.4.18 (Win32) OpenSSL/1.0.2e PHP/7.0.4
PHP: 7.0.4

Original comment by: jberanek

jberanek commented 7 years ago

As far as I can see this is a well known problem with CP1250 affecting the letters Ș/ș and Ț/ț. I'll do some investigation to see if Windows has any chance of supporting UTF8 these days. If not, the best thing I can think of is to put a hack in utf8_convert_from_locale() to look for CP1250 and then look for those characters and replace them with S/s and T/t.

Original comment by: campbell-m

jberanek commented 7 years ago

It still amazes me how backward Windows still is in some ways!

Original comment by: jberanek

jberanek commented 7 years ago

This could be the solution: http://php.net/manual/en/class.intldateformatter.php

Original comment by: jberanek

jberanek commented 7 years ago

I'll take a look at it, but one issue that I've met before is that the Intl extension is not installed on all systems.

Original comment by: campbell-m

jberanek commented 7 years ago

Here's PHP's "intl" extension and IntlDateFormatter outputting in UTF-8 on a Windows server:

Tuesday, February 9, 2016
marți, 9 februarie 2016
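For anyone wanting to reproduce output like this, here is a minimal sketch. It assumes the intl extension is loaded and that the installed ICU data includes Romanian; the helper name format_full_date() is made up for illustration, and the exact wording of the result depends on the ICU version.

```php
<?php
// Sketch only: format one fixed date with IntlDateFormatter.
// format_full_date() is a hypothetical helper, not an MRBS function.
function format_full_date($locale, $timestamp)
{
  if (!class_exists('IntlDateFormatter'))
  {
    return null;  // the intl extension is not installed on this system
  }
  $formatter = new IntlDateFormatter(
    $locale,
    IntlDateFormatter::FULL,  // full, spelled-out date
    IntlDateFormatter::NONE,  // no time part
    'UTC'
  );
  return $formatter->format($timestamp);
}

$timestamp = gmmktime(12, 0, 0, 2, 9, 2016);  // 9 February 2016, noon UTC
foreach (array('en_US', 'ro_RO') as $locale)
{
  $result = format_full_date($locale, $timestamp);
  echo $locale . ': ' . (($result === null) ? 'intl not available' : $result) . "\n";
}
```

The result comes back as UTF-8 regardless of the operating system's code page, which is why it sidesteps the CP1250 problem.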

Original comment by: jberanek

jberanek commented 7 years ago

OK, that looks good. I think I'll put in a hack for utf8_convert_from_locale() anyway in the short term as there'll always be that problem with CP1250. Then we need a slightly better general solution as well, which is probably one or both of

Original comment by: campbell-m

jberanek commented 7 years ago

Using IntlDateFormatter is definitely a long-term fix, as it requires us to change all uses of date() and strftime() where we expect localised output.

Original comment by: jberanek

jberanek commented 7 years ago

This is proving trickier than I thought. I don't think the hack I was suggesting will work. As far as I can see strftime() is returning a string that contains the '?' character, rather than some other character that can be intercepted and converted into an acceptable substitute. I think (but I'm not 100% sure) that what is happening is that strftime is correctly producing a Romanian t-comma, but because the t-comma doesn't exist in CP1250 (only the incorrect t-cedilla) it's being converted to a '?' before we can do anything about it. I think. See this post.

I can't get Windows to accept a UTF-8 locale, so I think IntlDateFormatter is probably the way to go, except that it doesn't exist on all systems. However it may be possible to just define our own namespaced versions of date() and strftime() which look something like this:

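function strftime($format, $timestamp=null)
{
  // Default to the current time, as the global function does
  $timestamp = isset($timestamp) ? $timestamp : time();
  if (!class_exists('IntlDateFormatter'))
  {
    // IntlDateFormatter is not available: use the global function
    return \strftime($format, $timestamp);
  }
  else
  {
    // do IntlDateFormatter stuff
  }
}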

I don't know: it could be that converting formats from one to the other is too difficult. Or we could write some kind of fallback for IntlDateFormatter if it doesn't exist, but that's a lot of work.
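To give a feel for the format conversion problem, here is a rough sketch that maps a handful of strftime() specifiers onto ICU pattern letters. Both function names are hypothetical and nothing here is MRBS code. It only copes with formats made of specifiers, spaces and punctuation: literal letters would have to be quoted for ICU, and unmapped specifiers such as %x are passed through untouched.

```php
<?php
// Hypothetical helper, not part of MRBS: translate a small subset of
// strftime() specifiers into an ICU pattern for IntlDateFormatter.
function strftime_to_icu_pattern($format)
{
  $map = array(
    '%A' => 'EEEE',  // full weekday name
    '%a' => 'EEE',   // abbreviated weekday name
    '%B' => 'MMMM',  // full month name
    '%b' => 'MMM',   // abbreviated month name
    '%d' => 'dd',    // day of the month, two digits
    '%m' => 'MM',    // month number, two digits
    '%Y' => 'yyyy',  // four-digit year
    '%H' => 'HH',    // hour, 00-23
    '%M' => 'mm',    // minute
    '%S' => 'ss',    // second
  );
  // strtr() with an array never rescans text it has already replaced
  return strtr($format, $map);
}

// Hypothetical wrapper: returns a localised UTF-8 string, or null when the
// intl extension is missing so that the caller can fall back to strftime().
function intl_strftime($format, $timestamp, $locale)
{
  if (!class_exists('IntlDateFormatter'))
  {
    return null;
  }
  $formatter = new IntlDateFormatter(
    $locale,
    IntlDateFormatter::FULL,
    IntlDateFormatter::FULL,
    date_default_timezone_get(),
    IntlDateFormatter::GREGORIAN,
    strftime_to_icu_pattern($format)
  );
  return $formatter->format($timestamp);
}

echo strftime_to_icu_pattern('%A, %d %B %Y') . "\n";  // EEEE, dd MMMM yyyy
```

A complete conversion would need to cover every specifier that the $strftime_format config variables can contain, which is where the real work lies.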

There is one even hackier solution in the short term: if we know that the only day or month name in Romanian that contains a t-comma or s-comma is Marți, and we know that our strftime formats don't contain any question marks, then we could, for Romanian on Windows, just convert the '?' to a t-comma, because we know it must be Tuesday!

Original comment by: campbell-m

jberanek commented 7 years ago

As a very short term workaround could you try replacing the function utf8_convert_from_locale() in language.inc with this code:

function utf8_convert_from_locale($string, $locale=NULL)
{
  global $windows_locale, $winlocale_codepage_map, $server_os;

  if ($server_os == "windows")
  {
    if (!isset($locale))
    {
      $locale = $windows_locale;
    }
    if (array_key_exists($locale, $winlocale_codepage_map))
    {
      $codepage = $winlocale_codepage_map[$locale];
      $string = iconv($codepage, "utf-8", $string);
      // Horrible hack to get round the fact that Windows strftime cannot handle
      // Romanian Tuesday
      if (($codepage == 'CP1250') && ($locale == 'rom'))
      {
        $string = str_replace('?', 'È›', $string);
      }
    }
  }
  else if ($server_os == "aix")
  {
    $string = utf8_convert_aix($string, $locale);
  }
  return $string;
}

Note that this assumes your file is saved as an ANSI file (which it is by default). If you save it as a UTF-8 file, then replace 'È›' by 'ț'.
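One way to avoid depending on how the file is saved would be to spell out the bytes of the replacement character. The sketch below isolates just the replacement step; the function name is made up for illustration, and it relies on the same assumption as above, namely that a '?' in the string can only have come from the lost t-comma.

```php
<?php
// Sketch of an encoding-independent variant of the replacement step.
// "\xC8\x9B" is the UTF-8 byte sequence for U+021B (t-comma), so the
// result is the same whether this file is saved as ANSI or as UTF-8.
// restore_romanian_t_comma() is a hypothetical name, not an MRBS function.
function restore_romanian_t_comma($string)
{
  // Assumes the strftime format itself never contains a '?'
  return str_replace('?', "\xC8\x9B", $string);
}

echo restore_romanian_t_comma('Mar?i') . "\n";
```

The two bytes C8 9B are exactly what shows up as 'È›' when a UTF-8 t-comma is viewed in an ANSI editor.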

This is a very inelegant hack, but may solve the problem in the short term.

To help us try and come up with a more general, longer term solution, could you please run the attached test program and let us know what the output is. The test program will tell us what extensions are supported on your server and will produce output something like this:

gettext: yes
Locale: no
IntlDateFormatter: no
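For readers without access to the attachment, a check of this kind might look roughly like the sketch below. This is a guess at the approach and not the contents of test2.php; the function name is hypothetical.

```php
<?php
// Guess at what such a check might look like; NOT the attached test2.php.
// report_extension_support() is a hypothetical name.
function report_extension_support()
{
  $checks = array(
    'gettext'           => function_exists('gettext'),        // gettext extension
    'Locale'            => class_exists('Locale'),            // intl extension
    'IntlDateFormatter' => class_exists('IntlDateFormatter'), // intl extension
  );
  $lines = array();
  foreach ($checks as $name => $supported)
  {
    $lines[] = $name . ': ' . ($supported ? 'yes' : 'no');
  }
  return $lines;
}

echo implode("\n", report_extension_support()) . "\n";
```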

Original comment by: campbell-m

Attachments: https://sourceforge.net/p/mrbs/support-requests/_discuss/thread/7d052dd7/f3c0/attachment/test2.php

jberanek commented 7 years ago

Interesting article here.

Original comment by: campbell-m

jberanek commented 7 years ago

Ah yes, that's the article I read - interesting how he's trying to perfect the localisation. In that, he's further forward than MRBS is now, with our $strftime_format config variables - though at least we do allow for user customisation.

If PHP did add further date/time localisation I imagine it would only be for the latest PHP 7 releases, so we'd not be able to take advantage of it for quite some time. I happen to think that with the support in PHP 5.3 for IntlDateFormatter we could manage an OS-independent way of formatting date/time strings by just using a number of date/time format specifiers like we do now for strftime, but with IntlDateFormatter format strings.

What the article points out is that you can't make assumptions about date/time formats, for example that all languages will want a format exactly like "DATE TIME".

Original comment by: jberanek

jberanek commented 7 years ago

Of course, there would be no need to change the uses of date(), as that's not locale-aware anyway. The only one would be strftime(), although there are quite a few uses of that.

Original comment by: campbell-m