[Open] jberanek opened this issue 7 years ago
I only get this problem when running on a Windows server. It works properly on a Linux server. I'll investigate.
Original comment by: campbell-m
Campbell, is $lang_map_windows populated with an entry for "ro" ?
Original comment by: jberanek
Oh yes, it looks like it is; I had a stale repo.
Original comment by: jberanek
Out of interest, is it only the "ț", or are other characters in dates, such as those in "sâmbătă", affected too?
Original comment by: jberanek
Well, I can also reproduce it on my test xampp installation:
Database: MySQL 10.1.10-MariaDB
System: Windows NT WINFRED 10.0 build 15063 (Windows 10) i586
Server time: 24.08.2017 08:41:56
Server software: Apache/2.4.18 (Win32) OpenSSL/1.0.2e PHP/7.0.4
PHP: 7.0.4
Original comment by: jberanek
As far as I can see, this is a well-known problem with CP1250 affecting the letters Ș/ș and Ț/ț. I'll do some investigation to see whether Windows has any chance of supporting UTF-8 these days. If not, the best thing I can think of is to put a hack in utf8_convert_from_locale() that checks for CP1250, looks for those characters and replaces them with S/s and T/t.
Original comment by: campbell-m
It still amazes me how backward Windows still is in some ways!
Original comment by: jberanek
This could be the solution: http://php.net/manual/en/class.intldateformatter.php
Original comment by: jberanek
I'll take a look at it, but one issue I've run into before is that the intl extension is not installed on all systems.
Original comment by: campbell-m
Here's PHP's "intl" extension and IntlDateFormatter outputting in UTF-8 on a Windows server:
Tuesday, February 9, 2016
marți, 9 februarie 2016
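A minimal sketch of producing the Romanian line above with IntlDateFormatter, assuming the intl extension is loaded and PHP 7.0+ (for the scalar type declarations). The helper name format_date_intl() and the explicit ICU pattern are illustrative choices, not MRBS code. Because ICU returns UTF-8 directly, no Windows codepage conversion is involved.

```php
<?php
// Hypothetical helper: format a timestamp with an explicit ICU pattern.
// Returns null when the intl extension is missing or formatting fails.
function format_date_intl(string $locale, int $timestamp)
{
    if (!class_exists('IntlDateFormatter')) {
        return null; // intl extension not installed
    }
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        'UTC',
        IntlDateFormatter::GREGORIAN,
        'EEEE, d MMMM y' // full weekday, day of month, full month name, year
    );
    $result = $formatter->format($timestamp);
    return $result === false ? null : $result;
}

$timestamp = gmmktime(12, 0, 0, 2, 9, 2016); // 9 February 2016, 12:00 UTC
var_dump(format_date_intl('ro_RO', $timestamp));
```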
Original comment by: jberanek
OK, that looks good. I think I'll put in a hack for utf8_convert_from_locale() anyway in the short term as there'll always be that problem with CP1250. Then we need a slightly better general solution as well, which is probably one or both of
Original comment by: campbell-m
Using IntlDateFormatter is definitely a long-term fix, as it requires us to change all uses of date() and strftime() where we expect localised output.
Original comment by: jberanek
This is proving trickier than I thought, and I don't think the hack I was suggesting will work. As far as I can see, strftime() is returning a string that already contains the '?' character, rather than some other character that could be intercepted and converted into an acceptable substitute. I think (but I'm not 100% sure) that strftime is correctly producing a Romanian t-comma, but because the t-comma doesn't exist in CP1250 (only the incorrect t-cedilla does), it is converted to a '?' before we can do anything about it. See this post.
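The codepage gap described above can be checked directly. This is a minimal sketch, assuming the iconv extension (already used by utf8_convert_from_locale()) and PHP 7.0+ for the "\u{...}" escapes; can_encode_in_cp1250() is a hypothetical helper. It only illustrates which characters CP1250 can hold; it does not reproduce what Windows' strftime() does internally.

```php
<?php
// Hypothetical helper: true if the UTF-8 text has an exact CP1250 equivalent.
function can_encode_in_cp1250(string $utf8Text): bool
{
    // Without //TRANSLIT or //IGNORE, iconv() returns false when the target
    // codepage has no equivalent for a character; @ suppresses the notice.
    return @iconv('UTF-8', 'CP1250', $utf8Text) !== false;
}

$letters = [
    't-cedilla U+0163' => "\u{0163}",
    't-comma U+021B'   => "\u{021B}",
    's-cedilla U+015F' => "\u{015F}",
    's-comma U+0219'   => "\u{0219}",
];
foreach ($letters as $name => $letter) {
    echo $name, ': ', can_encode_in_cp1250($letter) ? 'yes' : 'no', "\n";
}
```

Only the cedilla forms survive the conversion, which is consistent with the t-comma in "Marți" being lost.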
I can't get Windows to accept a UTF-8 locale, so I think IntlDateFormatter is probably the way to go, except that it doesn't exist on all systems. However, it may be possible to define our own namespaced versions of date() and strftime() that look something like this:
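namespace MRBS;  // assumed namespace name

function strftime($format, $timestamp = null)
{
  if (!class_exists('IntlDateFormatter'))
  {
    // Fall back to the global function
    return \strftime($format, isset($timestamp) ? $timestamp : time());
  }
  else
  {
    // Do the IntlDateFormatter stuff: convert $format into an ICU
    // pattern and format $timestamp with an IntlDateFormatter
  }
}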
I don't know: it could be that converting formats from one to the other is too difficult. Or we could write some kind of fallback for IntlDateFormatter if it doesn't exist, but that's a lot of work.
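To get a feel for how hard the format conversion is, here is a sketch that maps only a handful of strftime() specifiers to ICU pattern letters and quotes literal text. Everything here is an assumption: strftime_to_icu_pattern() and localised_strftime() are hypothetical names, the mapping is deliberately incomplete (unsupported specifiers throw), and the code needs PHP 7.0+ plus, for the ICU branch, the intl extension.

```php
<?php
// Hypothetical helper: converts a small subset of strftime() specifiers into
// an ICU pattern for IntlDateFormatter. Not a complete mapping.
function strftime_to_icu_pattern(string $format): string
{
    $map = [
        '%a' => 'EEE',  '%A' => 'EEEE', // weekday name: abbreviated / full
        '%b' => 'MMM',  '%B' => 'MMMM', // month name: abbreviated / full
        '%d' => 'dd',   '%e' => 'd',    // day of month (%e space-pads; ICU 'd' does not pad)
        '%m' => 'MM',   '%Y' => 'y',    // month number, full year
        '%H' => 'HH',   '%M' => 'mm',   // hour (24-hour clock), minutes
        '%S' => 'ss',   '%%' => '%',    // seconds, literal percent sign
    ];
    $pattern = '';
    $length = strlen($format);
    for ($i = 0; $i < $length; $i++) {
        $char = $format[$i];
        if ($char === '%') {
            $spec = substr($format, $i, 2);
            if (!isset($map[$spec])) {
                throw new InvalidArgumentException("Unsupported specifier: $spec");
            }
            $pattern .= $map[$spec];
            $i++; // skip the character after '%'
        } elseif (preg_match("/[A-Za-z']/", $char)) {
            // ASCII letters are pattern characters in ICU, so quote each run of
            // literal letters/apostrophes, doubling any apostrophe inside it
            preg_match("/[A-Za-z']+/", $format, $matches, 0, $i);
            $pattern .= "'" . str_replace("'", "''", $matches[0]) . "'";
            $i += strlen($matches[0]) - 1;
        } else {
            $pattern .= $char; // spaces, punctuation, digits, non-ASCII: literal
        }
    }
    return $pattern;
}

// Hypothetical wrapper: prefer IntlDateFormatter, else fall back to the global
// strftime(), which uses setlocale() rather than $locale and is deprecated in PHP 8.1+.
function localised_strftime(string $format, int $timestamp, string $locale): string
{
    if (!class_exists('IntlDateFormatter')) {
        return strftime($format, $timestamp);
    }
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::NONE,
        IntlDateFormatter::NONE,
        date_default_timezone_get(),
        IntlDateFormatter::GREGORIAN,
        strftime_to_icu_pattern($format)
    );
    $result = $formatter->format($timestamp);
    if ($result === false) {
        throw new RuntimeException($formatter->getErrorMessage());
    }
    return $result;
}
```

For example, '%A, %e %B %Y' becomes 'EEEE, d MMMM y'. Specifiers whose behaviour differs subtly between the two systems (padding, 12-hour clocks, locale-specific %c/%x) are where the real difficulty would lie.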
There is an even hackier short-term solution. We know that the only Romanian day or month name containing a t-comma or s-comma is "Marți", and that our strftime formats don't contain any question marks. So, for Romanian on Windows, we could simply convert any '?' to a t-comma, because we know it must be Tuesday!
Original comment by: campbell-m
As a very short-term workaround, could you try replacing the function utf8_convert_from_locale() in language.inc with this code:
function utf8_convert_from_locale($string, $locale=NULL)
{
  global $windows_locale, $winlocale_codepage_map, $server_os;
  if ($server_os == "windows")
  {
    if (!isset($locale))
    {
      $locale = $windows_locale;
    }
    if (array_key_exists($locale, $winlocale_codepage_map))
    {
      $codepage = $winlocale_codepage_map[$locale];
      $string = iconv($codepage, "utf-8", $string);
      // Horrible hack to get round the fact that Windows strftime cannot handle
      // Romanian Tuesday
      if (($codepage == 'CP1250') && ($locale == 'rom'))
      {
        $string = str_replace('?', 'È›', $string);
      }
    }
  }
  else if ($server_os == "aix")
  {
    $string = utf8_convert_aix($string, $locale);
  }
  return $string;
}
Note that this assumes your file is saved as an ANSI file (which it is by default). If you save it as a UTF-8 file, replace 'È›' with 'ț'.
This is a very inelegant hack, but may solve the problem in the short term.
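The odd-looking 'È›' is the two UTF-8 bytes of "ț" (0xC8 0x9B) as displayed in the Windows-1252 codepage. A sketch of the same replacement written with an escape sequence, so it no longer depends on how the file is saved; it assumes PHP 7.0+ for "\u{021B}" (on older PHP, "\xC8\x9B" is the byte-level equivalent), and fix_romanian_tuesday() is a hypothetical name used only for illustration. Like the hack above, it assumes the string contains no genuine question marks.

```php
<?php
// Hypothetical helper: restore the Romanian t-comma lost as '?' under CP1250.
// "\u{021B}" is the UTF-8 encoding of U+021B regardless of the file's encoding.
function fix_romanian_tuesday(string $utf8Text): string
{
    return str_replace('?', "\u{021B}", $utf8Text);
}

echo fix_romanian_tuesday('Mar?i, 9 februarie 2016'), "\n";
```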
To help us come up with a more general, longer-term solution, could you please run the attached test program and let us know what the output is? The test program will tell us which extensions are supported on your server and will produce output something like this:
gettext: yes
Locale: no
IntlDateFormatter: no
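The attached test2.php is not reproduced here. As an assumption about what such a check involves, a minimal script printing output in the same shape might test for the gettext extension and the intl classes Locale and IntlDateFormatter; yes_no() is a hypothetical helper.

```php
<?php
// Sketch of an environment check in the spirit of the attached test program.
function yes_no(bool $flag): string
{
    return $flag ? 'yes' : 'no';
}

$report = [
    'gettext'           => extension_loaded('gettext'),
    'Locale'            => class_exists('Locale'),
    'IntlDateFormatter' => class_exists('IntlDateFormatter'),
];

foreach ($report as $name => $available) {
    echo $name, ': ', yes_no($available), "\n";
}
```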
Original comment by: campbell-m
Attachments: https://sourceforge.net/p/mrbs/support-requests/_discuss/thread/7d052dd7/f3c0/attachment/test2.php
Ah yes, that's the article I read. It's interesting how he's trying to perfect the localisation; in that respect he's further ahead than MRBS currently is with our $strftime_format config variables, though at least we do allow for user customisation.
If PHP did add further date/time localisation, I imagine it would only be in the latest PHP 7 releases, so we wouldn't be able to take advantage of it for quite some time. I happen to think that, with PHP 5.3's support for IntlDateFormatter, we could manage an OS-independent way of formatting date/time strings by using a set of format settings like the ones we have now for strftime, but holding IntlDateFormatter format strings instead.
What the article points out is that you can't make assumptions about date/time formats, such as that every language will want a format exactly like "DATE TIME".
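One way to avoid hard-coding "DATE TIME" is to let ICU combine the date and time parts according to each locale's own conventions, using IntlDateFormatter's style constants instead of separate patterns. A minimal sketch, assuming the intl extension; format_date_time() and the locale list are illustrative only.

```php
<?php
// Hypothetical helper: ICU chooses the order and separators of the date and
// time parts for the given locale, instead of us concatenating "DATE TIME".
function format_date_time(string $locale, int $timestamp, string $timezone = 'UTC')
{
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::FULL,  // date part: full, including the weekday
        IntlDateFormatter::SHORT, // time part: hours and minutes
        $timezone
    );
    return $formatter->format($timestamp);
}

if (class_exists('IntlDateFormatter')) {
    $timestamp = gmmktime(14, 30, 0, 2, 9, 2016);
    foreach (['en_US', 'ro_RO', 'ja_JP'] as $locale) {
        echo $locale, ': ', format_date_time($locale, $timestamp), "\n";
    }
}
```

The trade-off is less control: the layout comes from the ICU locale data rather than from our config, although an explicit pattern can still be passed where customisation is needed.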
Original comment by: jberanek
Of course, there would be no need to change the uses of date(), as that's not locale-aware anyway. Only strftime() would be affected, although there are quite a few uses of that.
Original comment by: campbell-m
Hi guys
I want to display "Marți" or "Marti" instead of "Mar?i". It seems like the letter "ț" is not recognized. Is there any way I can do this?
Thanks
Reported by: anonymous
Original Ticket: mrbs/support-requests/1298
Attachments: https://sourceforge.net/p/mrbs/support-requests/1298/attachment/Marti.png