Closed pine3ree closed 8 months ago
Thank you for bringing this up.
So the IntlDateFormatter::LONG pattern for zh_Hans_HK seems to be y年M月d日 z ah:mm:ss.
Then FormDateTimeSelect tries to split that pattern here:
The last group, ([ \-,.:\/]+), is what we need to focus on: it treats only a few basic characters as the split sequence, not the Chinese characters inside y年M月d日.
We could try to fix this by replacing any non-ASCII character with a space, so that preg_split behaves as expected again. What do you think?
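To make the failure concrete, here is a minimal sketch. It uses only the trailing character class quoted above, not the helper's full expression, together with the zh_Hans_HK pattern from this thread. The variable names are illustrative.

```php
<?php
// Pattern reported above for zh_Hans_HK with IntlDateFormatter::LONG.
$pattern = 'y年M月d日 z ah:mm:ss';

// Simplified splitter: only the separator class quoted above (an assumption;
// the real helper uses a longer expression).
$separators = '/([ \-,.:\/]+)/';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// The CJK characters are not separators, so year, month and day stay glued.
$before = preg_split($separators, $pattern, -1, $flags);

// Proposed workaround: turn every run of non-ASCII characters into a space.
$asciiOnly = preg_replace('/[^\x00-\x7F]+/u', ' ', $pattern);
$after = preg_split($separators, $asciiOnly, -1, $flags);

echo $before[0], PHP_EOL;           // first token before the workaround
echo implode('|', $after), PHP_EOL; // tokens and separators after it
```

Note that this workaround discards the 年, 月 and 日 labels instead of keeping them as delimiters.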
Hello @Slamdunk,
I believe that those non-ASCII characters (mostly present in Asian languages; I recall Japanese uses them too, as kanji) are meaningful delimiters and should be captured as such (like the 'at' delimiter for en_US), so that they are displayed later on before/after the corresponding "select" element.
They usually mean "day", "month", "year", and so on. I am not sure that simply surrounding them with single quotes before parsing will work.
Different locales also use them differently:
zh_* locales use pictograms down to IntlDateFormatter::MEDIUM
ja_* locales use pictograms down to IntlDateFormatter::FULL
Anyway, we should either (1) add tests and make the helpers work for all supported locales, or (2) limit the supported locales and add a generic simple alternative for those we do not (won't or can't) support.
kind regards
PS I guess that after JavaScript selectors appeared many years ago, very few developers are nowadays using "select" element groups for "datetime" related inputs.
> (2) limit the supported locales and add a generic simple alternative for those we do not (won't or can't) support.

That sounds fair enough to me: would you like to propose such a change?
PS As a quick fix (this is what I added in my plates functions for laminas-form):
    if (!isset($result['month'])) {
        $result['month'] = 'M';
    }
and similarly for the other missed captures.
(edit) This is not related to your answer; I saw it after posting.
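One compact way to apply the quick fix above to every missed capture is an array union with a table of defaults. This is only a sketch: the key names and fallback letters other than 'month' => 'M' are assumptions, not taken from the helper.

```php
<?php
// Hypothetical parse result where the month token was not captured.
$result = ['year' => 'y', 'day' => 'd'];

// Fallback tokens; apart from 'month' => 'M' these are assumed values.
$defaults = ['day' => 'd', 'month' => 'M', 'year' => 'y'];

// Array union keeps existing captures and fills in only the missing keys.
$result += $defaults;

echo $result['month'], PHP_EOL;
```

Like the original snippet, this only prevents a missing key; it does not recover the locale's own delimiters.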
btw, this string, wrapping the pictograms inside single quotes, y'年'M'月'd'日' z ah:mm:ss, is parsed correctly.
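A quoted string like that could be produced automatically. The sketch below wraps each run of non-ASCII characters in single quotes. It assumes the pattern has no non-ASCII text inside existing quotes, and it would also quote non-ASCII whitespace if the pattern contained any.

```php
<?php
// Pattern reported earlier in this thread for zh_Hans_HK.
$pattern = 'y年M月d日 z ah:mm:ss';

// Wrap each run of non-ASCII characters in single quotes, which ICU
// patterns treat as literal text.
$quoted = preg_replace('/[^\x00-\x7F]+/u', '\'$0\'', $pattern);

echo $quoted, PHP_EOL;
```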
Premise: I deleted all previous comments, since I believe I have found a simpler, generic regular expression for splitting the intl date-time pattern. In expanded format:
const SPLIT_REGEX = <<<EOR
/
(
    [^a-z']*
    (?:
        \('[^']+'\)
        |
        '[^']+'
        |
        [^a-z']+
    )+
    [^a-z']*
)+
/xiu
EOR;
together with the modified method:
function getPattern(string $locale, ?IntlDateFormatter $intl = null): string
{
    // Reuse the given formatter, or build one for the requested locale
    $intl ??= new IntlDateFormatter($locale, $this->dateType, $this->timeType);
    $pattern = $intl->getPattern();
    // Remove time zone format character present in various forms
    $pattern = str_replace(['(z)', '[z]', 'z ', ' z ', ' z'], ' ', $pattern);
    // Remove time meridiem character present in various forms
    $pattern = str_replace(['(a)', '[a]', ' a ', 'a ', ' a'], ' ', $pattern);
    // Cleanup extra inner spaces
    $pattern = preg_replace('/\s+/', ' ', $pattern);
    // Trim commas and whitespace left at either end by the previous operations
    $pattern = trim($pattern, ", \t\n\r\0\x0B");
    return $pattern;
}
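The cleanup steps can be tried without the intl extension by feeding them a literal pattern. In this sketch cleanPattern() is a hypothetical standalone helper that repeats the same replacements, and the second sample pattern is an illustrative en_US-style string, not output captured from ICU.

```php
<?php
// Hypothetical standalone helper repeating the cleanup steps of getPattern().
function cleanPattern(string $pattern): string
{
    // Drop the time zone field in its various forms
    $pattern = str_replace(['(z)', '[z]', 'z ', ' z ', ' z'], ' ', $pattern);
    // Drop the meridiem field in its various forms
    $pattern = str_replace(['(a)', '[a]', ' a ', 'a ', ' a'], ' ', $pattern);
    // Collapse runs of whitespace into a single space
    $pattern = preg_replace('/\s+/', ' ', $pattern);
    // Trim commas and whitespace left at either end
    return trim($pattern, ", \t\n\r\0\x0B");
}

$zh = cleanPattern('y年M月d日 z ah:mm:ss');
$en = cleanPattern("MMMM d, y 'at' h:mm:ss a z");

echo $zh, PHP_EOL;
echo $en, PHP_EOL;
```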
The regex works like this:
ref: https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table
ref: https://unicode-org.github.io/icu/userguide/format_parse/datetime/#date-field-symbol-table
The splitting string may have:
- quoted literals inside parentheses, such as ('a') or ('e')
- quoted literals, such as 'at'
- characters that are not ASCII letters, which include standard date-time separators like /, -, :, etc. and all Unicode symbols for year, month, day, etc.
result: https://onlinephp.io/c/97ff2
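For readers who cannot open the link, here is a small sketch of how the expression might be applied with preg_split. The regex is the one posted above; the PREG_SPLIT_NO_EMPTY flag and the two sample patterns (already stripped of their time zone and meridiem fields) are assumptions chosen for illustration.

```php
<?php
// Regex from this thread (x = extended, i = case-insensitive, u = UTF-8).
$splitRegex = <<<'EOR'
/
(
    [^a-z']*
    (?:
        \('[^']+'\)
        |
        '[^']+'
        |
        [^a-z']+
    )+
    [^a-z']*
)+
/xiu
EOR;

// Sample patterns; assumed to be already cleaned of "z" and "a" fields.
$zhFields = preg_split($splitRegex, 'y年M月d日 h:mm:ss', -1, PREG_SPLIT_NO_EMPTY);
$enFields = preg_split($splitRegex, "MMMM d, y 'at' h:mm:ss", -1, PREG_SPLIT_NO_EMPTY);

echo implode(' ', $zhFields), PHP_EOL; // y M d h mm ss
echo implode(' ', $enFields), PHP_EOL; // MMMM d y h mm ss
```

Everything between two runs of ASCII letters is treated as one delimiter, so 年, 月, 日 and the quoted 'at' all separate fields without any locale-specific handling.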
The AbstractFormDateSelect::parsePattern() method (https://github.com/laminas/laminas-form/blob/a41bb38a759590141e14fe907c107f80c7c3569b/src/View/Helper/AbstractFormDateSelect.php#L74) seems unable to handle less common locales. Added a failing test with the zh_Hans_HK locale. In this case the month part is not extracted (https://github.com/laminas/laminas-form/pull/229).