pradosoft / prado

Prado - Component Framework for PHP

#864 TUtfConverter takes language for national localization and TEscCharsetConversion for converting ESC charsets #901

Closed belisoful closed 1 year ago

belisoful commented 1 year ago
toUTF8($string, $from, $lang = null)
fromUTF8($string, $to, $lang = null)

These functions add a $lang parameter that is passed to PHP's setlocale(LC_CTYPE, $lang), because various countries and languages use slightly different character sets under the same encoding name. For example, ASCII has different national standards.
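As a rough illustration of the idea above (not the actual TUtfConverter code), the sketch below sets LC_CTYPE before converting and restores it afterwards. The function name `sketchToUtf8` is hypothetical, and the use of `iconv` as the conversion backend is an assumption. Whether the locale changes the result at all depends on the platform's iconv implementation.

```php
<?php
// Hypothetical sketch: convert $string from $from to UTF-8,
// optionally switching LC_CTYPE to $lang for the duration of the call.
function sketchToUtf8(string $string, string $from, ?string $lang = null): string
{
    $previous = null;
    if ($lang !== null) {
        // Passing '0' only queries the current LC_CTYPE locale.
        $previous = setlocale(LC_CTYPE, '0');
        // setlocale() returns false when the locale is not installed;
        // in that case the current locale simply stays in effect.
        setlocale(LC_CTYPE, $lang);
    }
    try {
        $converted = iconv($from, 'UTF-8', $string);
    } finally {
        // Restore the caller's locale so the change does not leak.
        if (is_string($previous)) {
            setlocale(LC_CTYPE, $previous);
        }
    }
    // Assumption: fall back to the input when the conversion fails.
    return $converted === false ? $string : $converted;
}
```

Restoring the previous locale matters because setlocale() affects the whole process, not just the current call.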

This is the most comprehensive list of ESC character set encodings I was able to find in a reasonable time.

belisoful commented 1 year ago

While I was sleeping on this PR, it occurred to me that if $lang is null, we could check $from/$to for a '.' and then pull the encoding and lang apart.

Basically, the encoding can have ".fr" appended to it to designate the French language of the encoding.
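A minimal sketch of that split, assuming the language is whatever follows the last '.' in the encoding name. The comment does not say which dot is used, and the helper name `sketchSplitEncoding` is hypothetical.

```php
<?php
// Hypothetical sketch: split "ENCODING.lang" into [encoding, lang].
// Only applies when no explicit $lang was given.
function sketchSplitEncoding(string $encoding, ?string $lang = null): array
{
    if ($lang === null) {
        // Assumption: the language suffix follows the LAST dot.
        $dot = strrpos($encoding, '.');
        if ($dot !== false) {
            $lang = substr($encoding, $dot + 1);
            $encoding = substr($encoding, 0, $dot);
        }
    }
    return [$encoding, $lang];
}
```

With this sketch, `sketchSplitEncoding('ISO-8859-1.fr')` yields the encoding `ISO-8859-1` and the language `fr`, while an explicitly passed $lang leaves the encoding string untouched.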

belisoful commented 1 year ago

I have an IPTC class for reading and writing IPTC that will be using this updated TUtfConverter and ESC charset converter at some point.

BTW, PHP has weak support for reading and writing IPTC. The class I have does much better and encodes the various constants for field names/IDs that would otherwise be up to each implementation.

ctrlaltca commented 1 year ago

While I was sleeping on this PR, it came to me that if $lang is null, to check the $from/$to for a '.', and then pull the encoding and lang apart.

Looking at the output of iconv -l, I can see that some charsets already contain a dot in their names. They all look quite exotic:

ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110
CSA_Z243.4-1985-1, CSA_Z243.4-1985-2, CSA_Z243.419851, CSA_Z243.419852
ISO_646.IRV:1991, JUS_I.B1.002, MSZ_7795.3, T.61-8BIT, T.61, T.618BIT, TIS620.2529-1, TIS620.2533-0

I guess we can live without these. LGTM
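For illustration only: a plain split on '.' would break the dotted charset names listed above. One possible guard, which is an assumption and not something this PR is stated to do, is to treat the suffix as a language only when it looks like a locale tag such as `fr` or `fr_FR`. The helper name `sketchSplitLocaleSuffix` is hypothetical.

```php
<?php
// Hypothetical sketch: only split off a suffix that looks like a
// locale tag (2-3 letters, optionally followed by _XX or -XX).
function sketchSplitLocaleSuffix(string $encoding): array
{
    $pattern = '/^(.+)\.([A-Za-z]{2,3}(?:[_-][A-Za-z]{2})?)$/';
    if (preg_match($pattern, $encoding, $m) === 1) {
        return [$m[1], $m[2]];
    }
    // No locale-like suffix: keep the charset name as-is.
    return [$encoding, null];
}
```

Under this pattern, names such as `ANSI_X3.4-1968` or `T.61` pass through unchanged, because their suffixes start with a digit or contain characters a locale tag would not.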